Service Reliability Engineer, G&A Solutions Engineering

Apple
4.1(13951)
👥10k+
Software Engineering
Austin, TX
2 weeks ago
🤖 AI-First, 🛠️ Cursor-friendly

Explicitly lists vibe coding experience as a preferred qualification.

About the Role

Join Apple's G&A Solutions Engineering team as a Service Reliability Engineer to ensure the reliability, scalability, and performance of mission-critical production services. You will apply SRE principles, automate operational tasks, lead incident response, and collaborate with engineers and operations teams to maintain service health and resiliency.

Job Description

Role

As a Service Reliability Engineer on Apple’s General and Administrative (G&A) Solutions Engineering team, you will maintain the health, stability, and efficiency of global, mission-critical production services. The role focuses on applying SRE principles to ensure scalability and performance, automating operational work, leading incident response, and partnering with development and operations teams to design for operational excellence.

Key Responsibilities

  • Proactively monitor service performance, identify bottlenecks, and implement solutions to optimize efficiency and resilience
  • Lead incident response efforts, drive rapid resolution, and perform thorough root cause analysis (RCA)
  • Develop and implement automation strategies to reduce manual intervention and improve service resilience
  • Apply SRE principles to maintain and scale service infrastructure
  • Collaborate with development teams to design services with monitoring, alerting, and scalability best practices
  • Create and maintain documentation, runbooks, and service level objectives (SLOs)
  • Participate in on-call rotations to provide 24/7 support for critical services
  • Define and monitor key service level indicators (SLIs) to measure service reliability
  • Identify process improvement opportunities and drive continuous improvement initiatives
  • Promote a culture of continuous learning and knowledge sharing within the team

Requirements

Minimum Qualifications

  • 4+ years of experience in Site Reliability Engineering, DevOps, or a related role supporting large-scale enterprise services
  • Strong proficiency in at least one programming language (e.g., Python, Java, Go) and scripting languages (e.g., Bash, PowerShell)
  • Experience with cloud platforms (e.g., AWS, Azure, GCP) and cloud-native technologies (e.g., Kubernetes, Docker)
  • Hands-on experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Splunk, Datadog)
  • Bachelor’s degree in Computer Science or equivalent work experience

Preferred Qualifications

  • Familiarity with CI/CD pipelines and DevOps practices
  • Experience with database technologies (MySQL, PostgreSQL, NoSQL)
  • Knowledge of ITIL frameworks and incident management processes
  • Experience with vibe coding
  • Understanding of Linux/Unix system administration
  • Experience with configuration management tools (Ansible, Chef, Puppet)

Technologies and Tools Mentioned

Prometheus, Grafana, Splunk, Datadog, Kubernetes, Docker, AWS, Azure, GCP, Python, Java, Go, Bash, PowerShell, MySQL, PostgreSQL, NoSQL, Ansible, Chef, Puppet, Linux/Unix, CI/CD

Tech Stack

MySQL, PostgreSQL, NoSQL, Python, Java, Go, Bash, PowerShell, AWS, Azure, GCP, Kubernetes, Docker, Prometheus, Grafana, Splunk, Datadog, Ansible, Chef, Puppet, Linux/Unix, CI/CD pipelines, ITIL, vibe coding

Skills

Site Reliability Engineering, Incident Response, Automation, Monitoring and Alerting, Root Cause Analysis, SLO/SLA and SLI definition, Collaboration, On-call Support, Process Improvement, Documentation, CI/CD and DevOps practices, Configuration Management, Linux/Unix Administration, Programming and Scripting, Cloud Operations, ITIL and Incident Management, Continuous Learning

Experience Level

Mid