Service Reliability Engineer, G&A Solutions Engineering
Note: the preferred qualifications mention "vibe coding," so familiarity with it is a plus.
About the Role
Join Apple's G&A Solutions Engineering team as a Service Reliability Engineer to ensure the reliability, scalability, and performance of mission-critical production services. You will apply SRE principles, automate operational work, lead incident response, and collaborate across engineering, DBA, and network teams to maintain service health and resilience.
Job Description
Role
As a Service Reliability Engineer on Apple’s G&A Solutions Engineering team, you will be responsible for maintaining the health, stability, and efficiency of global production services. The role focuses on applying SRE principles to improve reliability, automate repetitive tasks, and collaborate with cross-functional teams to design services for operational excellence.
Key Responsibilities
- Proactively monitor service performance, identify bottlenecks, and implement optimization solutions
- Lead incident response, drive rapid resolution, and perform thorough root cause analysis (RCA)
- Develop and implement automation strategies to streamline operations and reduce manual intervention
- Apply SRE principles to maintain highly reliable and scalable service infrastructure
- Work with development teams to ensure new services follow best practices for monitoring, alerting, and scalability
- Create and maintain documentation, including runbooks and service level objectives (SLOs)
- Participate in on-call rotations to provide 24/7 support for critical services
- Identify process improvement opportunities and drive initiatives to enhance team effectiveness
- Define and monitor key service level indicators (SLIs) to measure and improve reliability
Minimum Qualifications
- 4+ years of experience in Site Reliability Engineering, DevOps, or a related role supporting large-scale, enterprise services
- Strong proficiency in at least one programming language (examples: Python, Java, Go) and scripting (examples: Bash, PowerShell)
- Experience with cloud platforms (examples: AWS, Azure, GCP) and cloud-native technologies (examples: Kubernetes, Docker)
- Hands-on experience with monitoring and alerting tools (examples: Prometheus, Grafana, Splunk, Datadog)
- Bachelor’s degree in Computer Science or equivalent work experience
- Experience supporting enterprise-level services and participating in on-call rotations
Preferred Qualifications
- Familiarity with CI/CD pipelines and DevOps practices
- Experience with database technologies (examples: MySQL, PostgreSQL, NoSQL databases)
- Knowledge of ITIL frameworks and incident management processes
- Experience with configuration management tools (examples: Ansible, Chef, Puppet)
- Understanding of Linux/Unix system administration
- Experience with “vibe coding”