Service Reliability Engineer
Hyderabad, Telangana
4 days ago
✨ New: Mentions "vibe coding" explicitly as a preferred qualification.
About the Role
Apple is hiring a Service Reliability Engineer to ensure the reliability, scalability, and performance of global production services. The role focuses on applying SRE principles, automating operational tasks, leading incident response and RCA, and collaborating across engineering, DBA, data, and network teams to improve service resilience.
Job Description
Role
Apple is seeking a Service Reliability Engineer to maintain the health, stability, and efficiency of global production services. The engineer will apply SRE principles, automate operational work, lead incident response, and collaborate with development, data, DBA, and network teams to ensure services are scalable and reliable.
Key Responsibilities
- Proactively monitor service performance and identify bottlenecks to optimize efficiency and resilience.
- Lead incident response efforts, drive rapid resolution, and conduct detailed root cause analysis (RCA).
- Develop and implement automation strategies to streamline operational tasks and reduce manual intervention.
- Apply Site Reliability Engineering principles to maintain highly reliable and scalable infrastructure.
- Collaborate with development teams to ensure new services include monitoring, alerting, and scalability best practices.
- Create and maintain runbooks, documentation, and service level objectives (SLOs).
- Participate in on-call rotations to provide 24/7 support for critical services.
- Define and monitor key service level indicators (SLIs) and drive process improvement initiatives.
- Foster a culture of continuous learning within the team.
Preferred Qualifications
- Familiarity with CI/CD pipelines and DevOps practices.
- Experience with database technologies (e.g., MySQL, PostgreSQL, NoSQL).
- Knowledge of ITIL frameworks and incident management processes.
- Experience with “vibe coding.”
- Understanding of Linux/Unix system administration.
- Experience with configuration management tools (Ansible, Chef, Puppet).
- Strong proficiency in at least one programming language (e.g., Python, Java, Go) and scripting languages (e.g., Bash, PowerShell).
- Experience with cloud platforms (AWS, Azure, GCP) and cloud-native technologies (Kubernetes, Docker).
- Hands-on experience with monitoring and alerting tools (Prometheus, Grafana, Splunk, Datadog).
- Proven experience handling issues in distributed systems.
Minimum Qualifications
- 4+ years of experience in Site Reliability Engineering, DevOps, or a related role supporting large-scale, enterprise-level services.
- Bachelor’s degree in Computer Science or a related field, or equivalent experience.
- Experience performing RCA of technical issues.