Site Reliability Engineer (SRE III)
Salary budget: Approximately ZAR 40,000-50,000 per month
Working location: Cape Town, South Africa
Working mode: Hybrid
Company background: A leading technology-driven organization specializing in scalable cloud and platform engineering solutions, committed to innovation, automation, and high system reliability.
Employment type: 5-year renewable contract
Role Summary
Are you passionate about building resilient, automated, and scalable systems in the cloud?
Do you thrive in fast-paced environments where reliability and performance are key to success?
We’re looking for a Site Reliability Engineer (SRE III) to join our Platform Engineering team supporting the CDP environment. In this role, you will design, maintain, and improve the critical infrastructure that powers our applications. You will work closely with developers and operations teams to build systems that are automated, observable, and secure.
You’ll have the opportunity to shape our CI/CD pipelines, Infrastructure-as-Code (IaC) practices, and monitoring frameworks, ensuring that our systems are performant, compliant, and aligned with DevOps best practices.
Key Responsibilities
- Maintain and improve system reliability, uptime, and performance across production and non-production environments.
- Design, implement, and optimize CI/CD pipelines using GitHub and related automation tools.
- Implement and manage AWS-based infrastructure using Infrastructure as Code (IaC) practices.
- Develop scalable Kubernetes clusters and ensure containerized workloads meet performance and security standards.
- Proactively monitor and respond to incidents using tools such as Datadog, driving root cause analysis and long-term stability improvements.
- Enhance automation and observability to reduce manual intervention and mean time to recovery (MTTR).
- Collaborate cross-functionally with engineering, product, and operations teams to ensure seamless deployment and reliability.
- Ensure compliance with security and operational standards across environments.
Requirements
Experience: Minimum 4 years in Site Reliability Engineering, DevOps, or related roles.
Primary Skills
- GitHub and CI/CD pipeline design and maintenance
- Automation and scripting for infrastructure reliability
Secondary Skills
- Kubernetes and container orchestration
- Monitoring and alerting tools (Datadog preferred)
- Security compliance and environment hardening
Nice-to-have
- Experience with cost optimization and performance tuning on AWS
- Hands-on experience with microservices and distributed systems
- Exposure to DevSecOps or modern SRE frameworks
What We Offer
- A full-time permanent position in a technology-driven environment.
- Opportunities to lead reliability initiatives and influence infrastructure strategy.
- Exposure to cutting-edge cloud technologies and automation frameworks.
- A collaborative, multicultural engineering culture focused on growth and innovation.