Achieve measurable improvements in system uptime and performance by implementing robust reliability engineering practices and leading incident prevention initiatives.
Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through streamlined incident response protocols and team readiness, ensuring minimal disruption to customers.
Build, lead, and develop a skilled team of Customer Reliability Engineers with a strong focus on ownership, collaboration, and continuous learning.
Ensure that reliability is embedded into service design, development, deployment, and operations by partnering with engineering, product, and operations teams.
Deliver clear and actionable reporting on reliability metrics to support leadership decision-making and continuous improvement.
Align reliability goals with customer expectations by addressing root causes of service degradation and championing seamless user experiences.
Identify and address potential reliability risks before they impact customers by implementing observability tools, runbooks, and automated responses.
Drive reliability improvements that also reduce operational costs by eliminating manual processes, optimizing resource usage, and reducing reactive work.
Ensure the continuous and stable operation of customer-facing systems by applying reliability engineering principles and best practices.
Oversee timely incident response, root cause analysis, and implementation of long-term fixes to prevent recurring issues and improve service resilience.
Build and lead a high-performing reliability engineering team, providing coaching, mentorship, and career development to support individual and team growth.
Work closely with software engineering, DevOps, product, and support teams to embed reliability into the end-to-end service lifecycle.
Ensure effective monitoring systems, dashboards, and alerts are in place to detect, respond to, and analyze system performance and failures.
Define and drive the implementation of a reliability roadmap aligned with business objectives, system scalability, and customer needs.
Translate system performance into customer impact metrics (e.g., NPS, downtime minutes) and work to continuously enhance the end-user experience.
Track and report on key reliability metrics such as uptime, latency, error rates, and incident frequency to support transparency and data-driven decisions.
Proactively identify technical and operational risks, ensuring mitigation strategies are in place and aligned with compliance standards.
Foster a culture of experimentation and improvement by exploring automation, new tools, and process enhancements to strengthen reliability practices.
Education
Bachelor's Degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline.
Certifications in relevant areas such as Site Reliability Engineering (SRE), DevOps, ITIL, or Cloud Infrastructure (e.g., AWS, Azure, GCP) are highly desirable.
A Master's Degree in Technology Management, Engineering, or Business Administration is an added advantage.
Experience
7–10 years of experience in IT operations, systems engineering, or reliability engineering within a technology-driven environment.
At least 3–5 years in a leadership or managerial role, with proven experience leading reliability or DevOps teams.
Skills
Hands-on experience implementing and managing observability platforms, monitoring tools (e.g., Prometheus, Grafana, Splunk), and automation frameworks.
Demonstrated ability to lead incident response efforts, conduct root cause analysis, and implement sustainable, long-term service reliability improvements.
Experience working in agile environments and with cross-functional teams, including software development, infrastructure, product, and support.
Strong understanding of cloud-native technologies, container orchestration (e.g., Kubernetes), CI/CD pipelines, and infrastructure as code (e.g., Terraform, Ansible).