Job title: Operations Support Engineer
Job location: Gauteng, Johannesburg
Deadline: November 06, 2025
Role Purpose / Business Unit:
The purpose of the Operations Support Engineer role is to ensure operational excellence across all production systems supporting the Big Data and Machine Learning platforms. The role is responsible for maintaining system reliability, managing incidents and changes, and delivering ITIL-aligned support services to ensure seamless platform performance and user satisfaction.

Your responsibilities will include:
Production Operations & Support
- Monitor and maintain the health of production data and ML pipelines, platforms, and services.
- Perform root cause analysis and resolution of production incidents.
- Manage and coordinate incident response, escalation, and communication.
- Ensure timely resolution of support tickets and service requests.

ITIL Service Management
- Implement and manage ITIL processes, including Incident, Problem, Change, and Release Management.
- Maintain service documentation, runbooks, and operational procedures.
- Participate in CAB (Change Advisory Board) reviews and ensure compliance with change protocols.

Operational Excellence
- Drive continuous improvement initiatives to enhance system reliability and performance.
- Collaborate with DevOps, MLOps, and Platform Engineering teams to automate operational tasks.
- Track and report on SLAs, SLOs, and KPIs for operational services.

Monitoring & Observability
- Set up and maintain monitoring, alerting, and logging systems.
- Ensure visibility into system performance and proactively identify issues.
- Support observability tooling and dashboards for platform health.

Stakeholder Engagement
- Act as the first point of contact for production-related issues.
- Liaise with internal teams and external vendors to resolve operational challenges.
- Provide regular updates and reports to leadership on operational status and risks.

The ideal candidate for this role will have:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in Big Data and ML operations support.
- Experience with cloud-based data technologies such as AWS, GCP, or Azure.
- 5+ years of overall IT experience with Big Data, Advanced Analytics, Data Warehousing, and Business Intelligence.
- A relevant professional- or associate-level cloud certification would be advantageous.
- Strong communication and collaboration skills.
- Exposure to Agile ways of working, such as Kanban or Scrum.

Core competencies, knowledge, and experience:
- In-depth knowledge of data as a product and information best practices.
- Experience using a wide range of data tools, such as AWS services: S3, SFTP, Glue, EMR (Spark), Airflow, Athena, CloudWatch, CloudTrail, KMS, Kinesis, OpenSearch, etc.
- Strong understanding of ITIL frameworks and service management.
- Experience in production support for data platforms, ML systems, or cloud infrastructure.
- Familiarity with monitoring tools (e.g. Prometheus, Grafana, Postgres).
- Knowledge of incident and change management workflows.
- Excellent troubleshooting, communication, and documentation skills.
- Working experience with cloud platforms such as AWS and GCP.
- Working experience with Kubernetes and Docker containers.
- Working experience with CI/CD, IaC, and DevOps tools such as CDK, code repositories, etc.
- Strong programming skills in Python and SQL.