SKA Mid – Senior Compute Systems Engineer

South African Radio Astronomy Observatory, Cape Town, ZA
Job description

Job title : SKA Mid – Senior Compute Systems Engineer

Job Location : Western Cape, Cape Town

Deadline : October 30, 2025

Purpose :

  • The SKA-Mid Senior Computer Systems Engineer will lead the compute and storage systems team for SKA-Mid and will report to the SKA-Mid Site Reliability Engineering (SRE) Manager within SKA-Mid Computing & Software. The role provides hands-on technical leadership in the design, implementation, and long-term operation and maintenance of secure, reliable, and high-performance computer systems infrastructure for the telescopes hosted by SARAO.
  • While contributing to computing systems enablement, this role also focuses on shaping operational practices, supporting local delivery partnerships, and helping build the team that will manage computing systems operations as the telescopes transition from construction to steady-state operations.
  • This role involves guiding infrastructure development, mentoring team members, and ensuring systems align with SRE principles. Responsibilities include deploying and optimising systems, managing faults, contributing to long-term infrastructure planning, and ensuring scalable, maintainable operations.
  • The position plays a key role in cross-team collaboration, driving innovation while supporting sustainable and resilient computing environments.
Key Responsibilities :

  • Contribute to the global design and implementation of scalable and fault-tolerant infrastructure systems that support engineering and operational needs.
  • Contribute to the deployment, configuration, and maintenance of distributed storage and database systems.
  • Analyse system failures, performance issues, and misconfigurations across hardware, software, and network layers.
  • Lead and mentor the computer systems engineers and contribute to strategic technical planning.
Key Requirements :

    Qualification :

  • BTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 13 years’ experience, OR
  • BEng / MTech in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 9 years’ experience, OR
  • MEng in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 7 years’ experience, OR
  • PhD in Computer Science, Software Engineering, Information Systems, Electronic Engineering or equivalent qualifications coupled with 5 years’ experience.
    Experience :

  • 3+ years in a technical leadership or software / system architectural role with direct responsibility for large- / platform-scale distributed systems.
  • Demonstrated hands-on experience in infrastructure design and automation, distributed systems, observability, CI / CD, container orchestration (e.g. Kubernetes), DevOps / SRE practices and cloud-native technologies.
  • Experience leading teams or initiatives that intersect with data platforms, storage, networking, and systems engineering domains.
    Knowledge :

  • In-depth understanding of systems engineering principles, including performance optimisation, fault tolerance, and resource scheduling in Linux-based environments.
  • Strong knowledge of containerised environments (Docker, Podman), orchestration platforms (Kubernetes, Helm), and runtime architectures (containerd, CRI).
  • Expertise in infrastructure-as-code, continuous integration / deployment (CI / CD), and configuration management tools (e.g., GitLab CI, Ansible, Terraform, ArgoCD).
  • Advanced understanding of distributed computing and storage architectures, including Ceph, S3, NFS, and local / clustered file systems.
  • Operational and architectural fluency in relational and NoSQL database systems (e.g., PostgreSQL, MySQL, MongoDB), including replication, backups, and performance tuning.
  • Working knowledge of networking fundamentals, security protocols, and systems-level observability (e.g., Prometheus, Grafana, ELK / EFK stack).
  • Familiarity with the HPC ecosystem (e.g., SLURM, job schedulers) is beneficial for environments supporting scientific or research computing.
Additional Notes :

    Competency – Essential :

  • Demonstrated technical leadership (3+ years), leading cross-functional efforts across systems, storage, and database infrastructure, driving technical decisions from architecture through implementation.
  • Systems engineering expertise, with a focus on Linux administration, infrastructure automation, service orchestration, and performance optimisation across diverse environments.
  • Expertise in distributed systems architecture, including the design and deployment of scalable, resilient services using microservices, event-driven, and cloud-native design patterns.
  • Containerisation and orchestration fluency, including production-grade usage of Kubernetes, Docker, and Helm for system and application-level deployments.
  • Infrastructure automation and CI / CD, using tools such as GitLab CI, ArgoCD, FluxCD, Jenkins, or GitHub Actions to streamline and secure platform operations.
  • Complementary DevOps and SRE practices, blending infrastructure-as-code, configuration management, and release automation (DevOps) with incident response, monitoring, SLIs / SLOs, and system reliability engineering (SRE).
  • Linux expertise, including advanced troubleshooting, kernel tuning, systemd orchestration, and optimisation at scale.
  • Technical delivery and planning capabilities, including backlog scoping, cross-team collaboration, and Agile sprint execution.
  • Database administration skills, with operational experience in administering relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB), including high availability, backups, replication, and performance tuning.
  • Diagnostic skills, with a root-cause-first approach, and a strong bias for ownership, accountability, and long-term operational stability.