Talent.com
Site Reliability Engineer

Level-Up, Johannesburg, South Africa
30+ days ago
Job description

We are looking for a skilled Site Reliability Engineer (SRE) with expertise in Ansible and Linux to join our dynamic team. The successful candidate will play a critical role in maintaining the reliability, scalability, and performance of our infrastructure, driving automation, and collaborating with development teams to optimize system efficiency.

Key Responsibilities

Infrastructure Automation

  • Automate and maintain IT infrastructure using Ansible to streamline operations.

System Administration (Linux and Windows)

  • Manage virtual and physical Windows and Linux servers.
  • Automate server patching and updates to ensure systems remain current.
  • Implement automated security measures for all servers.
  • Monitor server performance and health.
  • Maintain comprehensive system documentation, including configuration and troubleshooting guides.
  • Conduct troubleshooting and root cause analysis as needed.
  • Ensure robust backup, disaster recovery, and business continuity plans are in place and followed.

Azure Cloud Management

  • Collaborate with DevOps to deploy, configure, and manage Azure virtual machines and resources.
  • Monitor cloud services for availability, performance, and security.
  • Work with the networking team to implement, monitor, and secure cloud networking infrastructure.
  • Ensure backup, disaster recovery, and business continuity plans are maintained for cloud systems.

System Monitoring and Optimization

  • Deploy and maintain monitoring tools for proactive system oversight and alerting.
  • Analyze performance data to identify and resolve bottlenecks.
  • Conduct capacity planning to support scalability and meet business needs.
  • Partner with development teams to enhance application performance on infrastructure.

Documentation and Collaboration

  • Create and update technical documentation, including system configurations and procedures.
  • Work with cross-functional teams to provide technical support and solutions.
  • Participate in on-call rotations and respond promptly to system emergencies.
  • Stay informed on industry trends, emerging technologies, and best practices in system administration, cloud computing, and virtualization.

Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
  • Relevant certifications (e.g., Linux Professional Institute Certification (LPIC), Microsoft Certified: Azure Administrator Associate) are a plus.

Experience & Technical Skills

  • Minimum of 8 years in an enterprise IT environment, with at least 3 years in a DevOps or SRE role.
  • Strong expertise in Ansible for automation and configuration management.
  • Proficient in Linux system administration (installation, configuration, troubleshooting).
  • Hands-on experience with hypervisor technologies (e.g., VMware, Hyper-V, Proxmox).
  • Knowledge of containerization technologies (e.g., Docker, Kubernetes).
  • Experience managing Azure cloud services, including VMs, storage, networking, and security.
  • Proficiency in scripting languages (e.g., Bash, PowerShell, Python) for automation.

Skills & Competencies

  • Excellent problem-solving skills and the ability to work independently or in a high-performance team.
  • Strong sense of ownership over tasks, projects, and issues.
  • Effective communication and interpersonal skills to collaborate with stakeholders at all levels.