Talent.com
Site Reliability Engineer

Datacentrix
Johannesburg, Gauteng, South Africa
20 days ago
Job description

Overview

Gauteng, JHB - Eastern Suburbs (Market related)

Are you a Site Reliability Engineer with solid Datadog experience? Our client in the Warehousing and Logistics sector is looking to employ an Engineer to support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications, and services.

Qualifications

  • Datadog Certified Fundamentals – Must have
  • Degree in Information Technology or Computer Science
  • Management of operations on virtualized and distributed infrastructures
  • Management of operations on environments with clustering, replication, and load balancing
  • ITIL Practitioner (V3) / ITIL Specialist (V4)
  • Windows Server: Advantage
  • 1–3 years of experience working with a modern monitoring / observability tool, ideally Datadog (or alternatives like Prometheus, Grafana, New Relic, or Dynatrace)

Experience in

  • Deploying and configuring monitoring agents
  • Creating dashboards and monitors
  • Parameterizing tags and labels for proper data correlation
  • Basic familiarity with cloud platforms (AWS, Azure or GCP) and container environments (Docker / Kubernetes)
  • Experience working with Centreon - Advantage
  • Strong interest in monitoring, DevOps, SRE, or cloud infrastructure
  • Knowledge of basic scripting (e.g., Bash, Python) is a plus

Duties

  • Support the design, implementation, and optimization of Datadog monitoring solutions across infrastructure, applications, and services.
  • Work alongside DevOps, infrastructure, and application teams to ensure complete observability using custom dashboards, alerts, and tagging strategies.
  • Assist in the deployment and onboarding of new systems into the monitoring ecosystem.
  • Serve as the go-to person for building visualizations, improving signal-to-noise ratios in alerting, and aligning monitoring with business objectives.
  • Ideal for a young and motivated engineer looking to grow within observability and cloud-native monitoring.
  • Deploy and configure Datadog agents across various environments (cloud and on-prem).
  • Create and customize dashboards, monitors, and alerts for systems, services, containers, and applications.
  • Implement tagging strategies to organize, filter, and correlate metrics and logs effectively.
  • Integrate Datadog with various platforms (AWS, Azure, GCP, Kubernetes, Docker, etc.) to collect telemetry data.
  • Collaborate with developers, DevOps, and infrastructure teams to identify key business and system metrics to monitor.
  • Continuously tune and optimize monitors to reduce false positives and improve actionable alerting.
  • Document dashboards, alert logic, best practices, and knowledge for cross-team enablement.
  • Analyze incidents and outages post-mortem to identify monitoring gaps and enhance visibility.
  • Assist in evangelizing observability practices within the organization and contribute to monitoring-as-code efforts (e.g., Terraform for Datadog resources).
  • Stay up to date with new Datadog features and industry trends in observability and monitoring.

Contact

For more information, please contact: Lesedi Danguru
