Talent.com
Intermediate Site Reliability Engineer, Database Operations

GitLab | Work from home, South Africa
17 days ago
Job description

Overview

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating human progress. Our platform unites teams and organizations, breaking down barriers and redefining what's possible in software development. Thanks to products like Duo Enterprise and Duo Agent Platform, customers get AI benefits at every stage of the SDLC. The same principles built into our products are reflected in how our team works: we embrace AI as a core productivity multiplier, with all team members expected to incorporate AI into their daily workflows to drive efficiency, innovation, and impact. GitLab is where careers accelerate, innovation flourishes, and every voice is valued. Our high-performance culture is driven by our values and continuous knowledge exchange, enabling our team members to reach their full potential while collaborating with industry leaders to solve complex problems. Co-create the future with us as we build technology that transforms how the world develops software.

An Overview Of This Role

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase. We specialize in systems, whether it be networking, the Linux kernel, or some more specific interest in scaling, algorithms, or distributed systems.

The Database Operations team’s mission is to build, run, own and evolve the entire lifecycle of the PostgreSQL database engine for GitLab.com. The team is focused on owning the reliability, scalability, evolution, performance and security of the database engine and its supporting services. Where appropriate, the team builds its services on top of Reliability::Foundations services and cloud vendor managed products to reduce complexity, improve efficiency and deliver new capabilities faster.

GitLab.com is a unique site and it brings unique challenges: it’s the biggest GitLab instance in existence. In fact, it’s one of the largest single-tenancy open-source SaaS sites on the internet. The experience of our team feeds back into other engineering groups within the company, as well as to GitLab customers running self-managed installations.

Responsibilities

  • Automate every operational task, which is a core requirement for this role: for example, package updates, configuration changes across all environments, and tools for automatic provisioning of user-facing services.
  • Respond to platform emergencies, alerts, and escalations from Customer Support.
  • Ensure systems exist to manage software life-cycles (e.g. Operating Systems) with a minimum of manual effort.
  • Develop a fully automated multi-environment observability stack based on the existing SaaS system, and extend it to predict capacity needs based on the usage patterns.
  • Plan for new service roll-outs, expansion and capacity management of existing services, and work with users to optimize their resource consumption.

You may be a fit for this role if you

  • Have primary experience running PostgreSQL in high-growth, large production environments using both self-managed deployments (VM, Kubernetes with modern PostgreSQL Operators) and DBaaS offerings.
  • Have hands-on experience using data from PostgreSQL internals to design, build and troubleshoot systems.
  • Have primary experience with infrastructure automation, orchestration and configuration management (Chef, Ansible, Puppet, Terraform).
  • Have a solid understanding of SQL and PL/pgSQL.
  • Have significant experience working in a large SaaS distributed-systems production environment.
  • Share our values, and work in accordance with those values.
  • Have excellent written and verbal English communication skills, with an urge to collaborate and communicate asynchronously.
  • Have an urge to document all the things so you don't need to learn the same thing twice, and an urge for delivering quickly and iterating fast.
  • Have a proactive, go-for-it attitude. When you see something broken, you can't help but fix it.
  • Have solid data modeling and data structure design skills.
  • Bonus: Solid programming skills as a (former) backend engineer, preferably with Ruby and/or Go.
  • Bonus: Experience with ClickHouse or another modern OLAP database.

Projects You Could Work On

  • Review, analyze and implement solutions regarding database administration (e.g., backups, performance tuning).
  • Work with Ansible, Terraform, Chef and other tools to build mature automation (for example, automating the setup of new replicas or the testing and monitoring of backups).
  • Implement self-service tools for our engineers using GitLab ChatOps.
  • Provide technical assistance and support to other teams on database and database-related application design methodologies, system resources, and application tuning.
  • Review database-related changes from engineering teams (e.g., database migrations).
  • Recommend query and schema changes to optimize the performance of database queries.
  • Jump on a production incident to mitigate database-related issues on GitLab.com.
  • Participate actively in the infrastructure design and scalability considerations focusing on data storage aspects.
  • Make sure we know how to take the next step to scale the database.
  • Design and develop specifications for future database requirements including enhancements, upgrades, and capacity planning; evaluate alternatives; and make appropriate recommendations.

Intermediate Site Reliability Engineer Criteria

Technical

  • Expertise in at least 1 area of SRE work, with general knowledge of all areas.
  • Capable of mentoring Junior team members.
  • Contributes small improvements to the GitLab codebase to resolve issues.

Execution

  • Identifies projects that result in substantial cost savings or revenue.
  • Identifies changes for the product architecture from the reliability, performance and availability perspective with a data-driven approach.
  • Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage, making GitLab cheaper to run for all our customers.
  • Identifies parts of the system that do not scale, provides immediate palliative measures, and drives long-term resolution of these incidents.
  • Identifies Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.

Collaboration And Communication

  • Ability to thrive in a fully remote, asynchronous work environment that places a high emphasis on documentation and written communication.
  • Develop expertise in a domain and radiate that knowledge.
  • Participate in blameless RCAs on incidents and outages, looking for answers that will prevent the incident from ever happening again.

Influence And Maturity

  • Lead Junior SREs by setting the example.
  • Develop ownership of a major part of the infrastructure.
  • Trusted to de-escalate conflicts inside the team.

Performance Indicators

  • GitLab.com Availability
  • GitLab.com Performance
  • Apdex and Error SLO per Service
  • Mean Time to Detection
  • Mean Time to Resolution
  • Mean Time Between Failure
  • Mean Time to Production
  • Disaster Recovery Time to Recovery

How GitLab Will Support You

  • Benefits to support your health, finances, and well-being
  • Flexible Paid Time Off
  • Team Member Resource Groups
  • Equity Compensation & Employee Stock Purchase Plan
  • Growth and Development Fund
  • Parental leave
  • Home office support

Please note that we welcome interest from candidates with varying levels of experience; many successful candidates do not meet every single requirement. Additionally, studies have shown that people from underrepresented groups are less likely to apply to a job unless they meet every single qualification. If you're excited about this role, please apply and allow our recruiters to assess your application.

Country Hiring Guidelines: GitLab hires new team members in countries around the world. All of our roles are remote; however, some roles may carry specific location-based eligibility requirements. Our Talent Acquisition team can help answer any questions about location after starting the recruiting process.

Privacy Policy: Please review our Recruitment Privacy Policy. Your privacy is important to us.

GitLab is proud to be an equal opportunity workplace and is an affirmative action employer. GitLab’s policies and practices relating to recruitment, employment, career development and advancement, promotion, and retirement are based solely on merit, regardless of race, color, religion, ancestry, sex (including pregnancy, lactation, sexual orientation, gender identity, or gender expression), national origin, age, citizenship, marital status, mental or physical disability, genetic information (including family medical history), discharge status from the military, protected veteran status (which includes disabled veterans, recently separated veterans, active duty wartime or campaign badge veterans, and Armed Forces service medal veterans), or any other basis protected by law. GitLab will not tolerate discrimination or harassment based on any of these characteristics. See also GitLab’s EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know during the recruiting process.

Seniority level

  • Associate

Employment type

  • Full-time

Job function

  • Engineering and Information Technology

Industries

  • IT Services and IT Consulting and Software Development