Overview
An opportunity exists for a Platform Engineer to contribute to the development, integration, and operation of shared platform services supporting large-scale scientific computing and complex software systems. Working within the Site Reliability Engineering (SRE) team, this role will focus on automation, observability, and operational readiness as the platform transitions from construction into steady-state operations.
Responsibilities
- Develop and enhance platform services to support engineering and operational teams
- Integrate platform services with application and infrastructure systems
- Contribute to automation, monitoring, and service reliability improvements
- Support operational readiness and continuous improvement efforts
- Collaborate with senior engineers and cross-functional teams to deliver resilient and scalable solutions
Minimum Requirements
Qualification(s) required:
National Diploma, BTech, BEng / MTech, MEng, or PhD in Computer Science, Software Engineering, Information Systems, Electronic Engineering, or equivalent (qualification level aligned with experience requirements below)
Experience Required
Experience required (qualification-dependent):
- 7 years' relevant experience, coupled with a National Diploma, OR
- 6 years' relevant experience, coupled with a BTech, OR
- 4 years' relevant experience, coupled with a BEng / MTech, OR
- 3 years' relevant experience, coupled with an MEng, OR
- 1 year's relevant experience, coupled with a PhD
Additional Experience Required
- Minimum 2 years' hands-on experience in infrastructure automation, distributed systems, observability, CI/CD, and container orchestration (e.g., Kubernetes)
- Experience working in teams across data platforms, storage, networking, and systems engineering
- Exposure to DevOps and SRE practices including monitoring, alerting, incident response, and resilience engineering
- Practical experience with infrastructure-as-code, deployment pipelines, and observability stacks
Knowledge & Competencies Required
- Solid understanding of distributed systems, service meshes, and microservices architectures
- Proficiency in containerisation and orchestration (Docker, Kubernetes, Helm)
- Strong Linux administration, troubleshooting, and scripting skills
- Familiarity with networking, security, and storage systems (object, block, distributed)
- Working knowledge of CI/CD tools (e.g., GitLab CI, Jenkins, ArgoCD)
- Exposure to cloud platforms (AWS, GCP, Azure, or OpenStack)
- Advantageous: knowledge of control systems, data acquisition, or scientific computing platforms
Skills & Attributes Required
- Problem-solving and root cause analysis
- Strong communication and collaboration skills across technical and non-technical stakeholders
- Agile delivery experience, including backlog grooming and sprint planning
- Ability to document technical solutions and share knowledge across teams
- Passion for continuous learning and engineering excellence
Desired Skills
Platform Engineering, Kubernetes, DevOps, Linux Systems, Infrastructure Automation, Observability, Cloud Computing, CI/CD, SRE, Cloud Native, Docker, Helm, Agile, GPU, FPGA, AWS, GCP, Azure, OpenStack
Desired Work Experience
5 to 10 years
Desired Qualification Level
Diploma
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology
Industries
IT Services and IT Consulting