Our client is seeking a hands-on Data Engineer with strong experience in building scalable data pipelines and analytics solutions on Databricks. They will design, implement, and maintain end-to-end data flows, optimize performance, and collaborate with data scientists, analysts, and business stakeholders to turn raw data into trusted insights.
ESSENTIAL SKILLS:
- Expertise with Apache Spark (PySpark), Databricks notebooks, Delta Lake, and SQL
- Strong programming skills in Python for data processing
- Experience with cloud data platforms (Azure) and their Databricks offerings; familiarity with object storage (ADLS)
- Proficient in building and maintaining ETL/ELT pipelines, data modeling, and performance optimization
- Knowledge of data governance, data quality, and data lineage concepts
- Experience with CI/CD for data pipelines and with deployment/orchestration tooling (GitHub Actions, Databricks Asset Bundles, Databricks Jobs)
- Strong problem-solving skills, attention to detail, and ability to work in a collaborative, cross-functional team
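To illustrate the kind of data-quality logic the essential skills above point to, here is a minimal, framework-free Python sketch of row-level validation with a quarantine (dead-letter) path; in practice this would run inside a PySpark/Delta pipeline. The function and field names (`validate_order`, `order_id`, `amount`) are hypothetical, not taken from the job spec.

```python
# Illustrative sketch only: row-level data-quality checks with a
# quarantine path. Field names are hypothetical examples.

def validate_order(row: dict) -> list[str]:
    """Return a list of data-quality violations for one record."""
    errors = []
    if not row.get("order_id"):
        errors.append("missing order_id")
    if row.get("amount") is None or row["amount"] < 0:
        errors.append("amount must be non-negative")
    return errors

def split_valid_invalid(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Route clean rows onward and quarantine bad ones (dead-letter pattern)."""
    valid, invalid = [], []
    for row in rows:
        (valid if not validate_order(row) else invalid).append(row)
    return valid, invalid
```

The same split-and-quarantine shape carries over directly to Spark DataFrames, where the checks become filter expressions and the quarantined rows land in a separate Delta table for investigation.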
ADVANTAGEOUS SKILLS:
- Experience with streaming data (Structured Streaming, Kafka, Delta Live Tables)
- Familiarity with materialized views, streaming tables, data catalogs, and metadata management
- Knowledge of data visualization and BI tools (Splunk, Power BI, Grafana)
- Experience with data security frameworks and compliance standards relevant to the industry
- Certifications in Databricks or cloud provider platforms
QUALIFICATIONS / EXPERIENCE:
Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field.
3+ years of hands-on data engineering experience.
Key Responsibilities:
- Design, develop, test, and maintain robust data pipelines and ETL/ELT processes on Databricks (Delta Lake, Spark, and Python/Scala/SQL notebooks)
- Architect scalable data models and data vault/dimensional schemas to support reporting, BI, and advanced analytics
- Implement data quality, lineage, and governance practices; monitor data quality metrics and resolve data issues proactively
- Collaborate with Data Platform Engineers to optimize cluster configuration, performance tuning, and cost management in cloud environments (Azure Databricks)
- Build and maintain data ingestion from multiple sources (RDBMS, SaaS apps, files, streaming queues) using modern data engineering patterns (CDC, event-driven pipelines, change streams, Lakeflow Declarative Pipelines)
- Ensure data security and compliance (encryption, access controls) in all data pipelines
- Develop and maintain CI/CD pipelines for data workflows; implement versioning, testing, and automated deployments
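The CDC ingestion pattern named in the responsibilities can be sketched in plain Python as a keyed upsert/delete merge; on Databricks this same logic would typically be expressed as a Delta Lake `MERGE INTO`. The event shape (`op`, `key`, `row`) is a hypothetical example, not a prescribed schema.

```python
# Illustrative sketch only: applying a CDC change batch to a keyed
# target table, mimicking Delta Lake MERGE semantics in plain Python.

def apply_cdc_batch(target: dict, changes: list[dict]) -> dict:
    """Apply a batch of change events to a keyed target table.

    Each event carries an op ('upsert' or 'delete'), a key, and,
    for upserts, the new row payload. Later events win over earlier
    ones for the same key, as in an ordered change stream.
    """
    for event in changes:
        key = event["key"]
        if event["op"] == "delete":
            target.pop(key, None)       # deletes of absent keys are no-ops
        elif event["op"] == "upsert":
            target[key] = event["row"]  # insert new or overwrite existing
    return target
```

The in-order replay is the key design point: applying the same ordered batch twice leaves the target unchanged, which is what makes CDC pipelines safe to retry after a failure.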