Senior Data Engineer
The role involves "vibe coding" to prototype analytics UIs with AI assistants (Claude/Lovable) and building AI/LLM-ready data layers for RAG workflows.
About the Role
Senior Data Engineer responsible for architecting and implementing scalable data platforms, building silver and gold data layers, optimizing compute costs, enforcing data quality and governance, and creating semantic layers for AI/LLM consumption. The role combines hands-on ETL/ELT implementation, performance tuning/FinOps, orchestration, and operational excellence to deliver reliable analytics and AI-ready data.
Key Responsibilities
- Design and document the data architecture and flow from Raw (Bronze) to Cleaned/Enriched (Silver) to Metrics (Gold) for visualization and discovery (a medallion sketch follows this list).
- Architect extensible data models that decouple storage from compute to accommodate changing business needs.
- Design and implement semantic layers, metadata, definitions, relationships, and feature stores to make data AI/LLM-ready.
- Implement ETL/ELT pipelines to build silver aggregations and gold metrics; enforce RBAC/ABAC, row/column security, and PII handling (masking/tokenization, shown in the medallion sketch below).
- Implement and maintain data dictionaries, metadata, lineage, and data quality guardrails as part of the definition of done.
- Optimize compute for cost and performance (memory, cores, executors, partitions) considering data volume, transformation complexity, data movement, SLAs, and read volume (see the tuning sketch below).
- Manage, maintain, and optimize BAU pipelines, including bug fixes, enhancements, and porting pipelines across stacks.
- Implement automated data quality frameworks and tests to catch nulls, schema drift, and anomalies before they reach the gold layer (see the guardrail sketch below).
- Orchestrate complex workflows and dependencies using tools like Airflow, Dagster, or ADF; manage DAG SLAs, backfills, retries, and alert routing (see the DAG sketch below).
- Apply DevOps and CI/CD best practices: version control (Git), automated testing, and deployment pipelines.
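To make the responsibilities above concrete, the sketches below illustrate the main patterns. First, a minimal PySpark take on the Bronze → Silver → Gold flow with PII tokenization; the table paths, column names, and hashing rule are hypothetical, and Delta Lake is assumed to be configured on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: raw ingested events, read as-is (hypothetical path).
bronze = spark.read.format("delta").load("/lake/bronze/orders")

# Silver: cleaned and enriched -- dedupe, typing, PII tokenization.
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    # Tokenize email so downstream layers never see raw PII.
    .withColumn("email_token", F.sha2(F.col("email"), 256))
    .drop("email")
)
silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")

# Gold: business-level metrics for BI and AI/LLM consumption.
gold = (
    silver.groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(
        F.countDistinct("order_id").alias("orders"),
        F.sum("amount").alias("revenue"),
    )
)
gold.write.format("delta").mode("overwrite").save("/lake/gold/daily_revenue")
```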
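For the compute-tuning responsibility, executor sizing and shuffle parallelism are the usual first knobs for spill-to-disk and slow-stage issues. The values below are placeholders to be sized against data volume, transformation complexity, and SLAs, not recommendations.

```python
from pyspark.sql import SparkSession

# Illustrative executor and shuffle settings (all numbers are placeholders).
spark = (
    SparkSession.builder
    .appName("tuned-job")
    .config("spark.executor.memory", "8g")           # per-executor heap
    .config("spark.executor.cores", "4")             # concurrent tasks per executor
    .config("spark.executor.instances", "10")        # total cluster footprint
    .config("spark.sql.shuffle.partitions", "400")   # shuffle parallelism
    .config("spark.sql.adaptive.enabled", "true")    # let AQE coalesce small partitions
    .getOrCreate()
)

df = spark.range(0, 10_000_000)
# Repartitioning before a wide transformation controls skew and task size.
df = df.repartition(400, "id")
```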
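For the data quality responsibility, teams often reach for Great Expectations or dbt tests; as a dependency-free illustration, here is a small PySpark guardrail that fails fast on null keys and schema drift before a gold write. The expected schema and column names are hypothetical.

```python
from pyspark.sql import DataFrame

# Expected silver schema (hypothetical); any drift aborts the promotion.
EXPECTED_SCHEMA = {"order_id": "string", "order_ts": "timestamp", "amount": "double"}

def validate_before_gold(df: DataFrame) -> None:
    """Fail fast on schema drift or null business keys before the gold write."""
    # Schema drift: any added, dropped, or retyped column aborts the run.
    actual = {f.name: f.dataType.simpleString() for f in df.schema.fields}
    if actual != EXPECTED_SCHEMA:
        raise ValueError(f"schema drift: expected {EXPECTED_SCHEMA}, got {actual}")
    # Null guardrail on the business key.
    null_keys = df.filter(df["order_id"].isNull()).count()
    if null_keys:
        raise ValueError(f"{null_keys} rows with null order_id")

# Usage: call validate_before_gold(silver_df) before promoting silver to gold.
```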
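For orchestration, here is an Airflow 2-style sketch of a two-task DAG with retries, a task SLA, and catchup enabled for backfills; the DAG id, schedule, and callables are placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def build_silver(**_):
    ...  # trigger the silver Spark job here

def build_gold(**_):
    ...  # trigger the gold Spark job here

default_args = {
    "retries": 2,                        # retry transient failures automatically
    "retry_delay": timedelta(minutes=10),
    "sla": timedelta(hours=2),           # route an alert if a task runs long
}

with DAG(
    dag_id="orders_medallion",           # placeholder id
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",                # daily at 02:00
    catchup=True,                        # enables historical backfills
    default_args=default_args,
) as dag:
    silver = PythonOperator(task_id="build_silver", python_callable=build_silver)
    gold = PythonOperator(task_id="build_gold", python_callable=build_gold)
    silver >> gold                       # gold waits for silver
```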
Requirements
- 5–8 years in analytics engineering / data engineering, including at least two years of architecting and solutioning.
- Strong grounding in distributed computing principles and deep tuning experience to resolve execution issues (spill-to-disk, slow stages) and optimize configurations.
- Demonstrated ability to diagnose business problems and implement pragmatic, production-ready solutions (an application and execution focus rather than research).
- Experience building data layers for Generative AI: vector databases, knowledge graphs, semantic layers, and preparing data for RAG architectures (a minimal prep sketch follows this list).
- Proficiency in advanced SQL, Python, R, Spark, and Spark SQL; Scala is a plus.
- Experience with data modeling (dimensional modeling: Star Schema, Snowflake) and modern table formats (Delta Lake, Iceberg, Hudi).
- Familiarity with Spark, Hive, and BigQuery, and with concepts like DAGs, shuffle, serialization, and partition pruning.
- Experience with data quality tools (e.g., Great Expectations, dbt tests) and orchestration tools (Airflow, Dagster, ADF).
- Operational experience with monitoring/alerting and incident workflows (e.g., PagerDuty, Teams).
- Behavioral attributes: high ethics, high agency, problem-solving, and first-principles thinking.
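On the Generative AI data-layer requirement, a common pattern is to chunk gold-layer text with lineage metadata before embedding and loading into a vector database. Here is a minimal PySpark sketch with hypothetical table and column names; the embedding model and vector store are left as pluggable downstream choices.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("rag-prep").getOrCreate()

docs = spark.read.format("delta").load("/lake/gold/product_docs")

# Paragraph-level chunks with lineage metadata, so retrieval results
# can be traced and cited back to their source documents.
chunks = (
    docs
    .withColumn("chunk", F.explode(F.split(F.col("body"), r"\n\n")))
    .filter(F.length("chunk") > 50)          # drop trivial fragments
    .select(
        "doc_id",
        F.monotonically_increasing_id().alias("chunk_id"),
        "updated_at",
        "chunk",
    )
)
# Downstream: embed `chunk` and upsert (chunk_id, vector, metadata)
# into the vector database of choice.
chunks.write.format("delta").mode("overwrite").save("/lake/gold/doc_chunks")
```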
Nice to have
- Experience with BI/metric layer tools (LookML, Transform, MetricFlow, semantic layer tooling).
- Prior experience porting data stacks and building repeatable POCs using AI assistants for prototyping analytics UIs.
Compensation & Location
- Pay: ₹1,400,000–₹1,900,000 per year.
- Work location: In person.