Building large-scale data pipelines, real-time analytics & data warehousing solutions
Results-oriented professional with 7+ years of experience building large-scale data pipelines, real-time analytics, and data warehousing solutions across the technology, retail, and financial domains.
Expertise in Apache Spark, Kafka, Airflow, and Snowflake, with pipeline designs that reduced data lag by 35%. Proficient in data modeling, automation, and dashboard development, with a record of delivering cost savings and workload reduction.
Proven success in partnering with cross-functional teams, delivering high-impact data solutions, and driving measurable business outcomes. Currently working at Meta as a Data Engineer, handling 50B+ daily events and creating multi-dimensional data models for user engagement tracking.
Core Competencies: Real-time Analytics, Data Pipeline Architecture, ETL/ELT Development, Data Warehouse Design, Stream Processing, Data Quality Framework, Business Intelligence, Machine Learning Integration
GenAI & NLP
Feb 2022 – Jul 2022
• Directed GenAI chatbot with LLMs, NLP, and vector search
• Extracted intelligence from earnings calls - 30% better data access
• Automated financial page synthesis - 20% workload reduction
• Accelerated insight extraction for executive teams
Real-time Analytics
Meta - Current
• Kafka + Spark pipelines for 50B+ events/day
• dbt semantic layers powering 15+ Looker dashboards
• 99.9% data quality with Great Expectations
• Real-time analytics for business decisions
Cloud Engineering
Amazon
• Kafka streaming handling 500k events/sec
• AWS Glue ETL processing 750GB daily
• Redshift optimization - 52% CPU reduction
• Campaign reach increased by 20%
Machine Learning
University at Buffalo
• ML models for 3,500 at-risk students
• 88.74% accuracy with Random Forest
• 23% improvement in graduation rates
• Predictive interventions system
Data Engineering
Nike India
• 5 queues processing 6M+ records/day
• 98% data availability achieved
• PySpark jobs for 20TB+ daily datasets
• Apache NiFi workflow automation
Time Series & Forecasting
Freelance Project
• TensorFlow-based forecasting model
• 10% energy cost reduction achieved
• 8% fewer customer disruptions
• CI/CD with GitHub Actions & AWS
Jan 2021 – Aug 2022
Specialized in machine learning, big data analytics, and statistical modeling. Developed expertise in building scalable data solutions and predictive analytics systems.
Jun 2014 – Aug 2018
Strong foundation in computer science fundamentals, algorithms, data structures, and software engineering principles.
Let's discuss how we can work together on your next data engineering project.