Professional Experience
Data Specialist | University of Maryland
Part-time (20 hrs/week) | College Park, MD | SEP 2023 - MAY 2025
Built an NLP pipeline that transforms free-text surveys into analytics-ready datasets using PyTorch, LangChain, and HuggingFace Transformers, saving 8 research analysts a collective 20+ hours of manual data querying per day (see the Transformers sketch below)
Deployed R Shiny applications on GCP using ShinyProxy, Docker, and Terraform to enable multi-user collaboration on internal research tools. Incorporated security controls (IAM, RBAC) alongside a Flask application gateway with Google OAuth and reverse-proxy SSL to ensure secure access.
Prepared a proof of concept using GCP Pub/Sub, Apache Beam, and BigQuery to process 20M+ daily clickstream events from Canvas ELMS, enabling near-real-time classroom analytics for the academic leadership team via Apache Superset dashboards (see the Beam sketch below)
Optimized AWS ETL workflows through root-cause analysis, implementing incremental ingestion and advanced SQL techniques (CTEs, partitioning, indexing) that cut processing time from 7 hours to 4 while preserving data normalization for accurate reporting (incremental pattern sketched below)
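Below is a toy sketch of the kind of survey-classification step the NLP pipeline bullet describes, assuming a zero-shot HuggingFace model; the model name, sample comment, and candidate labels are illustrative placeholders, not the production setup.

# Hypothetical zero-shot tagging of a free-text survey comment.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The course workload felt unmanageable this semester.",
    candidate_labels=["workload", "instruction quality", "facilities"],
)
# The highest-scoring label becomes the analytics-ready tag for this response.
print(result["labels"][0], round(result["scores"][0], 2))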
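A minimal sketch of the Pub/Sub-to-BigQuery proof of concept; the project, topic, table, and schema names are hypothetical, and a streaming runner such as Dataflow is assumed.

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # run on Dataflow at production scale

with beam.Pipeline(options=options) as p:
    (
        p
        # Read raw clickstream events published by the LMS.
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/canvas-clickstream")
        # Decode each message into a BigQuery-ready row.
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append rows so the Superset dashboards stay current.
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.clickstream_events",
            schema="user_id:STRING,course_id:STRING,event_type:STRING,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )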
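The incremental-ingestion pattern from the ETL optimization bullet, sketched as a parameterized query; the table, columns, and watermark parameter are hypothetical.

# Pull only rows newer than the last high-water mark, deduplicating on the
# business key with a CTE before loading downstream.
INCREMENTAL_QUERY = """
WITH new_rows AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY survey_id ORDER BY updated_at DESC) AS rn
    FROM raw.survey_responses
    WHERE updated_at > %(last_watermark)s
)
SELECT * FROM new_rows WHERE rn = 1
"""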
Senior Software Engineer | Tiger Analytics
Full-time | Chennai, India | JUL 2021 - JUL 2023
Partnered with data architects to prototype and launch 2 enterprise solutions: Tiger Intelligent Data Express and Tiger Data Observability Framework. Scaled the serverless backend of the MVPs to a microservice architecture leveraging FastAPI and Docker to support 3X user growth.
Developed a metadata-driven ETL ingestion framework using Airbyte, Apache Airflow, AWS Glue, and Python, facilitating rapid ingestion from diverse enterprise data sources (CDC, streaming, batch, on-premises databases) and reducing asset onboarding time from weeks to hours (see the Airflow sketch below)
Engineered an ACID-compliant Lakehouse solution using Apache Iceberg, AWS S3, Athena, and Redshift. Implemented slowly changing dimensions (SCD Type-2) and time travel capabilities to ensure historical data integrity and support analytical workloads across business verticals (merge pattern sketched below)
Built a data quality tool with Great Expectations, Apache Spark, and Airflow, used across multiple data projects to run 30+ custom checks on a range of big-data file formats and datasets, reducing data anomalies by 60% across downstream BI and analytics pipelines (see the Great Expectations sketch below)
Implemented a real-time infrastructure observability pipeline utilizing AWS CloudWatch, ELK stack (Elasticsearch, Logstash, Kibana), and Grafana, accelerating root cause analysis and decreasing Mean-Time-To-Resolution (MTTR) for production issues by 40%
Established data lineage and governance capabilities through seamless integration of LinkedIn DataHub, providing detailed tracking, auditability, and visualization of data flow, aiding regulatory compliance (GDPR, CCPA) and trust in data assets
Collaborated with business analysts and sales teams to translate functional requirements into engineering solutions, communicating complex technical architectures through simplified presentations in client-facing discussions to strengthen stakeholder engagement
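A sketch of the metadata-driven ingestion idea: one Airflow DAG whose tasks are generated from a source catalog. The source entries and the ingest_source() helper are hypothetical stand-ins for the actual Airbyte/Glue calls.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical metadata catalog; in practice this would be loaded from config.
SOURCES = [
    {"name": "orders_db", "mode": "cdc"},
    {"name": "events_stream", "mode": "streaming"},
    {"name": "finance_exports", "mode": "batch"},
]

def ingest_source(name: str, mode: str) -> None:
    # Placeholder: the real task would trigger Airbyte or an AWS Glue job.
    print(f"Ingesting {name} in {mode} mode")

with DAG("metadata_driven_ingestion", start_date=datetime(2022, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    # One ingestion task per catalog entry, so onboarding a new asset is
    # a new metadata record rather than new pipeline code.
    for src in SOURCES:
        PythonOperator(task_id=f"ingest_{src['name']}",
                       python_callable=ingest_source, op_kwargs=src)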
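The SCD Type-2 merge pattern on Iceberg, sketched in Spark SQL; the table, key, and column names are hypothetical, and the second pass that inserts the new version of changed rows is omitted for brevity.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg catalog is configured

# Close out current rows whose attributes changed and insert brand-new keys.
spark.sql("""
MERGE INTO lakehouse.dim_customer AS t
USING staging.customer_updates AS s
  ON t.customer_id = s.customer_id AND t.is_current = true
WHEN MATCHED AND t.row_hash <> s.row_hash THEN
  UPDATE SET t.is_current = false, t.valid_to = s.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, row_hash, valid_from, valid_to, is_current)
  VALUES (s.customer_id, s.row_hash, s.effective_date, NULL, true)
""")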
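A minimal Great Expectations sketch in the spirit of the data quality tool, using the older pandas-dataset API; the file path and the two checks are illustrative.

import great_expectations as ge
import pandas as pd

# Wrap a plain DataFrame so expectation methods become available on it.
df = ge.from_pandas(pd.read_csv("data/transactions.csv"))
df.expect_column_values_to_not_be_null("transaction_id")
df.expect_column_values_to_be_between("amount", min_value=0)

# Run all registered expectations and gate downstream loads on the result.
result = df.validate()
print(result.success)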
Career break from DEC 2019 to JUN 2021 to focus on personal goals
Intern & Software Engineer | Xenonstack
Full-time | Chandigarh, India | JAN 2019 - NOV 2019
Modernized a data platform by migrating legacy Hadoop workflows and Pig scripts to Scala/PySpark ETL jobs on Databricks, processing IoT sensor and weather data from 45 geo-locations via Kafka to create a high-availability data lake powering data science workflows
Collaborated with the MLOps team to build a Python framework using MLflow to automate model management and scoring, improving feature-engineering efficiency by 33% and reducing model discovery time in production (see the MLflow sketch below)
Scaled TensorFlow model training by implementing a Ray-based distributed pipeline across a 6-node cluster, reducing computation time by ~10%
Developed a cost-optimization strategy to bid on and select EC2 spot instances for AWS EMR jobs during off-peak hours (spot-price query sketched below)
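A sketch of the MLflow-based model management idea; the experiment, metric, and registered-model names are placeholders, and a simple scikit-learn model stands in for the actual scoring models.

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
mlflow.set_experiment("sensor-anomaly")

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100).fit(X, y)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registering the model makes it discoverable for production scoring.
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="sensor_anomaly_rf")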
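The spot-instance selection idea, sketched with boto3: query recent spot price history and pick the cheapest availability zone for the EMR instance type. The region and instance type are examples.

from datetime import datetime, timedelta
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Look at the last six hours of spot prices for the target instance type.
history = ec2.describe_spot_price_history(
    InstanceTypes=["m5.xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.utcnow() - timedelta(hours=6),
)

# Choose the cheapest AZ as the bid target for the off-peak EMR job.
cheapest = min(history["SpotPriceHistory"], key=lambda p: float(p["SpotPrice"]))
print(cheapest["AvailabilityZone"], cheapest["SpotPrice"])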
Education
University of Maryland - College Park | USA
Master's in Information Management
2023-2025 | GPA: 4.0/4.0
Relevant Coursework: Big Data Infrastructure, Data Analytics, Data Integration, Advanced Data Science, Cloud Computing, Product Management
Received a full tuition waiver for the entire duration of the degree program
Panjab University - Chandigarh | India
B.E. Information Technology
2015-2019 | GPA: 3.74/4.0
Relevant Coursework: Data Structures and Algorithms, Database Systems, Network Security, Operating Systems, Object Oriented Programming
Selected Works
Data Fusion Engineering
Google Cloud
Apache Spark
Terraform
Apache Superset
Bash
SQL
- Developed a complete analytics solution on GCP to incrementally ingest, store, transform, and analyze data from 6 NYC Open Data APIs
- Created automated ETL workflows and auto-updating dashboards to visualize KPIs, aiding city planning and identifying accident-prone areas
View on GitHub →
Intelligent Record Management
PyTorch
NLP Tools
Streamlit
Elasticsearch
Gemma2
- A document processing and semantic search system for intelligent indexing of congressional archives
- Lets users submit a query and retrieve relevant past press releases and document summaries using embeddings, NER, and keyword extraction (retrieval step sketched below)
View on GitHub →
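A sketch of the retrieval step: embed the query and run a kNN search against an index of press-release embeddings. The index and field names are hypothetical, and an Elasticsearch 8.x cluster with a dense_vector mapping is assumed.

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Embed the user query with the same model used at indexing time.
query_vec = encoder.encode("funding for rural broadband").tolist()

resp = es.search(
    index="press_releases",
    knn={"field": "embedding", "query_vector": query_vec,
         "k": 5, "num_candidates": 50},
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["title"], hit["_score"])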
Loan Default Prediction System
PySpark
Pandas
Seaborn
scikit-learn
XGBoost
Random Forest
K-Means Clustering
Dimensionality Reduction
Customer Segmentation
- Crafted an end-to-end machine learning pipeline to predict loan defaults, incorporating data preprocessing, feature engineering, and supervised ML modeling (see the pipeline sketch below)
- Prepared a borrower segmentation visualizer using K-Means clustering to identify high-risk defaulter profiles
View on GitHub →
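A minimal sketch of the supervised portion of the pipeline; the file path, label column, and hyperparameters are placeholders, and numeric features are assumed.

import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

df = pd.read_csv("loans.csv")
X, y = df.drop(columns=["default"]), df["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Preprocessing and model travel together so scoring stays consistent.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", XGBClassifier(n_estimators=300, max_depth=5,
                            eval_metric="logloss")),
])
pipe.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1]))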
Data Preparation for Fintech Analytics
AWS
Python
Great Expectations
Postgres
Tableau
- Developed a 5-step AWS Glue and Great Expectations pipeline for metadata extraction, profiling, validation, and transformation
- Built an event-driven pipeline with AWS Lambda, Step Functions, and S3 for real-time quality checks and transformations
- Integrated cleaned datasets into AWS RDS and created Tableau dashboards to visualize data quality trends and anomalies
View on GitHub →
Monitoring EKS Cluster
AWS
Terraform
Helm
Prometheus
Jenkins CI/CD
Kubernetes
- Streamlined the setup for deploying the OpenTelemetry demo webshop
- Enabled modular deployment by splitting large Kubernetes manifests into component YAML files
- Provisioned EKS, Grafana, and Prometheus with Terraform and Helm
- Developed a system to collect logs from the kube-system namespace and track unhealthy pods (see the sketch below)
- Built pipelines to test and build Docker images via a remote Jenkins server
View on GitHub →
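A sketch of the unhealthy-pod tracker using the official kubernetes Python client; kubeconfig-based auth is assumed (use load_incluster_config() when running inside the cluster).

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Flag kube-system pods that are neither healthy nor cleanly finished.
for pod in v1.list_namespaced_pod("kube-system").items:
    if pod.status.phase not in ("Running", "Succeeded"):
        print(pod.metadata.name, pod.status.phase)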
Sports Analytics System
Python
Plotly
Pandas
Tableau
- Developed an EDA framework analyzing 10+ performance KPIs
- Created interactive dashboards for tactical analysis
- Identified 3 key success factors through Bayesian analysis
View on Kaggle →