Automated Log Ingestion and Cloud Storage with Kafka & AWS
→
Summary
Designed a real-time log processing pipeline using Apache Kafka and AWS S3 to efficiently handle high-volume streaming data.
Data Engineer with expertise in designing, building, and optimizing scalable, cloud-native ETL pipelines and data platforms using AWS (Glue, Lambda, S3, Athena), Python, SQL, and BI tools (Power BI, QuickSight). Track record of automating complex workflows, improving data reliability, and delivering real-time analytics that support business decisions.
→
Summary
Developed a modern data pipeline for stock market analytics, transforming raw trading data into analysis-ready datasets using Snowflake, dbt, and AWS S3.
→
Summary
Designed and deployed a serverless ETL pipeline leveraging AWS services to automate CSV ingestion and transformation processes.
Data Analyst
Remote
→
Summary
Led data analysis initiatives, developing robust pipelines and models to extract actionable insights into Gen Z workplace dynamics and retention.
Highlights
Engineered and optimized reusable data pipelines using Python (Pandas) and SQL to clean and process raw survey data, facilitating trend analysis on Gen Z workplace expectations and retention.
Designed and implemented comprehensive data models for structuring demographic and career datasets, enabling efficient analysis and reporting for key business stakeholders.
Reduced manual data preparation time by 40% through automation, delivering insights that informed a proposal targeting a 15% improvement in Gen Z employee retention.
Data Science Intern
Remote
→
Summary
Developed and optimized end-to-end predictive models and ETL pipelines for diverse data science applications, driving enhanced business decision-making.
Highlights
Built and deployed end-to-end predictive models for sales forecasting, movie rating estimation, and fraud detection, integrating data ingestion, preprocessing, model training, and evaluation phases.
Developed robust ETL pipelines and feature engineering scripts using Python (Pandas, Scikit-learn) and SQL, transforming raw business data into structured, model-ready datasets for advanced analytics.
Improved model prediction accuracy by over 20% through iterative optimization of data pipelines and evaluation strategies, directly enhancing business decision-making and strategic insights.
→
Bootcamp
Data Science
→
Bachelor of Engineering
Civil Engineering
Grade: 7.42/10 CGPA