Projects

PolicyAudit-RAG-Pipeline

Designed and built a production-grade Retrieval-Augmented Generation (RAG) system for insurance and legal policy analysis using Google Gemini, LangChain, and ChromaDB. Implemented a PDF ingestion and chunking pipeline that transforms insurance documents into a searchable vector database. Grounded every answer in retrieved policy text and returned the exact snippets used, making responses explainable and reducing the risk of hallucination. Developed an interactive Streamlit UI supporting document upload, natural-language queries, and source-grounded outputs.

Python · Retrieval-Augmented Generation (RAG) · Google Gemini · LangChain · ChromaDB · Vector Databases · LLMs · Streamlit · PDF Processing · Explainable AI

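A minimal, dependency-free sketch of the chunk-then-retrieve idea, assuming fixed-size character windows with overlap. The project itself uses LangChain, Gemini embeddings, and ChromaDB; here a simple term-overlap score is swapped in for vector similarity so the example runs anywhere. The chunk sizes, the `Chunk` type, and the sample policy text are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # e.g. the uploaded PDF's file name
    start: int   # character offset of the chunk inside the extracted text
    text: str


def chunk_text(doc_id: str, text: str, size: int = 800, overlap: int = 100) -> list[Chunk]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [Chunk(doc_id, start, text[start:start + size])
            for start in range(0, max(len(text) - overlap, 1), step)]


def tokens(s: str) -> set[str]:
    """Lowercased alphanumeric terms of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[tuple[int, Chunk]]:
    """Return up to k (score, chunk) pairs; score = distinct query terms found in the chunk.

    Term overlap stands in for embedding similarity so the sketch runs without a vector DB.
    """
    q = tokens(query)
    scored = [(len(q & tokens(c.text)), c) for c in chunks]
    ranked = sorted((sc for sc in scored if sc[0] > 0), key=lambda sc: sc[0], reverse=True)
    return ranked[:k]


# Made-up policy text for illustration only.
policy = ("Section 4. Flood damage is excluded unless the Flood Rider is purchased. "
          "Section 5. Theft of personal property is covered up to 5,000 EUR per claim.")
chunks = chunk_text("policy.pdf", policy, size=80, overlap=20)
for score, c in retrieve("Is flood damage covered?", chunks, k=1):
    # Return the source snippet alongside the hit so the answer can cite it.
    print(f"[{c.doc_id} @ char {c.start}, score {score}] {c.text}")
```

Returning the `Chunk` (document id, offset, and text) with each hit is what lets the UI show the exact passage an answer was based on; with a real vector store the same metadata would typically travel with each embedded chunk.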
GitHub Issues ETL Pipeline

Built a Dockerized Apache Airflow ETL pipeline to ingest and process nested GitHub API data. Reduced processing overhead by 40% using incremental loading and watermarking. Achieved 99%+ pipeline reliability through structured logging and automated retries, and managed credentials with Airflow's secrets management. Modeled complex JSON responses as an analytical schema that supports historical event tracking and idempotent loads.

Python · Apache Airflow · ETL / ELT · Incremental Loading · REST APIs · PostgreSQL · Docker · Structured Logging

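A minimal sketch of watermark-based incremental loading with idempotent upserts. It assumes the watermark is each issue's `updated_at` timestamp and uses the standard-library `sqlite3` module as a stand-in for PostgreSQL (both support `INSERT ... ON CONFLICT ... DO UPDATE`). The table names, `load_increment`, and the sample rows are hypothetical; in an Airflow DAG a function like this would run inside a task, and GitHub's "list repository issues" endpoint also accepts a `since` timestamp so the watermark can be pushed into the API request.

```python
import sqlite3

UPSERT_ISSUE = """
INSERT INTO issues (id, title, state, updated_at)
VALUES (:id, :title, :state, :updated_at)
ON CONFLICT(id) DO UPDATE SET
    title = excluded.title, state = excluded.state, updated_at = excluded.updated_at
"""

UPSERT_WATERMARK = """
INSERT INTO watermark (source, last_updated) VALUES (?, ?)
ON CONFLICT(source) DO UPDATE SET last_updated = excluded.last_updated
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create the target table and a one-row-per-source watermark table."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS issues ("
                     "id INTEGER PRIMARY KEY, title TEXT, state TEXT, updated_at TEXT)")
        conn.execute("CREATE TABLE IF NOT EXISTS watermark ("
                     "source TEXT PRIMARY KEY, last_updated TEXT)")


def get_watermark(conn: sqlite3.Connection, source: str):
    row = conn.execute("SELECT last_updated FROM watermark WHERE source = ?", (source,)).fetchone()
    return row[0] if row else None


def load_increment(conn: sqlite3.Connection, source: str, records: list) -> int:
    """Upsert records updated at or after the watermark, then advance the watermark.

    `>=` re-reads the boundary record on the next run; the upsert makes that harmless.
    ISO-8601 UTC strings in one fixed format ("...Z") compare correctly as plain strings.
    """
    wm = get_watermark(conn, source)
    fresh = [r for r in records if wm is None or r["updated_at"] >= wm]
    with conn:  # rows and watermark commit (or roll back) together
        conn.executemany(UPSERT_ISSUE, fresh)
        if fresh:
            conn.execute(UPSERT_WATERMARK, (source, max(r["updated_at"] for r in fresh)))
    return len(fresh)


conn = sqlite3.connect(":memory:")
init_db(conn)
batch = [  # made-up sample rows
    {"id": 1, "title": "Crash on start", "state": "open", "updated_at": "2024-05-01T10:00:00Z"},
    {"id": 2, "title": "Typo in docs", "state": "closed", "updated_at": "2024-05-02T08:30:00Z"},
]
print(load_increment(conn, "example/repo", batch))  # 2: first run loads everything
print(load_increment(conn, "example/repo", batch))  # 1: only the boundary record is re-upserted
print(conn.execute("SELECT COUNT(*) FROM issues").fetchone()[0])  # 2: no duplicates
```

Committing the rows and the new watermark in one transaction means a failed run (and an Airflow retry) never skips data: either both advance or neither does.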
Berlin Library Geospatial Data Pipeline

Engineered a Python and GeoPandas data pipeline extracting 150+ geospatial records from OpenStreetMap. Improved dataset completeness from 60% to 95% through automated address enrichment using the Nominatim API. Enforced 100% coordinate accuracy with validation scripts and PostgreSQL constraints to ensure spatial data integrity.

Python · GeoPandas · Geospatial Data · API Integration · PostgreSQL · Data Validation · OpenStreetMap

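A hedged sketch of the two checks described above, written with only the standard library: a coordinate validator and a builder for Nominatim reverse-geocoding requests for records missing OSM `addr:*` tags. The Berlin bounding box is an approximation chosen for illustration, and the function names, User-Agent string, and sample record are hypothetical. The request is built but not sent; Nominatim's usage policy asks clients for an identifying User-Agent and at most one request per second. The same bounds could also be mirrored in a PostgreSQL `CHECK` constraint.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request

# Approximate Berlin bounding box in WGS84 degrees (assumption, not an official boundary).
BERLIN_BBOX = (52.33, 13.08, 52.68, 13.77)  # (min_lat, min_lon, max_lat, max_lon)


def coordinate_errors(lat: float, lon: float, bbox=BERLIN_BBOX) -> list[str]:
    """Return validation problems for one point; an empty list means the point passes."""
    min_lat, min_lon, max_lat, max_lon = bbox
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return ["outside valid WGS84 range"]
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return ["outside Berlin bounding box (swapped lat/lon?)"]
    return []


def needs_address(tags: dict) -> bool:
    """OSM elements often lack addr:* tags; those are candidates for enrichment."""
    return not (tags.get("addr:street") and tags.get("addr:postcode"))


def nominatim_reverse_request(lat: float, lon: float, user_agent: str) -> Request:
    """Build (but do not send) a Nominatim reverse-geocoding request."""
    query = urlencode({"format": "jsonv2", "lat": lat, "lon": lon, "addressdetails": 1})
    return Request(f"https://nominatim.openstreetmap.org/reverse?{query}",
                   headers={"User-Agent": user_agent})


library = {"name": "Example Library", "lat": 52.52, "lon": 13.405}  # made-up record
print(coordinate_errors(library["lat"], library["lon"]))  # []
print(coordinate_errors(13.405, 52.52))  # ['outside Berlin bounding box (swapped lat/lon?)']
if needs_address({"name": library["name"]}):
    req = nominatim_reverse_request(library["lat"], library["lon"], "library-pipeline-example/0.1")
    print(req.full_url)
```

A bounding-box check cannot prove a point is at the right building, but it cheaply catches the most common failure modes: missing values, out-of-range numbers, and swapped latitude/longitude.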
NYC Education Data ETL & Analytics Pipeline

Developed a multi-stage ETL pipeline processing 4+ public education datasets using Python and PostgreSQL. Designed custom validation schemas and optimized analytical data models. Delivered actionable insights on school safety trends and SAT performance patterns through structured, analysis-ready tables.

Python · PostgreSQL · ETL Pipelines · Data Modeling · Data Validation · Analytics

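One way to express a "custom validation schema" in plain Python, sketched under assumptions: each column declares a type, whether it may be null, and an optional check, and rows that fail are routed to a rejects list instead of the analytical tables. The `Column` type, the SAT-results column names, and the sample rows are hypothetical; the DBN pattern (two-digit district, borough letter, three-digit school number) and the 200 to 800 SAT section range are used as illustrative rules, and suppressed scores are assumed to have already been converted to `None`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Column:
    dtype: type
    nullable: bool = False
    check: Optional[Callable[[Any], bool]] = None


# Hypothetical schema for an SAT-results table; column names are illustrative.
SAT_SCHEMA = {
    # DBN: 2-digit district + borough letter (M, X, K, Q, R) + 3-digit school number
    "dbn": Column(str, check=lambda v: re.fullmatch(r"\d{2}[MXKQR]\d{3}", v) is not None),
    "school_name": Column(str),
    # SAT section scores run from 200 to 800; suppressed values assumed already set to None
    "sat_math_avg": Column(int, nullable=True, check=lambda v: 200 <= v <= 800),
}


def validate_row(row: dict, schema: dict[str, Column] = SAT_SCHEMA) -> list[str]:
    """Return a list of human-readable errors; empty means the row is valid."""
    errors = []
    for name, col in schema.items():
        value = row.get(name)
        if value is None:
            if not col.nullable:
                errors.append(f"{name}: missing")
            continue
        if not isinstance(value, col.dtype):
            errors.append(f"{name}: expected {col.dtype.__name__}, got {type(value).__name__}")
        elif col.check is not None and not col.check(value):
            errors.append(f"{name}: failed check ({value!r})")
    return errors


def partition(rows, schema=SAT_SCHEMA):
    """Split rows into loadable rows and (row, errors) pairs for a rejects table."""
    valid, rejected = [], []
    for row in rows:
        errs = validate_row(row, schema)
        if errs:
            rejected.append((row, errs))
        else:
            valid.append(row)
    return valid, rejected


rows = [  # made-up sample rows
    {"dbn": "01M999", "school_name": "Example High School", "sat_math_avg": None},
    {"dbn": "1M99", "school_name": "Bad Row", "sat_math_avg": 950},
]
valid, rejected = partition(rows)
print(len(valid), len(rejected))  # 1 1
print(rejected[0][1])
```

Keeping rejected rows together with their error messages, rather than silently dropping them, makes data-quality problems in the source files visible and auditable.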
Retail Demand Forecasting Application

Built a demand forecasting solution that predicts daily unit sales for each store, item, and date for a large Ecuadorian grocery retailer. Trained time-series and machine learning models and deployed an interactive Streamlit application to support inventory planning and promotion decisions.

Python · Machine Learning · Time Series Forecasting · Scikit-learn · Streamlit · Demand Forecasting
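A small standard-library sketch of two building blocks commonly used in store-item-date forecasting: a seasonal-naive baseline (same weekday one week earlier) and lag/rolling-mean features that a scikit-learn model could consume. This is not the project's actual model; the row layout `(store_nbr, item_nbr, date, units)`, the chosen lags and window, and the synthetic sales series are assumptions for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, timedelta


def group_series(rows):
    """rows: iterable of (store_nbr, item_nbr, date, units) -> {(store, item): {date: units}}."""
    series = defaultdict(dict)
    for store, item, day, units in rows:
        series[(store, item)][day] = units
    return series


def seasonal_naive(history: dict[date, float], target: date, season: int = 7):
    """Baseline forecast: units sold `season` days earlier (None if that day is missing)."""
    return history.get(target - timedelta(days=season))


def lag_features(history: dict[date, float], target: date, lags=(1, 7), window: int = 7) -> dict:
    """Features built only from days strictly before `target`, to avoid look-ahead leakage."""
    feats = {f"lag_{n}": history.get(target - timedelta(days=n)) for n in lags}
    past_days = (target - timedelta(days=i) for i in range(1, window + 1))
    past = [history[d] for d in past_days if d in history]
    feats[f"rolling_mean_{window}"] = sum(past) / len(past) if past else None
    return feats


start = date(2024, 1, 1)
rows = [(1, 101, start + timedelta(days=i), float(i)) for i in range(14)]  # synthetic sales
history = group_series(rows)[(1, 101)]
target = start + timedelta(days=14)
print(seasonal_naive(history, target))  # 7.0
print(lag_features(history, target))    # {'lag_1': 13.0, 'lag_7': 7.0, 'rolling_mean_7': 10.0}
```

A trained model is usually judged against a baseline like the seasonal-naive forecast; if it cannot beat "same day last week", the extra complexity is not paying off. For forecast horizons longer than one day, lags shorter than the horizon are not available at prediction time and would need to be dropped.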