Onyinyechi C. Ugba

Data Science & Data Engineering Intern

Data and AI professional with several years of experience delivering analytics, data engineering, and machine learning solutions across financial and business domains. Experienced in building production-grade data pipelines, Retrieval-Augmented Generation (RAG) systems, and ML models that improve data quality, operational efficiency, and decision-making. Strong at bridging data, engineering, and business requirements to transform complex information into scalable, actionable insights.

Location

Göttingen, Germany

Looking for

Immediate Full-Time Opportunities

Projects

PolicyAudit-RAG-Pipeline

Designed and built a production-grade Retrieval-Augmented Generation (RAG) system for insurance and legal policy analysis using Google Gemini, LangChain, and ChromaDB. Implemented a PDF ingestion and chunking pipeline that transforms insurance documents into a searchable vector database. Enabled hallucination-free, explainable AI responses by returning the exact policy text snippets used for each answer. Developed an interactive Streamlit UI supporting document upload, natural language queries, and source-grounded outputs.

Python · Retrieval-Augmented Generation (RAG) · Google Gemini · LangChain · ChromaDB · Vector Databases · LLMs · Streamlit · PDF Processing · Explainable AI
GitHub Issues ETL Pipeline

Built a Dockerized Apache Airflow ETL pipeline to ingest and process nested GitHub API data. Reduced processing overhead by 40% using incremental loading and watermarking. Ensured 99%+ pipeline reliability through structured logging, automated retries, and Airflow Secrets Management. Modeled complex JSON responses into an analytical schema supporting historical event tracking and idempotent loads.

Python · Apache Airflow · ETL / ELT · Incremental Loading · REST APIs · PostgreSQL · Docker · Structured Logging
Berlin Library Geospatial Data Pipeline

Engineered a Python and GeoPandas data pipeline extracting 150+ geospatial records from OpenStreetMap. Improved dataset completeness from 60% to 95% through automated address enrichment using the Nominatim API. Enforced 100% coordinate accuracy with validation scripts and PostgreSQL constraints to ensure spatial data integrity.

Python · GeoPandas · Geospatial Data · API Integration · PostgreSQL · Data Validation · OpenStreetMap
NYC Education Data ETL & Analytics Pipeline

Developed a multi-stage ETL pipeline processing 4+ public education datasets using Python and PostgreSQL. Designed custom validation schemas and optimized analytical data models. Delivered actionable insights on school safety trends and SAT performance patterns through structured, analysis-ready tables.

Python · PostgreSQL · ETL Pipelines · Data Modeling · Data Validation · Analytics

Experience

Data Science & Data Engineering Intern

Oct 2025 – Present

Webeet.io · Berlin (Remote)

  • Built production-grade, Dockerized Apache Airflow ETL pipelines ingesting complex GitHub API data (issues, comments, timelines) using Python and PostgreSQL
  • Reduced API overhead by 40% through incremental loading strategies and watermarking, enabling idempotent pipeline re-runs
  • Improved system reliability with Airflow Secrets Management, structured logging, SLAs, automated retries, and in-memory caching
  • Achieved 99%+ pipeline uptime while maintaining GitHub API rate-limit compliance through optimized request strategies
  • Modeled deeply nested API responses into analytical schemas supporting historical event tracking and reproducible analytics
  • Engineered geospatial data pipelines extracting 150+ Berlin library records from OpenStreetMap using Python and GeoPandas
  • Improved data completeness from 60% to 95% via automated address enrichment using the Nominatim API with rate limiting
  • Designed PostgreSQL schemas with constraints and validation workflows to ensure spatial data integrity
  • Developed multi-stage ETL pipelines for NYC education datasets, delivering insights on school safety and SAT performance trends
Python · Apache Airflow · ETL / ELT · Incremental Loading · PostgreSQL · Docker · REST APIs · GeoPandas · Data Modeling

Financial Officer

Jul 2017 – Jul 2018

Post-Graduate Fellowship, Nsukka · Nigeria

  • Built structured data tracking systems for departmental financial records exceeding ₦500K per quarter
  • Implemented standardized data entry and validation processes improving data consistency
Data Management · Data Validation · Reporting · Process Optimization

Data Assistant

Jun 2015 – May 2016

Ikenne Local Government Area · Nigeria

  • Processed and validated 5,000+ municipal records using systematic quality control checks
  • Created standardized data templates to support monthly and quarterly reports, reducing reporting errors by 15%
Data Quality · Data Processing · Reporting · Data Organization

Machine Learning (Data Science)

Jul 2025 – Aug 2025

Masterschool Berlin · Remote

  • Developed a retail demand forecasting solution using XGBoost and 3+ years of historical sales data
  • Engineered time-based features including seasonality, holidays, day-of-week, and lag variables
  • Evaluated model performance using time-series validation strategies
  • Delivered an end-to-end forecasting pipeline supporting inventory and promotion planning
Machine Learning · Time Series Forecasting · XGBoost · Python · Feature Engineering