Hi, I'm Francy Hsu

MS in Computer Science (AI) from UChicago. I build end-to-end ML pipelines, RAG systems, and real-time streaming infrastructure across AWS and GCP.

01. About Me

I'm a graduate student at The University of Chicago pursuing an MS in Computer Science with a specialization in Artificial Intelligence (graduating Mar. 2026).

My background spans machine learning engineering, full-stack development, and data science. I've built production-grade AI systems — from RAG pipelines and fine-tuned object detection models to real-time streaming analytics — at a startup and at PwC.

Previously, I studied Quantitative Finance with a minor in Data Science at National Tsing Hua University, which gives me a strong foundation in both rigorous mathematics and applied software engineering.

Languages & Frameworks

Python, R, SQL, C, FastAPI, Flask, Django, Streamlit, Nest.js

ML & AI

PyTorch, TensorFlow, scikit-learn, XGBoost, NumPy, pandas, LangChain, LangGraph, Hugging Face

Data & DevOps

AWS, GCP, Docker, Kubernetes, Jenkins, Spark, Kafka, Hadoop, HBase, MySQL, PostgreSQL

02. Experience

Machine Learning Engineer Intern

Vulcan Engineering Solutions · Laguna Niguel, CA

Jun 2025 – Aug 2025
  • Developed 3 AI products in Python for a civil engineering marketplace, integrated into a Nest.js backend
  • Built RAG pipeline via LangChain, OpenAI embeddings, PostgreSQL (pgvector), and Llama-3 to process 10K+ pages for domain-specific Q&A, saving 200+ hours of manual research monthly
  • Fine-tuned YOLO for blueprint object detection, achieving 90% IoU and outperforming an R-CNN baseline
  • Automated data entry via Azure OCR, achieving 98% time savings; labeled and augmented 500+ blueprints to 2K+ training samples
  • Implemented 5 RESTful APIs for CRUD operations and designed relational DB schemas (10+ tables in 2NF)
Python, RAG, LangChain, OpenAI API, PostgreSQL · pgvector, YOLO, Object Detection, Azure OCR, REST APIs, Nest.js
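
For readers unfamiliar with the IoU (Intersection over Union) metric cited above, here is a minimal generic sketch. It is not the project's code: the function name `iou` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A score of 1.0 means the predicted box matches the labeled box exactly; 0.0 means no overlap.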

Data Scientist — Data Science Clinic

Groundwork Bridgeport · Chicago, IL

Mar 2025 – Jun 2025
  • Applied Meta's Canopy Height Model to 25+ NAIP imagery tiles via computer vision and GEE, extracting 60M+ geospatial points across 19 sq mi at 1-px resolution
  • Identified regional homogeneity via K-means clustering, discovering 5 clusters with similar canopy growth patterns
  • Analyzed global/local spatial autocorrelation using Moran's I and LISA, uncovering hot/cold spots
  • Conducted paired t-tests and EDA, revealing tree canopy growth and distribution shifts over a 20-year period
Python, Computer Vision, Google Earth Engine, K-means Clustering, Moran's I · LISA, GeoPandas, EDA, NumPy · pandas
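
As a rough illustration of the global Moran's I statistic mentioned above, the sketch below computes it from a list of values and a spatial weight matrix. It is a generic textbook formulation, not the project's code; the function name `morans_i` and the plain nested-list weight matrix are assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a square spatial weight matrix.

    weights[i][j] is the (assumed non-negative) weight between sites i and j,
    with zeros on the diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Total weight, cross-product of neighboring deviations, and variance term.
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)
```

Positive values indicate that similar values sit near each other (clustering), while negative values indicate that neighbors tend to differ.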

Data Science Intern

Worldie — Network for Good · Chicago, IL

Oct 2024 – Feb 2025
  • Scraped 500+ online harassment articles from 10+ media domains using BeautifulSoup for an NLP research pipeline
  • Cleaned and preprocessed text via pandas, regex, and NLTK, removing advertisements and irrelevant content
  • Applied coreference resolution via NeuralCoref and AllenNLP in spaCy to standardize entity representation
  • Built a GPT-4-powered bias detection model via the OpenAI API, computing bias scores on victim-perpetrator relationships
Python, NLP, BeautifulSoup, spaCy, NeuralCoref, AllenNLP, GPT-4, OpenAI API, pandas · NLTK

Full-Stack Software Engineer Intern

PricewaterhouseCoopers · Taipei, TW

Jul 2023 – Dec 2023
  • Developed 5 Python Django risk assessment tools to audit cybersecurity compliance for 45+ companies
  • Automated data analysis of 10K+ system configs against security benchmarks, reducing manual workload by 89%
  • Migrated frontend from Bootstrap to Appkit UI using HTML, CSS, and JavaScript, improving UI consistency
  • Built ETL pipelines to manage 10K+ records in MySQL; resolved 20+ bugs across CI/CD pipelines
Python, Django, ETL Pipelines, MySQL, HTML · CSS · JS, Docker, GCP, Jenkins, CI/CD

Business Analyst Intern

Shopee · Taipei, TW

Oct 2021 – Jan 2022
  • Analyzed 10K+ financial and operational risk datasets using root cause analysis and Pareto charts to address a 573% increase in goods damage claims
  • Designed and launched a seller logistics insurance program, optimizing plans through financial modeling and EBIT calculations
  • Built a data-forecasting P&L model to validate $1.5M+ in projected profit from the insurance product
Data Analysis, Financial Modeling, Root Cause Analysis, Pareto Analysis, P&L Modeling, Excel

03. Projects

MLB Real-Time Analytics Dashboard

Lambda architecture on Hadoop to process 1.4M+ baseball records. Built Spark ETL pipelines for team performance metrics and an XGBoost real-time win-probability model (90% accuracy) via Spark Streaming, Kafka, and HBase. Deployed as a Flask dashboard on AWS EC2.

Hadoop, Spark, Kafka, HBase, XGBoost, Flask, AWS EC2
Nov 2025 – Dec 2025

UChicago CS Course Recommendation System

Vector search system built with Streamlit over 100+ UChicago CS courses scraped via BeautifulSoup and pandas. Benchmarked Word2Vec, GloVe, and BERT embeddings, improving recommendation relevance by 30% over a TF-IDF baseline.

Python, Streamlit, BERT, Word2Vec, GloVe, BeautifulSoup, NLP
Jul 2025 – Aug 2025
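
The core idea behind embedding-based recommendation can be sketched as ranking items by cosine similarity to a query vector. This is a simplified illustration under stated assumptions, not the project's implementation: the helper names `cosine` and `top_k`, and the idea of holding course embeddings in a plain dictionary, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, course_vecs, k=3):
    """Return the names of the k courses whose embeddings are closest to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in course_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]
```

In practice the vectors would come from an embedding model such as Word2Vec, GloVe, or BERT, and the same ranking step applies regardless of which one produced them.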

04. Education

The University of Chicago

M.S. in Computer Science — Specialization in Artificial Intelligence

Chicago, IL · Expected Mar. 2026 · GPA: 3.67 / 4.0

National Tsing Hua University

B.S. in Quantitative Finance — Minor in Data Science

Hsinchu, TW · Jun. 2024 · GPA: 3.91 / 4.0

05. Get In Touch

I'm actively seeking full-time roles in machine learning engineering and software development, available from March 2026. If you have an exciting opportunity or just want to connect, my inbox is always open.

Say Hello