Sophia (Xiyuan) Shen

A data builder at the intersection of health & ML.

Data Science & Health Policy @ UNC–Chapel Hill

Data is everywhere, but the glamorous part is the smallest part. You spend something like 95% of your time cleaning data and maybe 5% actually modeling it. I've also stopped believing good data science means hitting 90%+ accuracy. It means serving the people the data came from, and that starts with picking the right metric to measure in the first place.

01 — Projects

Selected work

Machine learning, healthcare analytics, and full-stack builds.

Chinese Digit Classifier

A neural network built from scratch (no ML libraries) with a draw-to-predict web app.

  • Python
  • FastAPI
  • React
  • TypeScript
  • NumPy

FrozenLake RL

Reinforcement learning agents (DQN, DRQN, QR-DRQN, RND-DRQN) for a partially observable, windy grid world, with a full research paper.

  • Python
  • PyTorch
  • Gymnasium

Predicting Diabetes Risk

Modeling the top drivers of diabetes from CDC survey data.

  • Python
  • scikit-learn
  • pandas

Federal Budget Analysis

Predicting federal grant cuts across the UNC system with ML.

  • Python
  • scikit-learn
  • pandas

02 — Experience

Where I've worked

  1. Feb 2026 – Present

    Computational Neuroimaging Research Engineer

    Boerwinkle Lab

    Built a Python pipeline to classify resting-state fMRI components into brain networks, seizure-onset zones, and noise for pediatric epilepsy research.

  2. May 2025 – May 2026

    Predictive Analytics Researcher

    UNC School of Data Science & Society

    Developed a LASSO-based risk screener from 650+ survey records; contributed to a CDC-funded study and was selected to present at NACCHO360 2026.

  3. Jan – Sept 2025

    Data Visualization Intern

    UNC Water Institute

    Built an interactive Tableau dashboard synthesizing 4,000+ studies on environmental health services for policymakers.

03 — Skills

Tools I build with

Languages

  • Python
  • SQL
  • R

Libraries & ML

  • NumPy
  • pandas
  • scikit-learn
  • Matplotlib
  • Seaborn

Web & App

  • React
  • FastAPI

Tools & Platforms

  • Tableau
  • Stata
  • Excel
  • GitHub
  • Claude Code

04 — Beyond Code

Off the clock

In my free time I hit the gym, get outside, and rate meals on Beli. Currently hunting for good hiking trails around RTP (recs welcome).

05 — Contact

Let's talk

Open to internships and collaborations in data, health analytics, and software engineering.