Cathy Jiao
I am a PhD student at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University, advised by Chenyan Xiong. Recently, I spent a wonderful summer in NYC at Spotify Research, hosted by Paul Bennett.
My research focuses on data-centric AI. A central thread of my work is data attribution: quantifying how data influences model training in foundation models, which I explore through efficient approximations and practical applications such as dataset curation and data valuation/pricing. More broadly, I aim to develop frameworks that make data usage more transparent, reliable, and impactful for both research and deployment of foundation models.
Previously, I finished my MS at CMU LTI, where I worked on dialogue systems, advised by Maxine Eskenazi and Aaron Steinfeld. Before grad school, I spent time in industry working on machine learning and deep learning applications for natural language processing. Prior to that, I graduated with distinction from the University of British Columbia with a B.S. in CS & Math.
In my spare time, I enjoy biking around Pittsburgh and cooking. If you work on similar topics or want to chat, feel free to reach out!
News
| Nov 01, 2025 | |
|---|---|
| Sep 18, 2025 | |
| Sep 18, 2025 | |
| Feb 01, 2025 | |