Cathy Jiao

Email: cljiao@cs.cmu.edu
I am a PhD student at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University, advised by Chenyan Xiong. Recently, I spent a wonderful summer in NYC at Spotify Research, hosted by Paul Bennett.
My research focuses on data-centric AI: designing methods and frameworks to better understand, curate, and evaluate the data used to train large language models. A central thread of my work is data attribution (identifying how individual data points shape model outputs), which I explore through efficient approximations and practical applications such as dataset curation and data valuation and pricing. More broadly, I aim to develop frameworks that make data usage more transparent, reliable, and impactful for both research and deployment.
Previously, I completed my master's at CMU LTI, where I was advised by Maxine Eskenazi and Aaron Steinfeld. Before grad school, I spent time in industry working on machine learning and deep learning applications for natural language processing. Prior to that, I graduated with distinction from the University of British Columbia with a B.S. in CS & Math.
In my spare time, I enjoy cooking and biking around Pittsburgh. If you work on similar topics or want to chat, feel free to reach out!