Cathy Jiao

Email: cljiao@cs.cmu.edu
Hello! I am a PhD student at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University, advised by Chenyan Xiong.
I work broadly on data attribution for large language models (i.e., determining the contribution of training data samples towards model outputs). My current work investigates approximations for data attribution methods, with applications to dataset curation and data valuation/pricing. Recently, I have also become interested in methodologies for evaluating data attribution methods.
Previously, I finished my master's at CMU LTI, where I was advised by Maxine Eskenazi and Aaron Steinfeld. Before grad school, I spent time in industry working on machine learning and deep learning applications for natural language processing. Prior to that, I graduated with distinction from the University of British Columbia with a B.S. in Computer Science and Mathematics.
In my spare time, I enjoy cooking and biking around Pittsburgh. If you work on similar topics or want to chat, feel free to reach out!
Publications
* = equal contribution