I am a PhD student at the University of California, Berkeley, affiliated with Berkeley Artificial Intelligence Research (BAIR) and the School of Information. My research intersects natural language processing (NLP) with computational social science and digital humanities (e.g., cultural analytics), with an emphasis on equity and fairness. I am advised by David Bamman.
Underpinning my research is the premise that nearly all language, whether AI- or human-generated, is social and cultural data. Though I publish primarily in NLP venues, my work is deeply interdisciplinary, and my collaborators span fields including psychology, education, and English literature.
My work has been recognized by EECS Rising Stars, Rising Stars in Data Science, an American Educational Research Association (AERA) Best Paper Award, and an NSF Graduate Research Fellowship. I've interned at Microsoft Research and the Allen Institute for AI, and during the latter, I was awarded Outstanding Intern of the Year. Before my PhD, I completed my M.S. and B.S. at Stanford.
I am on the academic job market during the 2024-2025 school year. Here is my CV.
Katie Keith, Naitian Zhou, and I have a podcast, Diaries of Social Data Research, where we chat with researchers on the process behind interdisciplinary papers.
Prospective PhD applicants, especially those from underrepresented backgrounds, are welcome to email me questions about the application process or the PhD experience.
Pronouns: she/her
I publish with my name backwards, so citations should refer to "L. Lucy". I do this because my last name is one of the most common in the world, researchers are often recognized and remembered by last name, and computer vision researcher Fei-Fei Li does this, too. More thoughts from others about names and academia, here.
*equal contribution.
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters.
Li Lucy, Suchin Gururangan, Luca Soldaini, Emma Strubell, David Bamman, Lauren Klein, Jesse Dodge.
Association for Computational Linguistics (ACL) 2024.
"One-Size-Fits-All"? Examining Expectations around What Constitute "Fair" or "Good" NLG System Behaviors.
Li Lucy, Su Lin Blodgett, Milad Shokouhi, Hanna Wallach, Alexandra Olteanu.
North American Chapter of the Association for Computational Linguistics (NAACL) 2024.
Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications.
Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith.
Findings of the Association for Computational Linguistics (ACL) 2023.
Discovering Differences in the Representation of People using Contextualized Semantic Axes.
Li Lucy, Divya Tadimeti, David Bamman.
Empirical Methods in Natural Language Processing (EMNLP) 2022.
Gender and Representation Bias in GPT-3 Generated Stories.
Li Lucy, David Bamman.
Workshop on Narrative Understanding (WNU) at the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.
Characterizing English variation across social media communities with BERT.
Li Lucy, David Bamman.
Transactions of the Association for Computational Linguistics (TACL) 2021.
Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks.
Li Lucy*, Dora Demszky*, Patricia Bromley, Dan Jurafsky.
American Educational Research Association (AERA) Open 2020.
On Classification with Large Language Models in Cultural Analytics.
David Bamman, Kent K. Chang, Li Lucy, Naitian Zhou.
Computational Humanities Research (CHR) 2024.
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images.
Sami Baral*, Li Lucy*, Ryan Knight, Alice Ng, Luca Soldaini, Neil Heffernan, Kyle Lo.
NeurIPS Workshop on Mathematical Reasoning and AI 2024.
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula.
Li Lucy, Tal August, Rose E. Wang, Luca Soldaini, Courtney Allison, Kyle Lo.
Findings of Empirical Methods in Natural Language Processing (EMNLP) 2024.
"Othering" through War: Depiction of Asians/Asian Americans in U.S. History Textbooks from California and Texas.
Minju Choi*, Li Lucy*, Patricia Bromley, David Bamman.
Educational Researcher 2024 (forthcoming).
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.
Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo.
Association for Computational Linguistics (ACL) 2024.
Investigating Causal Effects of Instructions in Crowdsourced Claim Matching.
Emma Lurie, Li Lucy, Masha Belyi, Sofia Dewar, Daniel Rincón, John Baldwin, Rajvardhan Oak.
Computation + Journalism Symposium (C+J) 2020.
Using Sentiment Induction to Understand Variation in Gendered Online Communities.
Li Lucy, Julia Mendelsohn.
Society for Computation in Linguistics (SCiL) 2019.
Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning.
Li Lucy, Jon Gauthier.
Language Grounding for Robotics (RoboNLP) Workshop at the Association for Computational Linguistics (ACL) 2017.
I was born and raised in Minnesota. My cat's name is Toast. When I was a kid, I wanted to be an ornithologist and a fiction writer.
Looking for something to read? Check out this list of papers from subfields that care about social aspects of NLP.
Thanks to Martin Saveski for this website template.