Lucy Li

About Me

I am a final-year PhD student at the University of California, Berkeley, affiliated with Berkeley Artificial Intelligence Research (BAIR) and the School of Information. I am advised by David Bamman.

My research intersects natural language processing (NLP) with computational social science and humanities (e.g. cultural analytics), with an emphasis on responsible AI. Underpinning my work is the premise that nearly all language, whether AI- or human-generated, is social and cultural data.

Starting Fall 2026, I'll be an assistant professor in the University of Wisconsin-Madison's Computer Sciences department. Before then, I'll be a postdoc at the University of Washington, working with Yulia Tsvetkov and Noah Smith.

I'm recruiting PhD students for Fall 2026.

My CV.

I'm on Bluesky.

Recent news:

May 2025. Invited talk at Stanford hosted by the Asian Staff Forum & Filipino American Community. Happy AANHPI month!
May 2025. A paper on literary topic modeling has been accepted to ACL Findings!
April 2025. DrawEduMath won an outstanding paper award at NAACL!
April 2025. Invited talk at UIUC for the "Generative AI and the Future of Research" speaker series.
Jan 2025. An archival version of our NeurIPS MATH-AI workshop paper was accepted to NAACL.
Dec 2024. A new paper with collaborators at the Stanford School of Education and Stanford Psychology was accepted to the Journal of Cultural Analytics.
Oct 2024. A paper w/ the Allen Institute for AI was accepted to the MATH-AI workshop at NeurIPS.
Sept 2024. Invited talk at Max Planck Institute for Software Systems.
Sept 2024. A paper w/ my Berkeley labmates was accepted to the Computational Humanities Research conference.
Sept 2024. A paper w/ the Allen Institute for AI was accepted into Findings of EMNLP.
Sept 2024. Invited talk at the University of Massachusetts, Amherst.

Katie Keith, Naitian Zhou, and I have a podcast, Diaries of Social Data Research, where we chat with researchers about the process behind interdisciplinary papers.

Publications

I publish with my name backwards, so citations should refer to "L. Lucy". I do this because my last name is one of the most common in the world, researchers are often recognized and remembered by last name, and computer vision researcher Fei-Fei Li does this, too. More thoughts from others about names and academia are here.

"Selected" papers aren't necessarily my favorite papers -- my favorites change day-to-day 😊. Instead, the shorter list is curated to make it easier for you to skim the range of stuff I work on.

*equal contribution.

Selected
All

Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes.

Li Lucy, Camilla Griffiths, Sarah Levine, Jennifer L Eberhardt, Dorottya Demszky, David Bamman.

Findings of the Association for Computational Linguistics (ACL) 2025.

Paper

DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images.

Sami Baral*, Li Lucy*, Ryan Knight, Alice Ng, Luca Soldaini, Neil Heffernan, Kyle Lo.

North American Chapter of the Association for Computational Linguistics (NAACL) 2025.

Paper

On Classification with Large Language Models in Cultural Analytics.

David Bamman, Kent K. Chang, Li Lucy, Naitian Zhou.

Computational Humanities Research (CHR) 2024.

Paper

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters.

Li Lucy, Suchin Gururangan, Luca Soldaini, Emma Strubell, David Bamman, Lauren Klein, Jesse Dodge.

Association for Computational Linguistics (ACL) 2024.

Paper Code Data

"One-Size-Fits-All"? Examining Expectations around What Constitute "Fair" or "Good" NLG System Behaviors.

Li Lucy, Su Lin Blodgett, Milad Shokouhi, Hanna Wallach, Alexandra Olteanu.

North American Chapter of the Association for Computational Linguistics (NAACL) 2024.

Paper

Gender and Representation Bias in GPT-3 Generated Stories.

Li Lucy, David Bamman.

Workshop on Narrative Understanding (WNU) at the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.

Paper Code

Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks.

Li Lucy*, Dora Demszky*, Patricia Bromley, Dan Jurafsky.

American Educational Research Association (AERA) Open 2020.

Paper Slides Code

Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes.

Li Lucy, Camilla Griffiths, Sarah Levine, Jennifer L Eberhardt, Dorottya Demszky, David Bamman.

Findings of the Association for Computational Linguistics (ACL) 2025.

Paper

DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images.

Sami Baral*, Li Lucy*, Ryan Knight, Alice Ng, Luca Soldaini, Neil Heffernan, Kyle Lo.

North American Chapter of the Association for Computational Linguistics (NAACL) 2025.

Paper

Racial and Ethnic Representation in Literature Taught in US High Schools.

Li Lucy, Camilla Griffiths, Claire Ying, JJ Kim-Ebio, Sabrina Baur, Sarah Levine, Jennifer Eberhardt, David Bamman, Dora Demszky.

Journal of Cultural Analytics 2025.

Paper

"Othering" through War: Depiction of Asians/Asian Americans in U.S. History Textbooks from California and Texas.

Minju Choi*, Li Lucy*, Patricia Bromley, David Bamman.

Educational Researcher 2025.

Paper

On Classification with Large Language Models in Cultural Analytics.

David Bamman, Kent K. Chang, Li Lucy, Naitian Zhou.

Computational Humanities Research (CHR) 2024.

Paper

Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula.

Li Lucy, Tal August, Rose E. Wang, Luca Soldaini, Courtney Allison, Kyle Lo.

Findings of Empirical Methods in Natural Language Processing (EMNLP) 2024.

Paper Code Data

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters.

Li Lucy, Suchin Gururangan, Luca Soldaini, Emma Strubell, David Bamman, Lauren Klein, Jesse Dodge.

Association for Computational Linguistics (ACL) 2024.

Paper Code Data

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo.

Association for Computational Linguistics (ACL) 2024.

Paper Code Data

"One-Size-Fits-All"? Examining Expectations around What Constitute "Fair" or "Good" NLG System Behaviors.

Li Lucy, Su Lin Blodgett, Milad Shokouhi, Hanna Wallach, Alexandra Olteanu.

North American Chapter of the Association for Computational Linguistics (NAACL) 2024.

Paper

Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications.

Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith.

Findings of the Association for Computational Linguistics (ACL) 2023.

Paper Code

Discovering Differences in the Representation of People using Contextualized Semantic Axes.

Li Lucy, Divya Tadimeti, David Bamman.

Empirical Methods in Natural Language Processing (EMNLP) 2022.

Paper Code

Gender and Representation Bias in GPT-3 Generated Stories.

Li Lucy, David Bamman.

Workshop on Narrative Understanding (WNU) at the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.

Paper Code

Characterizing English variation across social media communities with BERT.

Li Lucy, David Bamman.

Transactions of the Association for Computational Linguistics (TACL) 2021.

Paper Code

Investigating Causal Effects of Instructions in Crowdsourced Claim Matching.

Emma Lurie, Li Lucy, Masha Belyi, Sofia Dewar, Daniel Rincón, John Baldwin, Rajvardhan Oak.

Computation + Journalism Symposium (C+J) 2020.

Paper

Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks.

Li Lucy*, Dora Demszky*, Patricia Bromley, Dan Jurafsky.

American Educational Research Association (AERA) Open 2020.

Paper Slides Code

Using Sentiment Induction to Understand Variation in Gendered Online Communities.

Li Lucy, Julia Mendelsohn.

Society for Computation in Linguistics (SCiL) 2019.

Paper

Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning.

Li Lucy, Jon Gauthier.

Language Grounding for Robotics (RoboNLP) Workshop at the Association for Computational Linguistics (ACL) 2017.

Paper

Miscellaneous

I was born and raised in Minnesota.

When I was 7 years old, I wanted to be an ornithologist, and when I was 9, I wanted to be a fiction writer.

I have a cat named Toast (her middle name is Muffin).

I feel a sense of impending doom around moving across states with my plant collection.

Thanks to Martin Saveski for this website template.