About Me
My research sits at the intersection of AI, human labor, and data infrastructure. I study how large language models can be used as tools for social science: how prompt design shapes annotation accuracy, how training data bias propagates into automated detection systems, and what it means to use off-the-shelf AI models responsibly in research. I’m also interested in the data infrastructure that makes social science possible: how researchers discover and access data, how data archives function, and how data access and reuse shape science and careers. I’m an Associate Professor at UMSI, and I direct the Resource Center for Minority Data and the Social Media Archive at ICPSR.
I hold several positions at the University of Michigan:
Faculty
- Associate Professor, School of Information
- Research Associate Professor, Institute for Social Research
- Associate Professor, Digital Studies Institute, College of Literature, Science, and the Arts
Leadership
- Director, Resource Center for Minority Data, ICPSR
- Director, Social Media Archive, ICPSR
Selected Publications
Data Curation and Archiving
- Wofford, M. F., Thomer, A. K., Hemphill, L., Polasek, K., & Yakel, E. (2025). Valuing curation infrastructures. Journal of the Association for Information Science and Technology, asi.70015. doi: 10.1002/asi.70015
- Brown, M. A., Thomer, A., & Hemphill, L. (2025). “Unnecessarily cumbersome”: Researchers’ opinions on restricted data access systems. Proceedings of the Association for Information Science and Technology (ASIS&T). doi: 10.1002/pra2.1308
- Hemphill, L., Schöpke-Gonzalez, A., & Panda, A. (2022). Comparative sensitivity of social media data and their acceptable use in research. Scientific Data, 9(1), 643. doi: 10.1038/s41597-022-01773-w
Generative AI and Social Science
- Schöpke-Gonzalez, A., Kim, N., & Hemphill, L. (2025). Embracing training dataset bias for automated harmful content detection. Proceedings of the Association for Information Science and Technology (ASIS&T). doi: 10.1002/pra2.1282
- Atreja, S., Ashkinaze, J., Li, L., Mendelsohn, J., & Hemphill, L. (2025). What’s in a prompt?: A large-scale experiment to assess the impact of prompt design on the compliance and accuracy of LLM-generated text annotations. 19th International AAAI Conference on Web and Social Media (ICWSM 2025). doi: 10.1609/icwsm.v19i1.35807
- Fan, L., Li, L., Ma, Z., Lee, S., Yu, H., & Hemphill, L. (2024). A bibliometric review of large language models research from 2017 to 2023. ACM Transactions on Intelligent Systems and Technology. doi: 10.1145/3664930
Science of Science
- Lafia, S., Fan, L., Thomer, A. K., & Hemphill, L. (2022). Subdivisions and crossroads: Identifying hidden community structures in a data archive’s citation network. Quantitative Science Studies. doi: 10.1162/qss_a_00209
Social Media and Harmful Behavior
- Schöpke-Gonzalez, A., Wu, S., Kumar, S., & Hemphill, L. (2025). Using off-the-shelf harmful content detection models: Best practices for model reuse. Proceedings of the ACM on Human-Computer Interaction, 9(2), 1–27. doi: 10.1145/3711099
- Li, L., Fan, L., Atreja, S., & Hemphill, L. (2024). “HOT” ChatGPT: The Promise of ChatGPT in Detecting and Discriminating Hateful, Offensive, and Toxic Comments on Social Media. ACM Transactions on the Web, 18(2), 1–36. doi: 10.1145/3643829
- Schöpke-Gonzalez, A., Atreja, S., Shin, H. N., Ahmed, N., & Hemphill, L. (2022). Why do volunteer content moderators quit? Burnout, conflict, and harmful behaviors. New Media & Society. doi: 10.1177/14614448221138