Participants gain a theoretical and practical understanding of text analysis methods and learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship. These courses are offered in collaboration with the Santa Fe Institute and supported by a National Endowment for the Humanities (NEH) grant (no. HT-272418-20).
Welcome. I am a doctoral candidate in the Applied Science department at Southern Methodist University's Lyle School of Engineering. I create data sets and computational tools for machine learning and data analytics applied to the information and social sciences. This is my portfolio. It contains synopses of my research, demonstrations of my software packages, and samples of pedagogical materials with links to the full courses. From here you can visit my GitHub or download my CV.
Analyze word embeddings and collocates to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.
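As a minimal illustration of the collocate side of this kind of analysis, the sketch below counts which words co-occur with a target term inside a fixed window. The function name, the toy sentence, and the window size are all hypothetical, not taken from the project itself:

```python
from collections import Counter

def collocates(tokens, target, window=5):
    """Count words co-occurring with `target` within `window` tokens on either side."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Count every token in the window except the target occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

speech = "the reform bill the house debated reform of the poor law".split()
print(collocates(speech, "reform", window=2).most_common(3))
```

Comparing such collocate counts for the same term across decades or debate contexts is one simple way to track shifts in political language; word embeddings extend the idea to dense vector representations.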
This project addresses a human rights issue in conversation with academic researchers from four departments (Computer Science, Economics, Statistics, and Applied Science), Congress, and the Department of Justice. We use crowdsourcing and human-in-the-loop machine learning techniques to train a neural network and improve predictive named entity recognition and social network analysis. This work is supported by Congressional bill H.R. 2471.
Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.
An overview of materials designed for an introductory class by Jo Guldi on applying computational methods to digital history. A link to the full course material is included.
A pipeline for disambiguating speaker names in the 19th-century British Parliamentary debates. This project was supported by a National Science Foundation (NSF) grant (no. 1520103).
The Hansard 19th-Century British Parliamentary Debates: Parsed Debates, N-Gram Counts, Special Vocabulary, and Topics
Automated scripts and an article describing our process for creating an analysis-ready version of the 19th-century Hansard corpus and supplementary material.
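To give a flavor of one supplementary product, n-gram counts, here is a minimal sketch of counting contiguous n-grams over a tokenized debate. The function and the toy input are illustrative assumptions, not the project's actual scripts:

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count contiguous n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "the house divided and the house agreed".split()
print(ngram_counts(tokens, n=2).most_common(2))
```

Run over the full parsed corpus, counts like these support the frequency analyses the supplementary material describes.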
Easily access a cleaned version of the 19th-century Hansard corpus with improved speaker names in the R environment.
An R software package providing functions for extracting grammatical subject-verb-object (SVO) and subject-verb-adjective complement/adjective modifier (SVA) triples from text. This linguistically improved algorithm achieves significantly higher precision and recall than existing methods.
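To illustrate the output format rather than the package's own algorithm, the sketch below assembles SVO triples from pre-parsed dependency edges. The edge tuples, relation labels, and function name are assumptions for the example (the labels follow common dependency conventions: `nsubj` for subjects, `dobj` for direct objects):

```python
def extract_svo(dep_edges):
    """Assemble (subject, verb, object) triples from (head, relation, dependent) edges."""
    subjects, objects = {}, {}
    for head, rel, dep in dep_edges:
        if rel == "nsubj":
            subjects[head] = dep
        elif rel == "dobj":
            objects[head] = dep
    # A triple is emitted only for verbs that have both a subject and an object.
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]

edges = [("proposed", "nsubj", "minister"), ("proposed", "dobj", "bill")]
print(extract_svo(edges))  # [('minister', 'proposed', 'bill')]
```

In practice the edges would come from a dependency parser; the package's linguistic improvements concern how such parses are traversed and filtered, which this toy example does not attempt to reproduce.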