Steph Buongiorno, PhD, is a researcher and a teacher. She designs computational methods for knowledge production in the digital humanities. Her work reconfigures traditional academic boundaries and opens up new approaches to interpreting texts, data, and culture.

She enjoys poetry, all sorts of sensory things, as well as quiet, underwater worlds.


Text Mining for Historical Analysis

Text Mining for Historical Analysis offers a critical intervention into the evolving field of digital history. It introduces "computational historical thinking": a mode of analysis that explores the epistemological entanglements between computation, theory, and historical interpretation, emphasizing how computational procedures actively shape the questions we ask and the meanings we derive from data. Through sustained engagement with historical corpora, such as the 19th-century Hansard debates and contemporary U.S. Congressional Records, the book demonstrates how to attend to both structure and semantics, reimagining the relationship between computation and historical knowledge in the digital age.

Democracy Viewer

Democracy Viewer is an open-source text mining application that enables analysts to explore and interpret humanities texts using techniques like word counts, TF-IDF, and word embeddings. It supports both distant and close reading. Analysts can upload their own datasets or work with curated collections available on the platform. Democracy Viewer also provides access to open government data, including U.S. Congressional records, making public texts more accessible for research and civic engagement.
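To illustrate one of the techniques named above, here is a minimal TF-IDF sketch in pure Python. It is not Democracy Viewer's implementation, just the standard weighting idea: terms that appear in every document (like "the") score zero, while terms concentrated in a few documents score higher. The toy corpus is invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a small corpus of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return scores

corpus = [
    "the honourable member raised the question".split(),
    "the question of suffrage divided the house".split(),
    "suffrage and labour dominated the debates".split(),
]
weights = tf_idf(corpus)
# "the" occurs in all three documents, so its IDF (hence TF-IDF) is zero,
# while corpus-distinctive terms like "suffrage" receive positive weight.
```

Production tools would add smoothing, normalization, and a real tokenizer, but the ranking intuition is the same.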

Beyond the Black Box: Toward Transparent AI for Computational Text Analysis in the Digital Humanities

This article introduces Critical Generative Interpretation, a method that makes AI-generated insights traceable and grounded in textual evidence. By linking large language model (LLM) outputs to structured knowledge graphs derived from source texts, the method enables scholars to critically assess where generated interpretations come from and how they relate to the original material, supporting humanist inquiry through close reading. Through a case study of Harold and the Purple Crayon, the article shows how this approach fosters interpretive engagement and makes AI a method for humanistic knowledge production.
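The core move — checking generated claims against a source-derived knowledge graph — can be sketched as follows. The triples, claims, and function names here are hypothetical illustrations, not the article's actual graph or API; a real pipeline would extract triples automatically and match them more loosely.

```python
# Knowledge graph as subject-relation-object triples, here hand-written
# (illustrative only) rather than extracted from the source text.
knowledge_graph = {
    ("harold", "draws_with", "purple crayon"),
    ("harold", "draws", "moon"),
    ("harold", "takes", "walk"),
}

def trace_claim(subject, relation, obj, graph):
    """True if a generated interpretive claim is directly supported
    by a triple in the source-derived graph."""
    return (subject, relation, obj) in graph

def grounded_fraction(claims, graph):
    """Fraction of LLM-generated claims traceable to the graph --
    a rough proxy for how evidence-grounded an interpretation is."""
    supported = sum(trace_claim(*c, graph) for c in claims)
    return supported / len(claims)

generated = [
    ("harold", "draws", "moon"),      # traceable to the source graph
    ("harold", "fears", "darkness"),  # model inference, not grounded
]
score = grounded_fraction(generated, knowledge_graph)
```

The point of the exercise is interpretive, not just quantitative: ungrounded claims are flagged for the scholar to examine through close reading rather than discarded automatically.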

Foundations and Applications of Humanities Analytics

Computational methods allow researchers to systematically analyze and interpret large volumes of social, political, and cultural data, uncovering underlying patterns and insights at scale. These course materials, developed for the Santa Fe Institute, are designed to equip humanities researchers with computational and quantitative tools. The course aims to foster a supportive community, build practical skills, and diversify the field of humanities analytics by welcoming participants from various backgrounds and stages of their academic careers.

Database Escrituras Protocolos 1640 a 44 y 50 and 1730 a 1733

Transcripts of notarial records from an endangered colonial archive, documenting the sale and pawnship of enslaved people in Havana, Cuba, during the 17th and 18th centuries.

Word Embeddings as a Key to the Study of Bias, Race, and Gender in Congress, 1880-2010

Word embeddings reveal how Congressional language around bias, race, and gender shifted from 1880 to 2010. From 1880 to 1970, “bias” was linked to personal emotion and partisanship; after 1975, it became associated with systemic issues like racism, sexism, and gerrymandering. Vector subtraction techniques show that early references to women emphasized suffrage and labor, while post-1970 discourse focused on reproduction and sexuality, with terms like “unwed,” “contraceptives,” and “clinics.” These changes reflect a broader shift toward identity-based and structural understandings of inequality in political speech.
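The vector subtraction technique mentioned above can be sketched with toy vectors. Real diachronic embeddings (e.g. trained per decade on the Congressional Record) have hundreds of dimensions; the 3-d values and tiny vocabulary below are invented purely to show the mechanics of building a difference axis and ranking terms against it.

```python
import math

# Invented 3-d embeddings for illustration only.
vectors = {
    "women":    [0.9, 0.1, 0.3],
    "men":      [0.9, 0.1, -0.3],
    "suffrage": [0.2, 0.8, 0.4],
    "labor":    [0.3, 0.7, 0.1],
}

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# The difference vector isolates what distinguishes "women" from "men"
# in the embedding space, cancelling out what the two words share.
gender_axis = subtract(vectors["women"], vectors["men"])

# Rank the vocabulary by alignment with that axis; in the historical
# study, the top-ranked terms per period reveal shifting associations.
ranked = sorted(vectors, key=lambda w: cosine(vectors[w], gender_axis),
                reverse=True)
```

Repeating this ranking on embeddings trained for different periods is what surfaces the shift from suffrage-and-labor vocabulary to reproduction-and-sexuality vocabulary described above.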

The Congress Viewer Demo App

The Congress Viewer (1900-2000) is a prototype text mining app that demonstrates the potential of tools for measuring lexical change, applying advanced NLP techniques such as parsing grammatical relationships. The app can increase transparency into Congress while providing new insights into the evolution and nature of political language across contexts, including different time periods and discourse communities.

Mapping the Elusive: Using Network Analysis to Understand Slavery, Debt Relations, and the Emergence of a Free Population in 18th-Century Colonial Havana

Transcripts of notarial records from the Fondo Escribanías in the National Archive of Cuba, an endangered repository. The transcripts document pawnship and selling practices involving enslaved people in Havana, Cuba during the 17th and 18th centuries. These records are important for analyzing how the enslaved person functioned as a “social connector,” linking a wide range of creditors and debtors, buyers and sellers, through contracts in the colonial urban economy.
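The "social connector" idea can be sketched as a small network: each notarial contract links a creditor and a debtor through a named enslaved person, and an enslaved person's degree counts the distinct parties they connect. The records and names below are invented placeholders, not transcriptions from the Fondo Escribanías.

```python
from collections import defaultdict

# Illustrative contract records (all names invented): each record links
# a creditor and a debtor through a named enslaved person.
contracts = [
    {"enslaved": "Maria", "creditor": "Perez", "debtor": "Gomez"},
    {"enslaved": "Maria", "creditor": "Ruiz",  "debtor": "Gomez"},
    {"enslaved": "Juan",  "creditor": "Perez", "debtor": "Soto"},
]

# Build an undirected network in which enslaved persons are the nodes
# connecting the parties to each contract.
edges = defaultdict(set)
for c in contracts:
    for party in (c["creditor"], c["debtor"]):
        edges[c["enslaved"]].add(party)
        edges[party].add(c["enslaved"])

# Degree = number of distinct parties each node connects to; for an
# enslaved person, a simple measure of their role as a social connector.
degree = {node: len(neighbors) for node, neighbors in edges.items()}
```

A fuller analysis would use a network library (centrality measures, bipartite projections), but even raw degree shows how one person can tie together otherwise unconnected creditors and debtors.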

PANGeA

PANGeA is a system that uses large language models (LLMs) to generate narrative content for turn-based RPGs from game designers' high-level criteria. It introduces a novel validation system for handling free-form text input during development and gameplay, employing "self-reflection" techniques that enable small, local LLMs to perform comparably to foundation models. PANGeA enriches player interactions by generating personality-biased non-playable characters (NPCs) and improves AI accuracy through crowdsourcing mechanics. Its server houses a custom memory system that provides context for LLM generation, and its REST interface enables integration with any game engine.
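A self-reflection validation loop of the kind described above might look roughly like this. Everything here — the function names, the prompts, and the stubbed model — is a hypothetical sketch of the general pattern (judge, self-critique, optionally revise), not PANGeA's actual API or prompt design.

```python
def validate_input(player_text, criteria, llm, max_rounds=2):
    """Self-reflection loop (sketch): ask the model to judge whether
    free-form player input satisfies the designer's criteria, then have
    it critique its own verdict before the answer is accepted.
    `llm` is any callable mapping a prompt string to a response string."""
    verdict = llm(f"Does this input satisfy: {criteria}?\nInput: {player_text}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this verdict: {verdict}")
        if "revise" not in critique.lower():
            break  # the model stands by its judgment
        verdict = llm(f"Revise the verdict given this critique: {critique}")
    return verdict.strip().lower().startswith("yes")

# Deterministic stub standing in for a small local model.
def stub_llm(prompt):
    if prompt.startswith("Critique"):
        return "No revision needed."
    return "Yes, the input fits the scene."

accepted = validate_input("I sneak past the guard.", "stays in-character",
                          stub_llm)
```

In a real deployment the callable would wrap a local model behind the server's REST interface, and the verdict would be parsed more robustly than a "yes" prefix check.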