Participants gain a theoretical and practical understanding of text analysis methods, and learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship. These courses are offered in collaboration with the Santa Fe Institute and supported by a National Endowment for the Humanities (NEH) grant (no. HT-272418-20).
Steph Buongiorno, PhD, is a postdoctoral research fellow at Southern Methodist University, Guildhall. Buongiorno engages in transdisciplinary, computational research for the purpose of innovative and holistic problem-solving. Her work involves combining multiple disciplines to generate new knowledge beyond the boundaries of individual fields. At Guildhall, Buongiorno focuses on two main areas of research that aim to address large-scale problems. Her primary area of research uses human computation gaming to develop tools for fighting human trafficking using human-in-the-loop machine learning techniques. The second area of research revolves around the development of an autonomous agent system driven by large language models (LLMs), a system that can be used to support research, automation, and a multitude of other activities.
Analyze word embeddings and collocates to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.
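As a rough illustration of what collocate analysis involves, the sketch below counts the words that appear within a fixed window around a target word. It is a minimal example under stated assumptions, not the project's own code: the function name `collocates`, the simple regular-expression tokenizer, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def collocates(text, target, window=2):
    """Count words occurring within `window` tokens of `target` (hypothetical helper)."""
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts


# Invented sample text, for illustration only.
sample = "The free trade bill passed. Critics of free trade opposed the trade bill."
result = collocates(sample, "trade", window=1)
print(result.most_common(3))
```

Running the same count on texts from different periods and comparing the resulting tables is one simple way to see how the company a word keeps changes over time.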
This project addresses a human rights issue in conversation with academic researchers from four departments (Computer Science, Economics, Statistics, and Applied Science), Congress, and the Department of Justice. We use crowdsourcing and human-in-the-loop machine learning techniques to train a neural network and improve predictive named entity recognition and social network analysis. This work is supported by the National Institute of Justice (NIJ).
Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.
usdoj fetches data, such as press releases, blog entries, and speeches, from the United States Department of Justice API. Optional parameters allow users to specify the number of results, starting from the earliest or latest entries, and whether these results contain keywords. Data is cleaned for analysis and returned in a data frame.
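The selection options described above (number of results, earliest or latest first, optional keyword) can be sketched offline in Python. This is not the usdoj package's interface and it makes no API calls: the function `select_entries`, its parameter names, the record fields, and the sample records are all hypothetical, and the order in which filtering and truncation are applied is an assumption.

```python
from datetime import date


def select_entries(entries, n_results=10, from_earliest=False, keyword=None):
    """Return up to `n_results` records, optionally filtered by keyword (hypothetical helper)."""
    if keyword is not None:
        needle = keyword.lower()
        # Assumption: a keyword matches if it appears in the title or the body.
        entries = [
            e for e in entries
            if needle in e["title"].lower() or needle in e["body"].lower()
        ]
    # Latest first by default; earliest first when from_earliest is True.
    ordered = sorted(entries, key=lambda e: e["date"], reverse=not from_earliest)
    return ordered[:n_results]


# Invented records, for illustration only.
sample = [
    {"date": date(2020, 1, 5), "title": "Fraud sentencing",
     "body": "Defendant sentenced for wire fraud."},
    {"date": date(2019, 3, 1), "title": "Task force announced",
     "body": "A new task force on human trafficking."},
    {"date": date(2021, 7, 12), "title": "Trafficking indictment",
     "body": "Indictment returned in trafficking case."},
]

latest_two = select_entries(sample, n_results=2)
earliest_matches = select_entries(sample, from_earliest=True, keyword="trafficking")
print([e["title"] for e in latest_two])
print([e["title"] for e in earliest_matches])
```

Filtering before truncating means the requested number of results is drawn from the matching entries only; a real client might instead apply the limit at the API level.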
oldbailey fetches historical trial data from the Old Bailey API (April 13, 1674 - April 1, 1913). It parses and resolves ambiguous and inconsistent XML while adding valuable metadata, such as the name of the first-person speaker. It returns an analysis-ready data frame with fields including speaker name, victim name, defendant name, their genders, crime location and date, and more!
An overview of materials designed for an introductory class by Jo Guldi on applying computational methods to digital history. A link to the full course material is included.
A pipeline for disambiguating speaker names in the 19th-century British Parliamentary debates. This project was supported by a National Science Foundation (NSF) grant (no. 1520103).
The Hansard 19th-Century British Parliamentary Debates: Parsed Debates, N-Gram Counts, Special Vocabulary, and Topics
Automated scripts and an article describing our process for creating an analysis-ready version of the 19th-century Hansard corpus and supplementary material.