Participants gain a theoretical and practical understanding of text analysis methods and learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship. These courses are offered in collaboration with the Santa Fe Institute and supported by a National Endowment for the Humanities (NEH) grant (no. HT-272418-20).
Welcome. I am a doctoral candidate in the Applied Science department at Southern Methodist University's Lyle School of Engineering. I create data sets and computational tools for machine learning and data analytics applied to the information and social sciences. This is my portfolio. It contains synopses of my research, demonstrations of my software packages, and samples of pedagogical materials with links to the full courses. From here you can visit my GitHub or download my CV.
Analyze word embeddings and collocates to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.
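As a minimal illustration of the collocate side of this kind of analysis, the sketch below counts which words co-occur with a target term inside a fixed window. The function name, the toy sentence, and the window size are all hypothetical, not taken from the project itself:

```python
from collections import Counter

def collocates(tokens, target, window=5):
    """Count words co-occurring with `target` within `window` tokens on either side."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Count every token in the window except the target occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

speech = "the reform bill the house debated reform of the poor law".split()
print(collocates(speech, "reform", window=2).most_common(3))
```

Comparing such collocate counts for the same term across decades or debate contexts is one simple way to track shifts in political language; word embeddings extend the idea to dense vector representations.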
This project addresses a human rights issue in conversation with academic researchers from four departments (Computer Science, Economics, Statistics, and Applied Science), Congress, and the Department of Justice. We use crowdsourcing and human-in-the-loop machine learning techniques to train a neural network and improve predictive named entity recognition and social network analysis. This work is supported by Congressional bill H.R. 2471.
Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.
An overview of materials designed for an introductory class by Jo Guldi on applying computational methods to digital history. A link to the full course material is included.
A pipeline for disambiguating speaker names in the 19th-century British Parliamentary debates. This project was supported by a National Science Foundation (NSF) grant (no. 1520103).
The Hansard 19th-Century British Parliamentary Debates: Parsed Debates, N-Gram Counts, Special Vocabulary, and Topics
Automated scripts and an article describing our process for creating an analysis-ready version of the 19th-century Hansard corpus and supplementary material.
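To give a flavor of one supplementary product, n-gram counts, here is a minimal sketch of counting contiguous n-grams over a tokenized debate. The function and the toy input are illustrative assumptions, not the project's actual scripts:

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count contiguous n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "the house divided and the house agreed".split()
print(ngram_counts(tokens, n=2).most_common(2))
```

Run over the full parsed corpus, counts like these support the frequency analyses the supplementary material describes.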
Easily access a cleaned version of the 19th-century Hansard corpus with improved speaker names in the R environment.
An R software package providing functions for extracting grammatical subject-verb-object (SVO) and subject-verb-adjective complement/adjective modifier (SVA) triples from text. This linguistically improved algorithm achieves significantly higher precision and recall than existing methods.
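To illustrate the output format rather than the package's own algorithm, the sketch below assembles SVO triples from pre-parsed dependency edges. The edge tuples, relation labels, and function name are assumptions for the example (the labels follow common dependency conventions: `nsubj` for subjects, `dobj` for direct objects):

```python
def extract_svo(dep_edges):
    """Assemble (subject, verb, object) triples from (head, relation, dependent) edges."""
    subjects, objects = {}, {}
    for head, rel, dep in dep_edges:
        if rel == "nsubj":
            subjects[head] = dep
        elif rel == "dobj":
            objects[head] = dep
    # A triple is emitted only for verbs that have both a subject and an object.
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]

edges = [("proposed", "nsubj", "minister"), ("proposed", "dobj", "bill")]
print(extract_svo(edges))  # [('minister', 'proposed', 'bill')]
```

In practice the edges would come from a dependency parser; the package's linguistic improvements concern how such parses are traversed and filtered, which this toy example does not attempt to reproduce.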