
The 19th-Century British Hansard Corpus

We present “The Hansard 19th-Century British Parliamentary Debates with Improved Speaker Names: Parsed Debates, N-Gram Counts, Special Vocabulary, Collocates, and Topics.” Our intention is to provide researchers with an analysis-ready version of the 19th-century Hansard debates (and supplementary material) to enhance analyses of the era.

We made two major improvements while creating this corpus: we identified debates that had missing entries in UK Parliament’s records, and we added a field for disambiguated speaker names. Our process for disambiguating speakers is documented as a separate project.

To discover missing debates, we identified systematic issues within Parliament’s digital version of Hansard where XML tags were missing or erroneous. Debate text, for example, might be tagged as a debate title or as a speaker. In other cases, sections were missing tags in their entirety. While the mistagged data was merely buried, not lost to time, the errors in the XML tags were serious enough to keep the data from being indexed by UK Parliament, which left entries missing from the corpus.
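As a rough illustration of this kind of check, the sketch below flags title elements whose text is long enough to look like speech, and sections that carry no speaker tag at all. The element names (`debates`, `section`, `title`, `speaker`, `p`), the sample markup, and the 25-word threshold are assumptions made for illustration only; they are not taken from Parliament’s XML schema or from our scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the element names are illustrative only.
SAMPLE = """
<debates>
  <section>
    <title>SUPPLY.</title>
    <speaker>Mr. Gladstone</speaker>
    <p>said he would not detain the House.</p>
  </section>
  <section>
    <title>The honourable Member rose to ask whether the Government intended to lay the papers upon the table before the close of the present Session and whether any reply had been received.</title>
  </section>
</debates>
"""

MAX_TITLE_WORDS = 25  # assumed cutoff: longer "titles" are probably speech


def find_suspect_sections(xml_text, max_title_words=MAX_TITLE_WORDS):
    """Return (section_index, reasons) pairs for sections that look mistagged."""
    root = ET.fromstring(xml_text)
    suspects = []
    for index, section in enumerate(root.iter("section")):
        title = section.findtext("title", default="")
        reasons = []
        # A very long title suggests debate text tagged as a title.
        if len(title.split()) > max_title_words:
            reasons.append("title looks like debate text")
        # A section with no speaker element may be missing tags.
        if section.find("speaker") is None:
            reasons.append("no speaker tag")
        if reasons:
            suspects.append((index, reasons))
    return suspects


if __name__ == "__main__":
    for index, reasons in find_suspect_sections(SAMPLE):
        print(index, "; ".join(reasons))
```

A heuristic like this only surfaces candidates for review; deciding whether a flagged section really is buried debate text still requires reading it.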

Automated scripts for producing our corpus can be found in our repository. An article

This work was supported in part by National Science Foundation (NSF) Grant (no. 1520103).

Text Mining for Historical Analysis

Text Mining for Historical Analysis offers a critical intervention into the evolving field of digital history. It introduces "computational historical thinking," a mode of thinking that explores the epistemological entanglements between computation, theory, and historical analysis, emphasizing how computational procedures actively shape the questions we ask and the meanings we derive from data. Through sustained engagement with historical corpora, such as the 19th-century Hansard debates and contemporary U.S. Congressional Records, this book demonstrates how to attend to both structure and semantics, thus reimagining the relationship between computation and historical knowledge in the digital age.

Democracy Viewer

Democracy Viewer is an open-source text mining application that enables analysts to explore and interpret humanities texts using techniques like word counts, TF-IDF, and word embeddings. It supports both distant and close reading. Analysts can upload their own datasets or work with curated collections available on the platform. Democracy Viewer also provides access to open government data, including U.S. Congressional records, making public texts more accessible for research and civic engagement.
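For readers unfamiliar with TF-IDF, the sketch below shows one common textbook formulation: term frequency multiplied by the logarithm of the inverse document frequency. It is not Democracy Viewer's implementation; the function name `tf_idf`, the whitespace tokenizer, and the specific weighting variant are assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    docs = [
        "the house debated the corn laws",
        "the house adjourned",
        "the committee reported on the railways",
    ]
    for doc_scores in tf_idf(docs):
        top = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
        print(top[:3])
```

Under this weighting, a word that appears in every document (here "the") scores zero, while words confined to a single document score highest, which is why TF-IDF is useful for surfacing what is distinctive about a text.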

Who is the Working Class, Socialist James Bond? Exploring Propaganda through Knowledge Graphs and AI-Assisted Forensic Reading in Bulgarian Spy Fiction

The Cold War between the "free world" and the Communist Bloc involved multiple fronts, including intense literary warfare. In the bloc, authors crafted characters that reflected traits and ideals promoted by those regimes. To deepen our understanding of these works and their ideological underpinnings, we employ an original method we call "AI-assisted forensic reading," using advanced natural language processing, knowledge graphs, and artificial intelligence. Our approach uncovers new knowledge in the target literature by illuminating how these authors construct meaning, disseminate propaganda, and mirror idealized traits or real-life events, likely under the influence or direction of intelligence agency leaders.
