2 Posts

Apps

The Congress Viewer Demo App

Analyze word embeddings and collocates to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

2 Posts

Congress

The Congressional Data Scraper

Export an analysis-ready version of the Daily Edition of the U.S. Congressional Record.

Text Mining for Historical Analysis: A Book on Methods

This methods book provides a practical introduction to the R programming language for text mining historical records. More than just a code cookbook, it offers a critical perspective on handling our human history. It is the companion guide to The Dangerous Art of Text Mining by Jo Guldi.

1 Post

Data Scrapers

The Congressional Data Scraper

Export an analysis-ready version of the Daily Edition of the U.S. Congressional Record.

5 Posts

Digital Humanities

Course Materials: Foundations and Applications of Humanities Analytics

Participants gain a theoretical and practical understanding of text analysis methods and learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship. These courses are offered in collaboration with the Santa Fe Institute and supported by National Endowment for the Humanities (NEH) grant no. HT-272418-20.

The Congress Viewer Demo App

Analyze word embeddings and collocates to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

dhmeasures: A Software Package in R

An R software package providing functions for Placeholder description

Text Mining for Historical Analysis: A Book on Methods

This methods book provides a practical introduction to the R programming language for text mining historical records. More than just a code cookbook, it offers a critical perspective on handling our human history. It is the companion guide to The Dangerous Art of Text Mining by Jo Guldi.

6 Posts

Hansard

The Congress Viewer Demo App

Analyze word embeddings and collocates to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

Speaker Name Disambiguation in the Hansard 19th-Century British Parliamentary Debates

A pipeline for disambiguating speaker names in the 19th-century British Parliamentary debates. This project was supported by National Science Foundation (NSF) grant no. 1520103.

The Hansard 19th-Century British Parliamentary Debates: Parsed Debates, N-Gram Counts, Special Vocabulary, and Topics

Automated scripts and an article describing our process for creating an analysis-ready version of the 19th-century Hansard corpus and supplementary material.

hansardr: A Software Package in R

Easily access a cleaned version of the c19 Hansard corpus with improved speaker names in the R environment.

Text Mining for Historical Analysis: A Book on Methods

This methods book provides a practical introduction to the R programming language for text mining historical records. More than just a code cookbook, it offers a critical perspective on handling our human history. It is the companion guide to The Dangerous Art of Text Mining by Jo Guldi.

7 Posts

History

The Congress Viewer Demo App

Analyze word embeddings and collocates to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

Course Materials: Text Mining as Historical Method

An overview of materials designed for an introductory class by Jo Guldi on applying computational methods to digital history. A link to the full course material is included.

Speaker Name Disambiguation in the Hansard 19th-Century British Parliamentary Debates

A pipeline for disambiguating speaker names in the 19th-century British Parliamentary debates. This project was supported by National Science Foundation (NSF) grant no. 1520103.

The Hansard 19th-Century British Parliamentary Debates: Parsed Debates, N-Gram Counts, Special Vocabulary, and Topics

Automated scripts and an article describing our process for creating an analysis-ready version of the 19th-century Hansard corpus and supplementary material.

hansardr: A Software Package in R

Easily access a cleaned version of the c19 Hansard corpus with improved speaker names in the R environment.

Text Mining for Historical Analysis: A Book on Methods

This methods book provides a practical introduction to the R programming language for text mining historical records. More than just a code cookbook, it offers a critical perspective on handling our human history. It is the companion guide to The Dangerous Art of Text Mining by Jo Guldi.

1 Post

Human in the Loop

The Human Trafficking Project

This project addresses a human rights issue in conversation with academic researchers from four departments (Computer Science, Economics, Statistics, and Applied Science), Congress, and the Department of Justice. We use crowdsourcing and human-in-the-loop machine learning techniques to train a neural network and improve predictive named entity recognition and social network analysis. This work is supported by the National Institute of Justice (NIJ).

2 Posts

Journal Article

The Hansard 19th-Century British Parliamentary Debates: Parsed Debates, N-Gram Counts, Special Vocabulary, and Topics

Automated scripts and an article describing our process for creating an analysis-ready version of the 19th-century Hansard corpus and supplementary material.

Syntactic Dependency Relationships and the Extraction of Grammatical Triples

This paper describes a method of triples extraction, posextract, which has been designed to meet the increasing need for high-accuracy triples outputs for the analysis of text.

1 Post

Machine Learning

The Human Trafficking Project

This project addresses a human rights issue in conversation with academic researchers from four departments (Computer Science, Economics, Statistics, and Applied Science), Congress, and the Department of Justice. We use crowdsourcing and human-in-the-loop machine learning techniques to train a neural network and improve predictive named entity recognition and social network analysis. This work is supported by the National Institute of Justice (NIJ).

3 Posts

NLP

The Human Trafficking Project

This project addresses a human rights issue in conversation with academic researchers from four departments (Computer Science, Economics, Statistics, and Applied Science), Congress, and the Department of Justice. We use crowdsourcing and human-in-the-loop machine learning techniques to train a neural network and improve predictive named entity recognition and social network analysis. This work is supported by the National Institute of Justice (NIJ).

posextractr: A Software Package in R

An R software package providing functions for extracting grammatical subject-verb-object (SVO) and subject-verb-adjective complement/adjective modifier (SVA) triples from text. This linguistically improved algorithm has significantly higher precision and recall measures than existing methods.

posextract: A Software Package in Python

A Python software package providing functions for extracting grammatical subject-verb-object (SVO) and subject-verb-adjective complement/adjective modifier (SVA) triples from text. This linguistically improved algorithm has significantly higher precision and recall measures than existing methods.

3 Posts

Pedagogy

Course Materials: Foundations and Applications of Humanities Analytics

Participants gain a theoretical and practical understanding of text analysis methods and learn how to extract content and derive meaning from digital sources, enabling new humanities scholarship. These courses are offered in collaboration with the Santa Fe Institute and supported by National Endowment for the Humanities (NEH) grant no. HT-272418-20.

Course Materials: Text Mining as Historical Method

An overview of materials designed for an introductory class by Jo Guldi on applying computational methods to digital history. A link to the full course material is included.

Text Mining for Historical Analysis: A Book on Methods

This methods book provides a practical introduction to the R programming language for text mining historical records. More than just a code cookbook, it offers a critical perspective on handling our human history. It is the companion guide to The Dangerous Art of Text Mining by Jo Guldi.

2 Posts

Python

Course Materials: Text Mining as Historical Method

An overview of materials designed for an introductory class by Jo Guldi on applying computational methods to digital history. A link to the full course material is included.

posextract: A Software Package in Python

A Python software package providing functions for extracting grammatical subject-verb-object (SVO) and subject-verb-adjective complement/adjective modifier (SVA) triples from text. This linguistically improved algorithm has significantly higher precision and recall measures than existing methods.

6 Posts

R

usdoj: R Library for Accessing U.S. Department of Justice (DOJ) Open Data

usdoj fetches data, such as press releases, blog entries, and speeches, from the United States Department of Justice API. Optional parameters allow users to specify the number of results, starting from the earliest or latest entries, and whether these results contain keywords. Data is cleaned for analysis and returned in a data frame.

oldbailey: R Library for Accessing Historical Old Bailey Trial Data

oldbailey fetches historical trial data from the Old Bailey API (April 13, 1674 - April 1, 1913). It parses and resolves ambiguous and inconsistent XML while adding valuable metadata, such as the name of the first-person speaker. It returns an analysis-ready data frame with fields including speaker name, victim name, defendant name, their genders, crime location and date, and more!

hansardr: A Software Package in R

Easily access a cleaned version of the c19 Hansard corpus with improved speaker names in the R environment.

posextractr: A Software Package in R

An R software package providing functions for extracting grammatical subject-verb-object (SVO) and subject-verb-adjective complement/adjective modifier (SVA) triples from text. This linguistically improved algorithm has significantly higher precision and recall measures than existing methods.

dhmeasures: A Software Package in R

An R software package providing functions for Placeholder description

Text Mining for Historical Analysis: A Book on Methods

This methods book provides a practical introduction to the R programming language for text mining historical records. More than just a code cookbook, it offers a critical perspective on handling our human history. It is the companion guide to The Dangerous Art of Text Mining by Jo Guldi.