Text Mining for Historical Analysis: A Book on Methods

This book manuscript introduces text mining for historical analysis using the R programming language. The exercises in this book are based on code for Jo Guldi’s The Dangerous Art of Text Mining (2022), which has been rewritten to offer a practical point of entry into digital history for those with no previous acquaintance with the basic methods of programming.

The intended readers of Text Mining for Historical Analysis are historians, analysts, scholars, and students interested in learning about the methods used in digital history. Many of the coding exercises include material and explanation geared towards those who have never programmed before, or who have some programming experience but are new to R and its use in historical analysis. The first two chapters offer guided introductions to the programming concepts that the rest of the book builds on.

The examples in this book are drawn mostly from one dataset, commonly known as “Hansard,” which contains the official record of the debates of Britain’s House of Lords and House of Commons, 1806-1899. The dataset is of particular interest because its transcripts cover the transformations of the Industrial Revolution, the abolition of the trans-Atlantic slave trade by the British navy, the struggle for women’s rights, and many other debates of general interest to readers of modern history.