Tags

3 Posts

Artificial Intelligence

PANGeA

PANGeA is a system that uses large language models (LLMs) to create narrative content for turn-based RPGs based on game designers' high-level criteria. It introduces a novel validation system for handling free-form text input during development and gameplay; by employing "self-reflection" techniques, the system enables small, local LLMs to perform comparably to foundation models. It enriches player-NPC interactions by generating personality-biased non-player characters (NPCs) and improves AI accuracy through crowdsourcing mechanics. PANGeA houses a server with a custom memory system that provides context for LLM generation. The server's REST interface enables integration with any game engine.

Dark Shadows

Dark Shadows is a film-noir style detective thriller that acts as a test bed for proof-of-concept and prototype system components, frameworks, and models that contribute to research in AI and machine learning. The gameplay focuses on social scenarios where players provide natural language input to progress the narrative. Dark Shadows includes PANGeA’s novel validation system, which leverages self-reflection to evoke a large language model's (LLM) intelligence when evaluating and responding to user input. Narrative and artwork are procedurally generated.

Evaluating the Efficacy of LLMs to Emulate Human Personalities for Video Game Play

To improve the realism of affective Non-Player Characters (NPCs) in video games, this study investigates whether Large Language Models (LLMs) can emulate human personalities. Using the Big Five framework and over 50,000 responses from the International Personality Item Pool (IPIP), LLMs were prompted with self-assessment items corresponding to various personality profiles. Their outputs were then compared to human baseline responses to evaluate accuracy and consistency. Results showed that while some local models exhibited no alignment with human profiles, certain frontier models achieved high alignment. These findings suggest that LLMs can provide a method for designing NPCs with more realistic, personality-driven behavior.

3 Posts

Congress

Text Mining for Historical Analysis

Text Mining for Historical Analysis offers a critical intervention into the evolving field of digital history. It introduces "computational historical thinking," a mode of thinking that explores the epistemological entanglements between computation, theory, and historical analysis, emphasizing how computational procedures actively shape the questions we ask and the meanings we derive from data. Through sustained engagement with historical corpora, such as the 19th-century Hansard debates and contemporary U.S. Congressional Records, this book demonstrates how to attend to both structure and semantics, thus reimagining the relationship between computation and historical knowledge in the digital age.

Word Embeddings as a Key to the Study of Bias, Race, and Gender in Congress, 1880-2010

Word embeddings reveal how Congressional language around bias, race, and gender shifted from 1880 to 2010. From 1880 to 1970, “bias” was linked to personal emotion and partisanship; after 1975, it became associated with systemic issues like racism, sexism, and gerrymandering. Vector subtraction techniques show that early references to women emphasized suffrage and labor, while post-1970 discourse focused on reproduction and sexuality, with terms like “unwed,” “contraceptives,” and “clinics.” These changes reflect a broader shift toward identity-based and structural understandings of inequality in political speech.
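
The vector subtraction mentioned above can be illustrated with a minimal sketch. The vectors below are invented toy values rather than embeddings trained on Congressional text, and the helper name `nearest_to_difference` is hypothetical; the sketch only shows the general idea of subtracting one word's vector from another and ranking the remaining words by cosine similarity to the difference.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real embeddings would be trained on the text of each period.
TOY_VECTORS = {
    "women": [0.9, 0.8, 0.1],
    "men": [0.9, 0.1, 0.1],
    "suffrage": [0.1, 0.9, 0.0],
    "tariff": [0.2, 0.0, 0.9],
    "labor": [0.5, 0.5, 0.3],
}


def subtract(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_to_difference(word, minus, vectors, top_n=2):
    """Rank the other words by similarity to vectors[word] - vectors[minus]."""
    diff = subtract(vectors[word], vectors[minus])
    ranked = [
        (other, cosine(diff, vec))
        for other, vec in vectors.items()
        if other not in (word, minus)
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]
```

Calling `nearest_to_difference("women", "men", TOY_VECTORS)` returns the words whose toy vectors point most nearly in the direction of the difference. Running the same comparison on embeddings trained separately for each period is one plausible way to surface period-specific associations.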

The Congress Viewer Demo App

The Congress Viewer (covering 1900-2000), a prototype text mining app, demonstrates the potential of tools that measure lexical change using advanced NLP techniques such as parsing and the analysis of grammatical relationships. This app can increase transparency in Congress while also providing new insights into the evolution and nature of political language across various contexts, including different time periods and discourse communities.

1 Post

Corpus Linguistics

Democracy Viewer

Democracy Viewer is an open-source text mining application that enables analysts to explore and interpret humanities texts using techniques like word counts, TF-IDF, and word embeddings. It supports both distant and close reading. Analysts can upload their own datasets or work with curated collections available on the platform. Democracy Viewer also provides access to open government data, including U.S. Congressional records, making public texts more accessible for research and civic engagement.
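
As a rough illustration of one technique named above, the sketch below computes a basic TF-IDF score in plain Python. It is not Democracy Viewer's implementation: the function name `tf_idf`, the whitespace tokenization, and the unsmoothed `tf * log(N / df)` weighting are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document using tf * log(N / df).

    Assumes simple lowercase whitespace tokenization (an illustrative choice).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores
```

With this weighting, a term that appears in every document scores zero, while a term concentrated in a single document scores highest there. That is why TF-IDF is often used to surface the vocabulary that distinguishes one text from the rest of a collection.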

2 Posts

Cuba

Mapping the Elusive: Using Network Analysis to Understand Slavery, Debt Relations, and the Emergence of a Free Population of African Descent in 18th-Century Colonial Havana

These transcripts of notarial records come from the Fondo Escribanías in the National Archive of Cuba, an endangered repository. They document pawnship and selling practices involving enslaved people in Havana, Cuba during the 17th and 18th centuries. These records are important for analyzing how the enslaved person functioned as a “social connector,” linking a wide range of creditors and debtors, buyers and sellers, through contracts in the colonial urban economy.

Database Escrituras Protocolos 1640 a 44 y 50 and 1730 a 1733

Transcripts of notarial records, preserved from an endangered colonial archive, that document the selling and pawnship practices involving enslaved people in Havana, Cuba during the 17th and 18th centuries.

9 Posts

Digital Humanities

Text Mining for Historical Analysis

Text Mining for Historical Analysis offers a critical intervention into the evolving field of digital history. It introduces "computational historical thinking," a mode of thinking that explores the epistemological entanglements between computation, theory, and historical analysis, emphasizing how computational procedures actively shape the questions we ask and the meanings we derive from data. Through sustained engagement with historical corpora, such as the 19th-century Hansard debates and contemporary U.S. Congressional Records, this book demonstrates how to attend to both structure and semantics, thus reimagining the relationship between computation and historical knowledge in the digital age.

Democracy Viewer

Democracy Viewer is an open-source text mining application that enables analysts to explore and interpret humanities texts using techniques like word counts, TF-IDF, and word embeddings. It supports both distant and close reading. Analysts can upload their own datasets or work with curated collections available on the platform. Democracy Viewer also provides access to open government data, including U.S. Congressional records, making public texts more accessible for research and civic engagement.

Foundations and Applications of Humanities Analytics

Computational methods allow researchers to systematically analyze and interpret large volumes of social, political, and cultural data, uncovering underlying patterns and insights at scale. These course materials, made for the Santa Fe Institute, are designed to equip humanities researchers with computational and quantitative tools. The course aims to foster a supportive community, build practical skills, and diversify the field of humanities analytics by welcoming participants from various backgrounds and stages of their academic careers.

Word Embeddings as a Key to the Study of Bias, Race, and Gender in Congress, 1880-2010

Word embeddings reveal how Congressional language around bias, race, and gender shifted from 1880 to 2010. From 1880 to 1970, “bias” was linked to personal emotion and partisanship; after 1975, it became associated with systemic issues like racism, sexism, and gerrymandering. Vector subtraction techniques show that early references to women emphasized suffrage and labor, while post-1970 discourse focused on reproduction and sexuality, with terms like “unwed,” “contraceptives,” and “clinics.” These changes reflect a broader shift toward identity-based and structural understandings of inequality in political speech.

The Congress Viewer Demo App

The Congress Viewer (covering 1900-2000), a prototype text mining app, demonstrates the potential of tools that measure lexical change using advanced NLP techniques such as parsing and the analysis of grammatical relationships. This app can increase transparency in Congress while also providing new insights into the evolution and nature of political language across various contexts, including different time periods and discourse communities.

Beyond the Black Box: Toward Transparent AI for Computational Text Analysis in the Digital Humanities

This article introduces Critical Generative Interpretation, a method that supports humanist inquiry by making AI-generated insights traceable and grounded in textual evidence. By linking large language model (LLM) outputs to structured knowledge graphs derived from source texts, the method enables scholars to critically assess where generated interpretations come from and how they relate to the original material. This methodology supports humanist inquiry through close reading. Through a case study of Harold and the Purple Crayon, the article shows how this approach fosters interpretive engagement and makes AI a method for humanistic knowledge production.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

Course Materials: Digital History

Computational methods are changing the way that we access information about history and society. These methods help us to detect change over time, to identify influential figures, and to name turning points. What happens when we apply these tools to the entire Hansard corpus or to a million congressional debates and tweets? This work provides an introduction to the analytic methodologies transforming the humanities and social sciences via a book, under contract at Cambridge University Press, and a series of Jupyter Notebooks aimed at exploring questions like these.

Syntactic Dependency Relationships and the Extraction of Grammatical Triples

This paper describes a method of triples extraction, posextract, which has been designed to meet the increasing need for high-accuracy triples outputs for the analysis of text.

1 Post

Gender

Word Embeddings as a Key to the Study of Bias, Race, and Gender in Congress, 1880-2010

Word embeddings reveal how Congressional language around bias, race, and gender shifted from 1880 to 2010. From 1880 to 1970, “bias” was linked to personal emotion and partisanship; after 1975, it became associated with systemic issues like racism, sexism, and gerrymandering. Vector subtraction techniques show that early references to women emphasized suffrage and labor, while post-1970 discourse focused on reproduction and sexuality, with terms like “unwed,” “contraceptives,” and “clinics.” These changes reflect a broader shift toward identity-based and structural understandings of inequality in political speech.

9 Posts

Government Data

Text Mining for Historical Analysis

Text Mining for Historical Analysis offers a critical intervention into the evolving field of digital history. It introduces "computational historical thinking," a mode of thinking that explores the epistemological entanglements between computation, theory, and historical analysis, emphasizing how computational procedures actively shape the questions we ask and the meanings we derive from data. Through sustained engagement with historical corpora, such as the 19th-century Hansard debates and contemporary U.S. Congressional Records, this book demonstrates how to attend to both structure and semantics, thus reimagining the relationship between computation and historical knowledge in the digital age.

Democracy Viewer

Democracy Viewer is an open-source text mining application that enables analysts to explore and interpret humanities texts using techniques like word counts, TF-IDF, and word embeddings. It supports both distant and close reading. Analysts can upload their own datasets or work with curated collections available on the platform. Democracy Viewer also provides access to open government data, including U.S. Congressional records, making public texts more accessible for research and civic engagement.

Who is the Working Class, Socialist James Bond? Exploring Propaganda through Knowledge Graphs and AI-Assisted Forensic Reading in Bulgarian Spy Fiction

The Cold War between the "free world" and the Communist Bloc involved multiple fronts, including intense literary warfare. In the bloc, authors crafted characters that reflected traits and ideals promoted by those regimes. To deepen our understanding of these works and their ideological underpinnings, we employ an original method we call "AI-assisted forensic reading," using advanced natural language processing, knowledge graphs, and artificial intelligence. Our approach uncovers new knowledge in the target literature by illuminating how these authors construct meaning, disseminate propaganda, and mirror idealized traits or real-life events, likely under the influence or direction of intelligence agency leaders.

Database Escrituras Protocolos 1640 a 44 y 50 and 1730 a 1733

Transcripts of notarial records, preserved from an endangered colonial archive, that document the selling and pawnship practices involving enslaved people in Havana, Cuba during the 17th and 18th centuries.

The Congress Viewer Demo App

The Congress Viewer (covering 1900-2000), a prototype text mining app, demonstrates the potential of tools that measure lexical change using advanced NLP techniques such as parsing and the analysis of grammatical relationships. This app can increase transparency in Congress while also providing new insights into the evolution and nature of political language across various contexts, including different time periods and discourse communities.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

rOpengov Universe

Open government data enables citizens to see government activities and decision-making processes, empowering them to participate more fully in the democratic process. Researchers can use open government data for studies and projects, generating insights and contributing to evidence-based policy-making. To this end, this work introduces R packages hosted by the rOpengov Universe that are designed to make analyzing contemporary and historical open government data more accessible. Queried data is returned in a clean and analysis-ready dataframe.

Course Materials: Digital History

Computational methods are changing the way that we access information about history and society. These methods help us to detect change over time, to identify influential figures, and to name turning points. What happens when we apply these tools to the entire Hansard corpus or to a million congressional debates and tweets? This work provides an introduction to the analytic methodologies transforming the humanities and social sciences via a book, under contract at Cambridge University Press, and a series of Jupyter Notebooks aimed at exploring questions like these.

The 19th-Century British Hansard Corpus

3 Posts

Hansard

Text Mining for Historical Analysis

Text Mining for Historical Analysis offers a critical intervention into the evolving field of digital history. It introduces "computational historical thinking," a mode of thinking that explores the epistemological entanglements between computation, theory, and historical analysis, emphasizing how computational procedures actively shape the questions we ask and the meanings we derive from data. Through sustained engagement with historical corpora, such as the 19th-century Hansard debates and contemporary U.S. Congressional Records, this book demonstrates how to attend to both structure and semantics, thus reimagining the relationship between computation and historical knowledge in the digital age.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

The 19th-Century British Hansard Corpus

1 Post

Human Trafficking

GAME-KG

Knowledge graphs (KGs) can augment large language models (LLMs) while also providing an explainable set of facts that can be inspected by a human. Explainability is valuable for fields that may otherwise avoid LLMs due to hallucinations, such as human trafficking analysis. Creating KGs poses challenges, however. KGs parsed from documents may include explicit connections (those directly stated in a document) but miss implicit connections (those evident to a human, but not directly stated). This research introduces GAME-KG, an approach to modifying explicit and implicit KG connections by crowdsourcing feedback through video games.

4 Posts

Knowledge Graphs

Mapping the Elusive: Using Network Analysis to Understand Slavery, Debt Relations, and the Emergence of a Free Population of African Descent in 18th-Century Colonial Havana

These transcripts of notarial records come from the Fondo Escribanías in the National Archive of Cuba, an endangered repository. They document pawnship and selling practices involving enslaved people in Havana, Cuba during the 17th and 18th centuries. These records are important for analyzing how the enslaved person functioned as a “social connector,” linking a wide range of creditors and debtors, buyers and sellers, through contracts in the colonial urban economy.

Who is the Working Class, Socialist James Bond? Exploring Propaganda through Knowledge Graphs and AI-Assisted Forensic Reading in Bulgarian Spy Fiction

The Cold War between the "free world" and the Communist Bloc involved multiple fronts, including intense literary warfare. In the bloc, authors crafted characters that reflected traits and ideals promoted by those regimes. To deepen our understanding of these works and their ideological underpinnings, we employ an original method we call "AI-assisted forensic reading," using advanced natural language processing, knowledge graphs, and artificial intelligence. Our approach uncovers new knowledge in the target literature by illuminating how these authors construct meaning, disseminate propaganda, and mirror idealized traits or real-life events, likely under the influence or direction of intelligence agency leaders.

Beyond the Black Box: Toward Transparent AI for Computational Text Analysis in the Digital Humanities

This article introduces Critical Generative Interpretation, a method that supports humanist inquiry by making AI-generated insights traceable and grounded in textual evidence. By linking large language model (LLM) outputs to structured knowledge graphs derived from source texts, the method enables scholars to critically assess where generated interpretations come from and how they relate to the original material. This methodology supports humanist inquiry through close reading. Through a case study of Harold and the Purple Crayon, the article shows how this approach fosters interpretive engagement and makes AI a method for humanistic knowledge production.

GAME-KG

Knowledge graphs (KGs) can augment large language models (LLMs) while also providing an explainable set of facts that can be inspected by a human. Explainability is valuable for fields that may otherwise avoid LLMs due to hallucinations, such as human trafficking analysis. Creating KGs poses challenges, however. KGs parsed from documents may include explicit connections (those directly stated in a document) but miss implicit connections (those evident to a human, but not directly stated). This research introduces GAME-KG, an approach to modifying explicit and implicit KG connections by crowdsourcing feedback through video games.

7 Posts

NLP

Foundations and Applications of Humanities Analytics

Computational methods allow researchers to systematically analyze and interpret large volumes of social, political, and cultural data, uncovering underlying patterns and insights at scale. These course materials, made for the Santa Fe Institute, are designed to equip humanities researchers with computational and quantitative tools. The course aims to foster a supportive community, build practical skills, and diversify the field of humanities analytics by welcoming participants from various backgrounds and stages of their academic careers.

The Congress Viewer Demo App

The Congress Viewer (covering 1900-2000), a prototype text mining app, demonstrates the potential of tools that measure lexical change using advanced NLP techniques such as parsing and the analysis of grammatical relationships. This app can increase transparency in Congress while also providing new insights into the evolution and nature of political language across various contexts, including different time periods and discourse communities.

PANGeA

PANGeA is a system that uses large language models (LLMs) to create narrative content for turn-based RPGs based on game designers' high-level criteria. It introduces a novel validation system for handling free-form text input during development and gameplay; by employing "self-reflection" techniques, the system enables small, local LLMs to perform comparably to foundation models. It enriches player-NPC interactions by generating personality-biased non-player characters (NPCs) and improves AI accuracy through crowdsourcing mechanics. PANGeA houses a server with a custom memory system that provides context for LLM generation. The server's REST interface enables integration with any game engine.

GAME-KG

Knowledge graphs (KGs) can augment large language models (LLMs) while also providing an explainable set of facts that can be inspected by a human. Explainability is valuable for fields that may otherwise avoid LLMs due to hallucinations, such as human trafficking analysis. Creating KGs poses challenges, however. KGs parsed from documents may include explicit connections (those directly stated in a document) but miss implicit connections (those evident to a human, but not directly stated). This research introduces GAME-KG, an approach to modifying explicit and implicit KG connections by crowdsourcing feedback through video games.

The Hansard Viewer Demo App

Use an array of data-mining and statistical approaches to gain new insights into the evolution and nature of political language as it occurs in different time periods and in different contexts.

Course Materials: Digital History

Computational methods are changing the way that we access information about history and society. These methods help us to detect change over time, to identify influential figures, and to name turning points. What happens when we apply these tools to the entire Hansard corpus or to a million congressional debates and tweets? This work provides an introduction to the analytic methodologies transforming the humanities and social sciences via a book, under contract at Cambridge University Press, and a series of Jupyter Notebooks aimed at exploring questions like these.

Syntactic Dependency Relationships and the Extraction of Grammatical Triples

This paper describes a method of triples extraction, posextract, which has been designed to meet the increasing need for high-accuracy triples outputs for the analysis of text.

1 Post

Ontologies

Who is the Working Class, Socialist James Bond? Exploring Propaganda through Knowledge Graphs and AI-Assisted Forensic Reading in Bulgarian Spy Fiction

The Cold War between the "free world" and the Communist Bloc involved multiple fronts, including intense literary warfare. In the bloc, authors crafted characters that reflected traits and ideals promoted by those regimes. To deepen our understanding of these works and their ideological underpinnings, we employ an original method we call "AI-assisted forensic reading," using advanced natural language processing, knowledge graphs, and artificial intelligence. Our approach uncovers new knowledge in the target literature by illuminating how these authors construct meaning, disseminate propaganda, and mirror idealized traits or real-life events, likely under the influence or direction of intelligence agency leaders.

1 Post

Russia

Who is the Working Class, Socialist James Bond? Exploring Propaganda through Knowledge Graphs and AI-Assisted Forensic Reading in Bulgarian Spy Fiction

The Cold War between the "free world" and the Communist Bloc involved multiple fronts, including intense literary warfare. In the bloc, authors crafted characters that reflected traits and ideals promoted by those regimes. To deepen our understanding of these works and their ideological underpinnings, we employ an original method we call "AI-assisted forensic reading," using advanced natural language processing, knowledge graphs, and artificial intelligence. Our approach uncovers new knowledge in the target literature by illuminating how these authors construct meaning, disseminate propaganda, and mirror idealized traits or real-life events, likely under the influence or direction of intelligence agency leaders.

2 Posts

Social Networks

Mapping the Elusive: Using Network Analysis to Understand Slavery, Debt Relations, and the Emergence of a Free Population of African Descent in 18th-Century Colonial Havana

These transcripts of notarial records come from the Fondo Escribanías in the National Archive of Cuba, an endangered repository. They document pawnship and selling practices involving enslaved people in Havana, Cuba during the 17th and 18th centuries. These records are important for analyzing how the enslaved person functioned as a “social connector,” linking a wide range of creditors and debtors, buyers and sellers, through contracts in the colonial urban economy.

Who is the Working Class, Socialist James Bond? Exploring Propaganda through Knowledge Graphs and AI-Assisted Forensic Reading in Bulgarian Spy Fiction

The Cold War between the "free world" and the Communist Bloc involved multiple fronts, including intense literary warfare. In the bloc, authors crafted characters that reflected traits and ideals promoted by those regimes. To deepen our understanding of these works and their ideological underpinnings, we employ an original method we call "AI-assisted forensic reading," using advanced natural language processing, knowledge graphs, and artificial intelligence. Our approach uncovers new knowledge in the target literature by illuminating how these authors construct meaning, disseminate propaganda, and mirror idealized traits or real-life events, likely under the influence or direction of intelligence agency leaders.

1 Post

Spy Novels

Who is the Working Class, Socialist James Bond? Exploring Propaganda through Knowledge Graphs and AI-Assisted Forensic Reading in Bulgarian Spy Fiction

The Cold War between the "free world" and the Communist Bloc involved multiple fronts, including intense literary warfare. In the bloc, authors crafted characters that reflected traits and ideals promoted by those regimes. To deepen our understanding of these works and their ideological underpinnings, we employ an original method we call "AI-assisted forensic reading," using advanced natural language processing, knowledge graphs, and artificial intelligence. Our approach uncovers new knowledge in the target literature by illuminating how these authors construct meaning, disseminate propaganda, and mirror idealized traits or real-life events, likely under the influence or direction of intelligence agency leaders.

5 Posts

Video Games

PANGeA

PANGeA is a system that uses large language models (LLMs) to create narrative content for turn-based RPGs based on game designers' high-level criteria. It introduces a novel validation system for handling free-form text input during development and gameplay; by employing "self-reflection" techniques, the system enables small, local LLMs to perform comparably to foundation models. It enriches player-NPC interactions by generating personality-biased non-player characters (NPCs) and improves AI accuracy through crowdsourcing mechanics. PANGeA houses a server with a custom memory system that provides context for LLM generation. The server's REST interface enables integration with any game engine.

GAME-KG

Knowledge graphs (KGs) can augment large language models (LLMs) while also providing an explainable set of facts that can be inspected by a human. Explainability is valuable for fields that may otherwise avoid LLMs due to hallucinations, such as human trafficking analysis. Creating KGs poses challenges, however. KGs parsed from documents may include explicit connections (those directly stated in a document) but miss implicit connections (those evident to a human, but not directly stated). This research introduces GAME-KG, an approach to modifying explicit and implicit KG connections by crowdsourcing feedback through video games.

Dark Shadows

Dark Shadows is a film-noir style detective thriller that acts as a test bed for proof-of-concept and prototype system components, frameworks, and models that contribute to research in AI and machine learning. The gameplay focuses on social scenarios where players provide natural language input to progress the narrative. Dark Shadows includes PANGeA’s novel validation system, which leverages self-reflection to evoke a large language model's (LLM) intelligence when evaluating and responding to user input. Narrative and artwork are procedurally generated.

Agent-Driven, Game-Based Learning: Personalized CS Education for Diverse Students

AI agents can personalize education by identifying students’ strengths, weaknesses, and personalities to generate content tailored to them. This work presents "personalized education agents" deployed in an educational version of Minecraft. Agents bridge concepts from lessons to "big picture" thinking by creating connections between STEM and interdisciplinary topics, such as the Language Arts. Agents translate student progression and learning outcomes to teachers for their assessment of student progress.

Evaluating the Efficacy of LLMs to Emulate Human Personalities for Video Game Play

To improve the realism of affective Non-Player Characters (NPCs) in video games, this study investigates whether Large Language Models (LLMs) can emulate human personalities. Using the Big Five framework and over 50,000 responses from the International Personality Item Pool (IPIP), LLMs were prompted with self-assessment items corresponding to various personality profiles. Their outputs were then compared to human baseline responses to evaluate accuracy and consistency. Results showed that while some local models exhibited no alignment with human profiles, certain frontier models achieved high alignment. These findings suggest that LLMs can provide a method for designing NPCs with more realistic, personality-driven behavior.

Tags

Artificial Intelligence

Congress

Corpus Linguistics

Cuba

Digital Humanities

Gender

Government Data

Hansard

Human Trafficking

Knowledge Graphs

NLP

Ontologies

Russia

Social Networks

Spy Novels

Video Games