Large language models (LLMs) offer unprecedented flexibility in procedural generation, enabling the creation of dynamic video game storylines that evolve with user input. A critical aspect of realizing this potential is allowing players and developers to provide dynamic or free-form text to drive generation. Ingesting free-form text for a video game poses challenges, however, as it can prompt the LLM to generate content beyond the intended narrative scope. In response to this challenge, this research introduces Procedural Artificial Narrative using Generative AI (PANGeA), an approach that leverages LLMs to create narrative content for turn-based role-playing games (RPGs).

PANGeA is composed of a memory system, a validation system, a Unity game engine plug-in, and a server with a RESTful interface that enables connecting PANGeA components with any game engine as well as accessing local and private LLMs. PANGeA procedurally generates level data such as the setting, key items, non-playable characters (NPCs), and dialogue based on a set of configuration and design rules provided by the game designer. This process is supported by a novel validation system for handling free-form text input during game development and gameplay, which aligns LLM generation with the narrative. It does this by drawing on the LLM’s intelligence to dynamically evaluate the text input against game rules that reinforce the designer’s intent. To enrich player-NPC interactions, PANGeA uses the Big Five Personality model to shape NPC responses.
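As a rough illustration of the validation idea described above, the following minimal sketch asks an LLM to judge free-form input against designer-provided rules. It is not PANGeA's implementation: the names `ValidationResult`, `build_validation_prompt`, `validate_input`, and `fake_llm`, the prompt wording, the `VALID:`/`INVALID:` reply format, and the example rules are all assumptions made for illustration, and `fake_llm` is a keyword-based stand-in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ValidationResult:
    valid: bool
    reason: str


def build_validation_prompt(rules: List[str], text: str) -> str:
    """Combine the designer's rules and the free-form input into one prompt."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "You are validating input for a role-playing game.\n"
        f"Game rules:\n{numbered}\n\n"
        f"Player input: {text!r}\n"
        "Answer on one line as 'VALID: <reason>' or 'INVALID: <reason>'."
    )


def validate_input(
    rules: List[str], text: str, llm: Callable[[str], str]
) -> ValidationResult:
    """Ask the LLM for a verdict and parse its one-line reply."""
    reply = llm(build_validation_prompt(rules, text)).strip()
    verdict, _, reason = reply.partition(":")
    # Fail closed: anything other than an explicit VALID verdict is rejected.
    return ValidationResult(verdict.strip().upper() == "VALID", reason.strip())


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (keyword check only)."""
    if "spaceship" in prompt.lower():
        return "INVALID: a spaceship does not fit the stated setting"
    return "VALID: consistent with the stated setting"


if __name__ == "__main__":
    # Hypothetical rules, not taken from the paper.
    rules = [
        "The story takes place in a 1920s city.",
        "No technology beyond the 1920s may appear.",
    ]
    print(validate_input(rules, "I ask the bartender about the ledger.", fake_llm))
    print(validate_input(rules, "I fly away in my spaceship.", fake_llm))
```

Passing the model in as a callable keeps the sketch independent of any particular LLM provider, so the same check could in principle sit in front of a local model or a hosted one.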

To explore its broad application, PANGeA is evaluated across two studies. First, this research presents a narrative test scenario of the prototype game, Dark Shadows, which was developed using PANGeA within the Unity game engine. This is followed by an ablation study that tests PANGeA’s performance across 10 different role-playing game scenarios, from western to science fiction, and across three models of different sizes: Llama-3 (8B), GPT-3.5, and GPT-4. These evaluations demonstrate that PANGeA’s NPCs can hold dynamic, narrative-consistent conversations that, without the memory system, would exceed the LLM’s context length. In addition, the results demonstrate that PANGeA’s validation system not only aligns LLM responses with the game narrative but also improves the performance of Llama-3 (8B), enabling it to perform comparably to large-scale foundational models like GPT-4. With the validation system, Llama-3 (8B)’s accuracy improved from 28% to 98%, and GPT-4’s from 71% to 99%. These findings indicate that PANGeA can help game designers generate narrative-consistent content while leveraging LLMs of different sizes, suitable for various devices.

Steph Buongiorno [Corresponding Author], Jake Klinkert, Tanishq Chawla, Zixin Zhuang, and Corey Clark. “PANGeA: Procedural Artificial Narrative Using Generative AI for Turn-Based Video Games.” Proceedings of the 2024 AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Lexington, KY, USA, 2024.