Expanding Your Vocabulary: Topic Integration Using the Segments-as-Topics (SAT) Approach

Topic integration
Vocabularies
Constitution

Roy Gardner, Matthew Martin, Andrés Cruz, Zachary Elkins, and Ashley Moran. “Expanding Your Vocabulary: Topic Integration Using the Segments-as-Topics (SAT) Approach.”

Authors and Affiliations

Roy Gardner
Comparative Constitutions Project, University of Texas at Austin

Matthew Martin
Department of Government, University of Texas at Austin

Andrés Cruz
Department of Government, University of Texas at Austin

Zachary Elkins
Department of Government, University of Texas at Austin

Ashley Moran
Department of Government, University of Texas at Austin

Published

September 2024

Abstract

Topic discovery and integration are essential to maintaining vocabularies, the sets of concepts underlying textual corpora. We present a three-stage methodology, which we call the segments-as-topics (SAT) approach, that combines automation and human expertise to assess candidate topics. To develop the methodology, we use a vocabulary created by the Comparative Constitutions Project (CCP) that tracks more than 330 topics in a corpus of national constitutions. In the (1) SAT generation stage, we formulate candidate topics that are distinct from existing topics, then use a sentence-level semantic similarity model to search for constitution sections (segments) that are similar in meaning to each topic. Domain experts collaborate on the topic text until they identify a formulation whose search results match the intent of the topic. Once a sufficient number of constitution sections have been matched, the (2) topic expansion stage uses the sections themselves to find additional semantically similar sections. These sections are assessed and either added to the section set or rejected. The process is repeated until no new sections are found, at which point the section set constitutes the definitive set of sections for the topic. Finally, in the (3) validation stage, a panel of scholars decides whether to accept the topic into the CCP vocabulary, after which matching constitution sections are automatically tagged with the topic. Several new topics have been added to the CCP vocabulary with these methods, some of which we present here to illustrate our process and results. The methodology provides researchers with a systematic way to expand existing vocabularies.
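
The generation and expansion stages can be illustrated with a minimal Python sketch. It is not the CCP implementation: the bag-of-words `embed` function is a toy stand-in for the sentence-level semantic similarity model, the `accept` callback is a hypothetical placeholder for expert assessment, and the function names, the threshold of 0.5, and the example segments are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, segments, threshold):
    """Return ids of segments whose similarity to the query meets the threshold."""
    q = embed(query)
    return {sid for sid, text in segments.items()
            if cosine(q, embed(text)) >= threshold}


def expand_topic(topic_text, segments, accept, threshold=0.5):
    """Seed a section set from the topic text, then grow it to a fixed point."""
    # Stage 1 (generation): search with the topic formulation itself.
    hits = search(topic_text, segments, threshold)
    accepted = {sid for sid in hits if accept(sid)}
    rejected = hits - accepted
    frontier = set(accepted)
    # Stage 2 (expansion): use newly accepted sections as queries until
    # a round produces no sections that have not already been assessed.
    while frontier:
        candidates = set()
        for sid in frontier:
            candidates |= search(segments[sid], segments, threshold)
        candidates -= accepted | rejected
        frontier = {sid for sid in candidates if accept(sid)}
        rejected |= candidates - frontier
        accepted |= frontier
    return accepted


# Hypothetical example segments, not drawn from any actual constitution.
segments = {
    "A": "everyone has the right to clean water",
    "B": "the state shall guarantee access to clean water",
    "C": "the state shall guarantee access to drinking water and sanitation",
    "D": "the president is elected for a term of five years",
}
section_set = expand_topic("right to clean water", segments, accept=lambda sid: True)
```

In this toy corpus, segment C is not similar enough to the topic text to be retrieved directly and is reached only through segment B during expansion, which is the behaviour the expansion stage is meant to capture. Stage 3, the panel decision, is a human step and is not modelled.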