AI Use Cases at the National Archives and Records Administration

Introduction

The National Archives and Records Administration (NARA) is integrating artificial intelligence into its operations to improve the management and accessibility of historical records. Its AI applications target three areas: identifying and protecting sensitive information, enriching metadata, and discovering and classifying data. Efforts are also underway to add semantic search and to respond more efficiently to Freedom of Information Act (FOIA) requests. The following use cases reflect NARA’s commitment to using technology to preserve the nation’s historical and cultural heritage and make it accessible.

Use Cases

  • 1. AI Pilot Project to Screen and Flag for Personally Identifiable Information (PII) in Digitized Archival Records

    This pilot is a collaborative NARA initiative that uses AI tools from Amazon Web Services (AWS) and Google Cloud Platform to identify and redact sensitive information, such as Social Security numbers and dates of birth, in digitized archival records. It will screen both records already in the National Archives Catalog and those awaiting addition. The pilot employs a weighted scoring algorithm to prioritize the documents containing the most sensitive information. Planned work includes a user interface tool that lets the Legal, Business, and Security teams run preliminary scans on unpublished information, and custom entity detection capabilities for the prototype.
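
    The weighted scoring idea can be illustrated with a minimal sketch. This is not NARA's algorithm or the AWS or Google Cloud API: the regular expressions, the weights, and the names PII_PATTERNS, pii_score, and prioritize are all assumptions made for illustration.

```python
import re

# Hypothetical patterns and weights, chosen only for illustration.
# A real pilot would rely on cloud PII detectors and tuned weights.
PII_PATTERNS = {
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 10.0),
    "date_of_birth": (
        re.compile(r"\b(?:DOB|Date of Birth)[:\s]+\d{1,2}/\d{1,2}/\d{4}\b", re.IGNORECASE),
        4.0,
    ),
}


def pii_score(text):
    """Return (weighted score, per-entity match counts) for one document."""
    counts = {name: len(pattern.findall(text)) for name, (pattern, _) in PII_PATTERNS.items()}
    score = sum(counts[name] * weight for name, (_, weight) in PII_PATTERNS.items())
    return score, counts


def prioritize(documents):
    """Return document ids ordered from highest to lowest weighted score."""
    scores = {doc_id: pii_score(text)[0] for doc_id, text in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "SSN 123-45-6789, DOB: 01/02/1950",
        "b": "No sensitive data here",
        "c": "DOB: 3/4/1960",
    }
    print(prioritize(docs))
```

    Under these assumed weights, a document with one Social Security number outranks a document with one date of birth, which in turn outranks a document with no matches.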

  • 2. Auto-fill of Descriptive Metadata for Archival Descriptions

    The Auto-fill of Descriptive Metadata for Archival Descriptions project aims to streamline the process of filling out descriptive metadata for records released to the public by the National Archives and Records Administration (NARA). As millions of pages of records are made available through the National Archives Catalog, many of these records currently have minimal descriptive metadata due to the intensive manual effort required. This project will utilize the content of the documents and existing metadata from the records management system to automatically predict and populate the necessary descriptive fields, enhancing the searchability and accessibility of archival records.
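
    As a rough sketch of the auto-fill idea, the function below populates only the empty fields of an existing metadata record from the document text. The field names (title, date, record_group) and the simple heuristics are assumptions; they stand in for the predictive model the project describes, whose design is not specified here.

```python
import re

# Heuristic stand-in for a trained model: matches dates such as "March 3, 1947".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def autofill_metadata(existing, text):
    """Fill empty descriptive fields from the text; never overwrite existing values."""
    filled = dict(existing)
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Assumption: the first non-empty line is a usable title candidate.
    if not filled.get("title") and lines:
        filled["title"] = lines[0]

    # Assumption: the first full date found in the text is the document date.
    if not filled.get("date"):
        match = DATE_PATTERN.search(text)
        if match:
            filled["date"] = match.group(0)

    return filled


if __name__ == "__main__":
    record = {"record_group": "59", "title": "", "date": None}
    body = "Memorandum on Trade Policy\n\nWashington, March 3, 1947\nThe Secretary writes."
    print(autofill_metadata(record, body))
```

    Leaving existing values untouched is a deliberate choice in this sketch: predicted values fill gaps, while anything already entered in the records management system takes precedence.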

  • 3. Automated Data Discovery and Classification Pilot

    The Automated Data Discovery and Classification Pilot is a planned initiative by NARA to explore the use of AI and machine learning (ML) for automated data discovery and classification. This pilot will utilize public and mock-up datasets to test both supervised and unsupervised AI/ML techniques. The project aims to implement a commercial off-the-shelf (COTS) solution that enables users to search for and discover complete documents, rather than just individual sensitive data elements like Social Security Numbers or credit card numbers. This approach allows for comprehensive document discovery, enabling users to locate various types of documents, such as RFPs, purchase orders, and financial statements. If NARA identifies document types not recognized by the vendor’s solution, a learning set of examples will be created to train the algorithm for improved classification.
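
    The supervised side of this pilot, including the learning set for new document types, can be sketched with a small nearest-centroid classifier over word counts. This is an illustrative stand-in, not the vendor's method: the function names, labels, and example texts are hypothetical, and a COTS product would use far richer features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(labeled_examples):
    """Sum token counts per label to build one bag-of-words centroid per document type."""
    centroids = {}
    for text, label in labeled_examples:
        centroids.setdefault(label, Counter()).update(tokenize(text))
    return centroids


def cosine(a, b):
    """Cosine similarity between two token-count mappings."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical learning set; a new document type is added by appending examples.
LEARNING_SET = [
    ("Purchase order number 1001 for office supplies, quantity and unit price listed", "purchase_order"),
    ("Purchase order for laptops with delivery address and quantity", "purchase_order"),
    ("Statement of financial position showing assets, liabilities, and equity", "financial_statement"),
    ("Annual financial statement with revenue, expenses, and net income", "financial_statement"),
]

if __name__ == "__main__":
    model = train_centroids(LEARNING_SET)
    print(classify("Order of 20 chairs, quantity and unit price attached", model))
```

    In this sketch, supporting a document type the model does not yet recognize means appending labeled examples to the learning set and calling train_centroids again.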

  • 4. Semantic Search for National Archives Catalog - an Artificial Intelligence (AI) / Machine Learning (ML) Pilot Program

    The Semantic Search for National Archives Catalog project is an AI and machine learning pilot program aimed at improving access to the vast collection of records maintained by the National Archives and Records Administration (NARA). With millions of records available, finding specific documents can be challenging and time-consuming. This project introduces semantic search capabilities that let users search with natural language queries. Unlike traditional keyword search, semantic search takes into account user intent and the contextual meaning of search terms, which leads to more accurate and relevant results. This enhancement will enable researchers and historians to locate the records they need more efficiently and may also reveal relationships between different records, providing deeper insights into historical events and processes.
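
    One common way to implement semantic search is to compare embedding vectors for the query and for each record. The sketch below shows only the ranking step and assumes the vectors already exist. The three-dimensional toy vectors and the record ids are invented for illustration; a real system would obtain high-dimensional vectors from an embedding model, which the pilot description does not name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vector, record_vectors, top_k=3):
    """Return the ids of the top_k records most similar to the query vector."""
    ranked = sorted(
        record_vectors,
        key=lambda record_id: cosine_similarity(query_vector, record_vectors[record_id]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of record descriptions (hypothetical ids).
    records = {
        "census-1940": [0.9, 0.1, 0.0],
        "wwii-draft-cards": [0.1, 0.9, 0.2],
        "patent-files": [0.0, 0.2, 0.9],
    }
    # Toy vector standing in for the embedding of a natural language query.
    query = [0.2, 0.8, 0.1]
    print(semantic_search(query, records, top_k=2))
```

    Because ranking depends on vector similarity rather than shared keywords, a record can rank highly even when it shares no exact terms with the query, provided the embedding model places the two close together.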

  • 5. Freedom of Information Act (FOIA) Discovery AI Pilot

    The Freedom of Information Act (FOIA) Discovery AI Pilot aims to enhance the National Archives and Records Administration’s (NARA) ability to respond to FOIA requests using advanced AI techniques. The AI system will implement a natural language processing (NLP) search method that assesses content similarity between user queries and archival records. Additionally, the system will automate the redaction process, ensuring that personal information and other sensitive data are appropriately removed based on the specifics of each FOIA request. This dual approach aims to streamline the FOIA response process while maintaining compliance with privacy regulations.
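
    The request-specific redaction step can be sketched as a set of named rules, of which only the categories selected for a given request are applied. The rule names, patterns, and placeholder format below are assumptions for illustration; the pilot's actual detection method and exemption logic are not described here.

```python
import re

# Hypothetical redaction rules keyed by category name.
REDACTION_RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text, categories):
    """Replace every match of the selected categories with a labeled placeholder."""
    for name in categories:
        text = REDACTION_RULES[name].sub(f"[REDACTED {name.upper()}]", text)
    return text


if __name__ == "__main__":
    page = "Call 202-555-0147 about SSN 123-45-6789."
    # Only the categories chosen for this request are redacted.
    print(redact(page, ["ssn"]))
    print(redact(page, ["ssn", "phone"]))
```

    Passing the categories as an argument keeps the rule set fixed while letting each request determine which kinds of information are removed.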

Conclusion

The National Archives and Records Administration (NARA) is applying artificial intelligence to its archival processes to improve public access to historical records. With these AI-driven solutions, NARA aims to make the redaction of sensitive information more accurate and efficient, automate metadata enrichment, and support comprehensive data discovery and classification. Semantic search is intended to change how users find records in the National Archives Catalog, and the FOIA pilot is intended to speed up responses while maintaining compliance with privacy regulations.
