AI Use Cases in Health and Human Services

Introduction

In the rapidly evolving landscape of scientific research and regulatory oversight, advanced technological solutions play a pivotal role in enhancing efficiency, accuracy, and insight. This collection of use cases highlights the diverse applications of artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) across health and human services, including public health surveillance, healthcare program integrity, food safety, drug regulation, and scientific research. From predictive models for antimicrobial resistance to intelligent knowledge discovery platforms, these initiatives represent a significant step toward more effective and informed decision-making. By integrating cutting-edge technologies, they aim to address complex challenges, streamline processes, and improve outcomes across multiple domains.

Use Cases

  • 1. Information Gateway OneReach Application

    The Information Gateway hotline, powered by OneReach AI, connects callers to an interactive voice response (IVR) phone system that provides access to state hotlines for reporting child abuse and neglect, tailored to the caller’s area code. In addition to this service, OneReach offers an FAQ texting service that employs natural language processing to address user inquiries. The data collected from user queries is used for reinforcement training by human AI trainers, contributing to the continuous development of additional FAQs and improving the overall effectiveness of the service.

  • 2. AHRQ Search

    The AHRQ Search initiative aims to enhance the organization’s search capabilities by incorporating features such as relevancy tailoring, auto-generated synonyms, and automated suggestions. This comprehensive search tool also provides suggested related content and auto-tagging functionalities, along with a “Did you mean” feature to assist users in finding specific information more efficiently. By improving the search experience, the project seeks to facilitate better access to relevant content for users across the organization.

  • 3. Chatbot

    This project involves the development of a chatbot that serves as an interactive interface for users to ask questions about AHRQ content. By enabling conversational inquiries, the chatbot aims to replace the traditional public inquiry telephone line, making it easier and more efficient for users to access information and support.

  • 4. ReDIRECT

    The ReDIRECT project leverages artificial intelligence to discover candidates for drug repurposing. This initiative focuses on evaluating existing pharmaceuticals to find new therapeutic applications, thereby potentially speeding up the availability of effective treatments for various health conditions.

  • 5. Burn & Blast MCMs: Rivanna

    This project employs AI-based algorithms integrated with the Accuro XV system to detect and highlight fractures and soft tissue injuries. By enhancing diagnostic capabilities, this technology aims to improve the accuracy and speed of injury assessments, ultimately benefiting patient care in emergency and clinical settings.

  • 6. Burn & Blast MCMs: Philips

    This initiative utilizes AI-based algorithms within the Lumify handheld ultrasound system to identify lung injuries and infectious diseases. By providing rapid and accurate diagnostics, this technology aims to enhance clinical decision-making and improve patient outcomes in critical care situations.

  • 7. Burn & Blast MCMs: SpectralMD

    This project aims to accurately determine the depth severity and size of burn injuries using advanced imaging techniques. By providing precise assessments, this initiative seeks to enhance treatment planning and improve outcomes for patients suffering from burn injuries.

  • 8. Current Health

    This project involves a continuous monitoring platform equipped with AI algorithms designed to assess the severity of COVID-19 in patients. By providing real-time data and insights, this initiative aims to improve patient management and facilitate timely interventions in healthcare settings.

  • 9. Digital MCM: Visual Dx

    The Digital MCM: Visual Dx project utilizes smartphone imaging technology combined with artificial intelligence to detect the presence of mpox. This innovative approach aims to enhance accessibility and speed in diagnosing mpox, allowing for timely interventions and better public health responses.

  • 10. Host-Based Diagnostics: Patchd

    The Host-Based Diagnostics: Patchd initiative involves a wearable device integrated with an AI model designed to predict the onset of sepsis in patients at home. By enabling early detection of this critical condition, the project aims to improve patient outcomes and facilitate timely medical interventions, potentially saving lives.

  • 11. Data Modernization

    The Data Modernization project focuses on creating an open data management architecture that enhances business intelligence (BI) and machine learning (ML) capabilities across all ASPR data. This initiative aims to streamline data access and analysis, ultimately improving decision-making processes and operational efficiency within health services.

  • 12. Cyber Threat Detection / Predictive Analytics

    This project employs artificial intelligence and machine learning tools to process vast amounts of threat data, enhancing the ability to detect and respond to cyber threats. By leveraging advanced analytics, the initiative aims to improve cybersecurity measures and protect sensitive health information from potential breaches.

  • 13. emPOWER

    The emPOWER initiative harnesses artificial intelligence to quickly develop tools and programs aimed at identifying and supporting populations at risk during the COVID-19 pandemic. By focusing on at-risk groups, this project seeks to enhance public health responses and ensure that vulnerable communities receive the necessary resources and support.

  • 14. Community Access to Testing

    The Community Access to Testing project employs multiple machine learning models to predict surges in COVID-19 cases within communities. By accurately forecasting these trends, the initiative aims to improve testing accessibility and resource allocation, ultimately enhancing public health preparedness and response efforts.

  • 15. Modeling & Simulation

    This project focuses on developing modeling tools and conducting analyses to prepare for biothreat events. By refining these models during emergent situations, the initiative aims to enhance situational awareness and improve response strategies, ensuring that health services are better equipped to handle potential threats.

  • 16. Ventilator Medication Model

    The Ventilator Medication Model utilizes a generalized additive model to project the rate of COVID-19 patients requiring ventilation. By providing accurate forecasts, this project aims to assist healthcare providers in resource planning and management, ensuring that adequate ventilatory support is available for patients in need.
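
    A minimal sketch of the modeling idea follows, using the pyGAM library; the predictors, data, and forecast horizon are invented placeholders, not the actual ASPR inputs.

    ```python
    # Illustrative sketch: projecting ventilation demand with a generalized
    # additive model (pyGAM). Features and data are hypothetical placeholders.
    import numpy as np
    from pygam import LinearGAM, s

    rng = np.random.default_rng(0)
    # Hypothetical predictors: days since outbreak onset, 7-day case rate per 100k
    X = np.column_stack([np.arange(120), rng.gamma(5.0, 20.0, 120)])
    y = 0.05 * X[:, 1] + 10 * np.sin(X[:, 0] / 20) + rng.normal(0, 2, 120)

    # Smooth terms let each predictor contribute a nonlinear partial effect
    gam = LinearGAM(s(0) + s(1)).fit(X, y)

    X_future = np.column_stack([np.arange(120, 134), np.full(14, 90.0)])
    print(gam.predict(X_future))  # projected ventilation demand, next 14 days
    ```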

  • 17. Product redistribution optimization

    This initiative employs artificial intelligence and modeling techniques to optimize the redistribution of medical products among partners, including jurisdictions, pharmacies, and federal entities. By considering factors such as distance, ordering patterns, and equity, the project aims to enhance the efficiency of product distribution, ensuring that resources are allocated effectively where they are most needed.

  • 18. Highly Infectious Patient Movement optimization

    This project focuses on optimizing the movement of highly infectious patients using a limited number of transport containers. By analyzing factors such as distance and population density, the initiative aims to enhance planning and decision-making processes, ensuring that patient transport is conducted safely and efficiently during health emergencies.

  • 19. TowerScout: Automated cooling tower detection from aerial imagery for Legionnaires' Disease outbreak investigation

    The TowerScout project utilizes aerial imagery combined with advanced object detection and image classification models to identify cooling towers. These structures are known potential sources of Legionnaires’ Disease outbreaks in communities. By automating the detection process, TowerScout aims to enhance public health investigations and facilitate timely interventions during outbreaks.
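
    The inventory does not name the specific models used; the sketch below runs a pretrained torchvision detector over an aerial tile as a stand-in for the detection stage. The image file name and confidence threshold are hypothetical.

    ```python
    # Minimal sketch of the detection stage: run a pretrained object detector
    # over an aerial image tile and keep high-confidence boxes. TowerScout's
    # actual models and classes differ; this is an illustrative stand-in.
    import torch
    from torchvision.io import read_image
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.transforms.functional import convert_image_dtype

    model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

    tile = convert_image_dtype(read_image("aerial_tile.png"), torch.float)  # hypothetical file
    with torch.no_grad():
        detections = model([tile])[0]  # dict with 'boxes', 'labels', 'scores'

    keep = detections["scores"] > 0.8
    for box, score in zip(detections["boxes"][keep], detections["scores"][keep]):
        print(f"candidate at {box.tolist()} (score {score:.2f})")  # pass to classifier stage
    ```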

  • 20. HaMLET: Harnessing Machine Learning to Eliminate Tuberculosis

    The HaMLET initiative employs computer vision models to analyze chest x-rays for the detection of tuberculosis (TB). This technology aims to enhance the quality of health screenings conducted overseas for immigrants and refugees seeking entry into the United States, thereby improving early detection and treatment of TB in vulnerable populations.

  • 21. Zero-shot learning to identify menstrual irregularities reported after COVID-19 vaccination

    This project applies zero-shot learning techniques to identify and classify reports of menstrual irregularities that have been associated with COVID-19 vaccinations. By leveraging this advanced machine learning approach, the initiative aims to enhance the understanding of vaccine side effects and improve public health monitoring related to vaccination outcomes.
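
    As a minimal illustration of the technique, a pretrained natural language inference model can score a report against candidate labels with no task-specific training data; the report text and labels below are invented.

    ```python
    # Illustrative sketch: zero-shot classification of a free-text,
    # VAERS-style report with a pretrained NLI model.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    report = ("Patient reports unusually heavy menstrual bleeding and a "
              "two-week delay in her cycle following the second dose.")
    labels = ["menstrual irregularity", "injection site reaction", "headache"]

    result = classifier(report, candidate_labels=labels)
    print(result["labels"][0], round(result["scores"][0], 3))  # top label and score
    ```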

  • 22. Validation Study of Deep Learning Algorithms to Explore the Potential Use of Artificial Intelligence for Public Health Surveillance of Eye Diseases

    This validation study focuses on the application of deep learning algorithms to detect diabetic retinopathy in retinal photos collected through the National Health and Nutrition Examination Survey (NHANES). The goal is to assess whether these AI algorithms can effectively replace traditional ophthalmologist grading, potentially streamlining public health surveillance of eye diseases and improving early detection efforts.

  • 23. Automating extraction of sidewalk networks from street-level images

    This project involves a team of scientists developing a computer vision model to automate the extraction of sidewalk networks from street-level images sourced from Mapillary. By accurately identifying the presence of sidewalks, this initiative aims to support urban planning and public health efforts related to physical activity and mobility in communities.

  • 24. Identify walking and bicycling trips in location-based data, including global-positioning system data from smartphone applications

    This initiative focuses on developing machine learning techniques to analyze GPS-based data from smartphone applications to identify walking and bicycling trips. By utilizing commercially available location data, the project aims to produce geocoded data tables, GIS layers, and maps that can inform public health strategies and promote active transportation in communities.

  • 25. Identify infrastructure supports for physical activity (e.g., sidewalks) in satellite and roadway images

    This project aims to develop machine learning techniques to identify infrastructure that supports physical activity, such as sidewalks and bicycle lanes, in both satellite and roadway images. By analyzing image-based data, the initiative seeks to generate geocoded data tables, maps, and summary reports that can aid in urban planning and public health initiatives promoting active lifestyles.

  • 26. Identifying state and local policy provisions that promote or inhibit creating healthy built environments

    This initiative focuses on using natural language processing and machine learning techniques to analyze state and local policy provisions that either promote or inhibit the creation of healthy built environments. By processing various types of policy texts, the project aims to produce datasets that quantify relevant aspects of these policies. As of April 2023, the Division of Nutrition, Physical Activity, and Obesity (DNPAO) is collaborating with contractors to explore the effectiveness of these methods compared to traditional approaches, while also identifying related efforts within the CDC and academic institutions.

  • 27. Use of Natural Language Processing for Topic Modeling to Automate Review of Public Comments to Notice of Proposed Rulemaking

    This project involves the development of a Natural Language Processing (NLP) tool designed for topic modeling to automate the review of public comments submitted in response to notices of proposed rulemaking. By clustering these comments based on their content, the tool aims to enhance the efficiency of the review process, allowing for more effective analysis and incorporation of public feedback into regulatory decision-making.
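
    A minimal sketch of the clustering idea, using latent Dirichlet allocation from scikit-learn over a handful of invented comments:

    ```python
    # Illustrative sketch: topic modeling over public comments with LDA.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    comments = [
        "The proposed reporting burden is too high for small rural clinics.",
        "Please extend the comment period; stakeholders need more time.",
        "Reporting requirements duplicate existing state-level programs.",
        "We support the rule but urge a longer phase-in timeline.",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(comments)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [terms[i] for i in topic.argsort()[-5:][::-1]]  # top words per topic
        print(f"topic {k}: {', '.join(top)}")
    ```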

  • 28. Sequential Coverage Algorithm (SCA) and partial Expectation-Maximization (EM) estimation in Record Linkage

    The Sequential Coverage Algorithm (SCA) and partial Expectation-Maximization (EM) estimation are advanced machine learning techniques implemented by the CDC’s National Center for Health Statistics (NCHS) to enhance data linkage processes. The SCA, a supervised algorithm, helps create effective joining methods for large datasets, while the unsupervised EM method estimates the proportion of matching pairs within these groups. Together, these methods significantly improve the accuracy and efficiency of linking health data, facilitating better public health analysis and decision-making.
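
    A toy sketch of the unsupervised EM step, in the Fellegi-Sunter style: given binary field-agreement vectors for candidate record pairs, it estimates the share of true matches and per-field agreement rates. The agreement vectors and starting values are illustrative, not NCHS parameters.

    ```python
    # Toy EM for record linkage: mixture of match/non-match pair classes.
    import numpy as np

    # rows = candidate pairs, cols = agreement on (name, DOB, ZIP)
    gamma = np.array([[1, 1, 1], [1, 1, 0], [0, 0, 1],
                      [1, 0, 0], [0, 0, 0], [1, 1, 1]], dtype=float)

    p, m, u = 0.3, np.full(3, 0.8), np.full(3, 0.2)  # initial guesses
    for _ in range(50):
        # E-step: posterior probability each pair is a match
        lm = p * np.prod(m**gamma * (1 - m)**(1 - gamma), axis=1)
        lu = (1 - p) * np.prod(u**gamma * (1 - u)**(1 - gamma), axis=1)
        w = lm / (lm + lu)
        # M-step: update match share and per-field agreement rates
        p = w.mean()
        m = (w[:, None] * gamma).sum(0) / w.sum()
        u = ((1 - w)[:, None] * gamma).sum(0) / (1 - w).sum()

    print(f"estimated proportion of matching pairs: {p:.2f}")
    ```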

  • 29. Coding cause of death information on death certificates to ICD-10

    This project involves the use of MedCoder to assign ICD-10 codes to the cause of death information recorded on death certificates. By translating the literal text descriptions provided by certifiers into standardized codes, this initiative ensures accurate classification of both underlying and contributing causes of death, which is essential for public health reporting and analysis.

  • 30. Detecting Stimulant and Opioid Misuse and Illicit Use

    This initiative focuses on analyzing clinical notes to identify instances of illicit use and misuse of stimulant and opioid medications. By leveraging advanced data analysis techniques, the project aims to enhance monitoring and intervention strategies for substance misuse, ultimately contributing to improved public health outcomes.

  • 31. AI/ML Model Release Standards

    The AI/ML Model Release Standards project at NCHS aims to establish comprehensive guidelines for the release of artificial intelligence and machine learning models used within the Center. These standards are intended to ensure consistency and quality across AI/ML projects and may serve as a foundational framework for developing broader standards throughout the CDC, promoting best practices in AI/ML development and deployment.

  • 32. Named Entity Recognition for Opioid Use in Free Text Clinical Notes from Electronic Health Records

    This initiative involves the development of a Named Entity Recognition (NER) model using natural language processing (NLP) techniques to analyze electronic health records from the National Hospital Care Survey. The model aims to accurately detect assertions or negations of opioid use within clinical notes, thereby enhancing the understanding of opioid prescribing patterns and misuse in healthcare settings.
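
    For illustration only, the sketch below runs a generic public token-classification model over an invented note; the NCHS model itself is custom-trained to assert or negate opioid use, which a general-purpose NER model does not do.

    ```python
    # Illustrative sketch: entity extraction from a clinical-style note with
    # a generic pretrained NER pipeline (stand-in, not the NCHS model).
    from transformers import pipeline

    ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

    note = "Patient John Smith denies current oxycodone use; last filled in Boston."
    for ent in ner(note):
        print(ent["entity_group"], ent["word"], round(ent["score"], 2))
    ```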

  • 33. Nowcasting Suicide Trends

    The Nowcasting Suicide Trends project focuses on creating an interactive dashboard that integrates various traditional and non-traditional datasets to provide real-time insights into national suicide death trends. By employing a multi-stage machine learning pipeline, this initiative aims to deliver timely and actionable data that can inform public health strategies and interventions aimed at reducing suicide rates.

  • 34. Feedback Analysis Solution (FAS)

    The Feedback Analysis Solution (FAS) is designed to enhance the review of public comments and other relevant information from stakeholders by utilizing data from CMS and publicly available sources like Regulations.gov. By employing Natural Language Processing (NLP) tools, FAS efficiently aggregates, sorts, and identifies duplicate comments, streamlining the review process. Additionally, machine learning (ML) techniques are applied to extract key topics, themes, and sentiment from the dataset, providing valuable insights for decision-making.
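
    A minimal sketch of the duplicate-identification step, using TF-IDF cosine similarity; the comments and threshold are illustrative, not FAS production settings.

    ```python
    # Flag near-identical public comments (e.g., form letters) by cosine
    # similarity of their TF-IDF vectors.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    comments = [
        "I oppose this rule because it raises costs for seniors.",
        "I oppose this rule because it raises costs for seniors!",  # form letter
        "The agency should clarify the appeals timeline in section 4.",
    ]

    tfidf = TfidfVectorizer().fit_transform(comments)
    sim = cosine_similarity(tfidf)

    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            if sim[i, j] > 0.9:  # hypothetical duplicate threshold
                print(f"comments {i} and {j} look like duplicates ({sim[i, j]:.2f})")
    ```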

  • 35. Predictive Intelligence - Incident Assignment for Quality Service Center (QSC)

    The Predictive Intelligence (PI) system is implemented within the Quality Service Center (QSC) to optimize incident assignment. By analyzing short descriptions provided by users through the ServiceNow Service Portal, the system identifies keywords that match previously submitted incidents, allowing for efficient routing of tickets to the appropriate assignment groups. This solution is regularly updated and re-trained with incident data every 3-6 months to ensure its effectiveness and adaptability to changing needs.

  • 36. Fraud Prevention System Alert Summary Report Priority Score

    The Fraud Prevention System Alert Summary Report Priority Score model is being developed to analyze Medicare administrative and claims data, along with fraud alert and investigation information. Its primary goal is to predict the likelihood that an investigation will result in an administrative action, thereby assisting CMS in prioritizing their investigative resources effectively. As the model is still under development, the final specifications and methodologies are yet to be finalized.

  • 37. Center for Program Integrity (CPI) Fraud Prevention System Models (e.g. DMEMBITheftML, HHAProviderML)

    The Center for Program Integrity (CPI) has developed several fraud prevention models, such as DMEMBITheftML and HHAProviderML, which utilize Medicare administrative and claims data to detect potential cases of fraud, waste, and abuse. By employing random forest techniques, these models generate alerts for investigators, highlighting potential fraud schemes and the providers involved, thereby enhancing the effectiveness of fraud detection efforts.
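
    A hedged sketch of the general pattern follows: a random forest trained on synthetic claims-like features, with predicted probabilities used to rank providers for alerts. Feature names, labels, and cutoffs are invented, not the CPI model specifications.

    ```python
    # Illustrative random-forest alerting step on synthetic provider features.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(1)
    # Hypothetical per-provider features: claim volume, avg billed amount,
    # share of high-cost codes, patient-sharing degree
    X = rng.normal(size=(500, 4))
    y = (X[:, 1] + X[:, 2] + rng.normal(0, 0.5, 500) > 2.0).astype(int)  # synthetic labels

    model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

    scores = model.predict_proba(X)[:, 1]   # fraud-risk score per provider
    alerts = np.argsort(scores)[::-1][:10]  # top-scoring providers become alerts
    print("providers flagged for investigator review:", alerts.tolist())
    ```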

  • 38. Priority Score Model - ranks providers within the Fraud Prevention System using logistic regression based on program integrity guidelines.

    The Priority Score Model is designed to rank healthcare providers within the Fraud Prevention System (FPS) based on program integrity guidelines. By utilizing inputs such as Medicare claims data, Targeted Probe and Educate (TPE) data, and jurisdiction information, the model applies logistic regression techniques to generate rankings that help identify providers who may require further scrutiny or intervention.

  • 39. Priority Score Timeliness - forecast the time needed to work on an alert produced by Fraud Prevention System (Random Forest, Decision Tree, Gradient Boost, Generalized Linear Regression)

    The Priority Score Timeliness project focuses on forecasting the time required to address alerts generated by the Fraud Prevention System (FPS). By analyzing inputs such as Medicare claims data, TPE data, and jurisdiction information, the project employs various machine learning techniques, including Random Forest, Decision Tree, Gradient Boosting, and Generalized Linear Regression, to provide accurate time estimates for alert resolution, thereby improving resource allocation and efficiency in fraud investigations.

  • 40. CCIIO Enrollment Resolution and Reconciliation System (CERRS)

    The CCIIO Enrollment Resolution and Reconciliation System (CERRS) utilizes artificial intelligence for classification purposes. This system aims to streamline the enrollment resolution process by effectively categorizing data, thereby enhancing the efficiency and accuracy of enrollment management within the Center for Consumer Information and Insurance Oversight (CCIIO).

  • 42. CMS Connect (CCN)

    The CMS Connect (CCN) project leverages artificial intelligence to enhance global search capabilities within the CMS framework. By improving search functionalities, this initiative aims to facilitate easier access to information and resources across the CMS network, ultimately supporting better decision-making and operational efficiency.

  • 43. CMS Enterprise Portal Services (CMS Enterprise Portal-Chatbot)

    The CMS Enterprise Portal Services project focuses on developing an AI-powered chatbot aimed at enhancing process efficiency within the CMS Enterprise Portal. This chatbot is designed to provide quick and accurate responses to user inquiries, streamlining workflows and improving knowledge management. By facilitating easier access to critical information and resources, the chatbot enhances the overall user experience for staff and stakeholders, ultimately supporting better decision-making and operational effectiveness within the organization.

  • 44. Federally Facilitated Marketplaces (FFM)

    The Federally Facilitated Marketplaces (FFM) project utilizes artificial intelligence to enhance anomaly detection, correction, classification, and forecasting within the marketplace data. By applying advanced algorithms to time series data, this initiative aims to identify irregular patterns and trends, enabling more accurate predictions and timely interventions to improve marketplace operations and decision-making.

  • 45. Marketplace Learning Management System (MLMS)

    The Marketplace Learning Management System (MLMS) project employs artificial intelligence to facilitate language interpretation and translation services. This initiative aims to improve accessibility and understanding of marketplace information for diverse populations, ensuring that language barriers do not hinder individuals from accessing essential resources and support.

  • 46. Medicaid and CHIP Financial (MACFin) Anomaly Detection Model for DSH Audit

    The Medicaid and CHIP Financial (MACFin) team has developed a machine learning model specifically designed to detect anomalies within Disproportionate Share Hospital (DSH) audit data. This model identifies the top 1-5% of outliers based on extreme behaviors in the data, such as unusual amounts or characteristics, facilitating targeted investigations into potential gaps and barriers. By flagging these anomalies, the model helps minimize overpayments and underpayments, ensuring more accurate financial distributions and supporting effective auditing processes.
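
    A minimal sketch of the outlier-flagging idea, using an isolation forest with an illustrative 3% contamination rate to mirror the top 1-5% target; the features and data are synthetic placeholders.

    ```python
    # Flag extreme records in audit-style data with an isolation forest.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(2)
    # Hypothetical per-hospital features: reported DSH payment, uncompensated
    # care cost, Medicaid inpatient share
    X = rng.normal(loc=[100.0, 80.0, 0.4], scale=[10.0, 8.0, 0.05], size=(300, 3))
    X[:5] *= 3  # inject a few extreme records

    iso = IsolationForest(contamination=0.03, random_state=0).fit(X)
    flags = iso.predict(X)  # -1 marks an outlier
    print("records flagged for targeted review:", np.where(flags == -1)[0].tolist())
    ```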

  • 47. Medicaid and CHIP Financial (MACFin) DSH Payment Forecasting Model

    The MACFin team has developed a forecasting model to predict future Disproportionate Share Hospital (DSH) payments for the upcoming year, utilizing historical data and trends from the past 1-3 years. By training multiple models, including time series and machine learning approaches, the team identified the most effective model based on mean error in predicting DSH payment amounts across hospitals. Given the disorganized nature of DSH data, significant effort was invested in cleaning and consolidating over six years of data from all states. This predictive capability not only aids in early planning and trend analysis but can also be adapted to forecast other DSH-related metrics, such as payment-to-uncompensated ratios and instances of underpayment or overpayment.

  • 48. Performance Metrics Database and Analytics (PMDA)

    The Performance Metrics Database and Analytics (PMDA) project leverages artificial intelligence for various functions, including anomaly detection and correction, language interpretation and translation, and knowledge management. By utilizing AI technologies, this initiative aims to enhance the accuracy and efficiency of performance metrics analysis, improve communication across language barriers, and streamline the management of knowledge resources within the organization.

  • 49. Relationships, Events, Contacts, and Outreach Network (RECON)

    The Relationships, Events, Contacts, and Outreach Network (RECON) project employs artificial intelligence to develop a recommender system and conduct sentiment analysis. This initiative aims to enhance the understanding of stakeholder relationships and interactions by providing personalized recommendations and insights based on sentiment analysis, ultimately improving outreach efforts and engagement strategies.

  • 50. Risk Adjustment Payment Integrity Determination System (RAPIDS)

    The Risk Adjustment Payment Integrity Determination System (RAPIDS) utilizes artificial intelligence for classification purposes and to enhance process efficiency. By applying AI techniques, this system aims to improve the accuracy of risk adjustment payments and streamline the overall determination process, ensuring that payments are aligned with the appropriate risk levels and enhancing the integrity of the payment system.

  • 51. Drug Cost Increase Predictions

    This project focuses on analyzing historical drug cost increases to predict future trends in drug pricing. By leveraging past data, the initiative aims to provide insights into potential future cost escalations, enabling better financial planning and decision-making for healthcare providers and policymakers.

  • 52. Generic Drug Market Share Forecasting

    This initiative involves analyzing the market share of generic drugs in comparison to brand-name drugs over time, utilizing data from Part D claims volume. By forecasting future market shares, the project aims to provide valuable insights into trends in drug utilization, helping stakeholders make informed decisions regarding drug pricing and availability in the marketplace.

  • 53. Drug cost anomaly detection

    This project focuses on detecting anomalies in drug costs associated with Part D claims. By identifying unusual pricing patterns or discrepancies, the initiative aims to enhance oversight and ensure that drug pricing remains fair and consistent, ultimately supporting the integrity of the healthcare system.

  • 54. Artificial Intelligence (AI) Explorers Program Pilot - Automated Technical Profile

    The Artificial Intelligence (AI) Explorers Program Pilot for Automated Technical Profile is a 90-day initiative aimed at researching and developing a machine-readable profile for CMS systems. This project seeks to create a “technology fingerprint” for CMS projects by analyzing various data sources throughout different stages of their development lifecycle, ultimately enhancing the understanding and management of technology applications within CMS.

  • 55. Artificial Intelligence (AI) Explorers Program Pilot - Section 508 Accessibility Testing

    The AI Explorers Program Pilot for Section 508 Accessibility Testing is a 90-day project designed to assist CMS technical leads and Application Development Organizations (ADOs) in conducting thorough analyses of test result data. This initiative supports the CMS Section 508 Program, which ensures that electronic and information technology is accessible to people with disabilities, thereby promoting inclusivity and compliance with accessibility standards.

  • 56. Process Large Volumes of Submitted Docket Comments

    This project aims to automate the processing of large volumes of submitted docket comments by utilizing artificial intelligence and machine learning techniques. The system will facilitate the transfer, deduplication, summarization, and clustering of comments, thereby streamlining the review process and enhancing the efficiency of stakeholder engagement and feedback analysis.

  • 57. To develop novel approaches to expand and/or modify the vaccine AESI phenotypes in order to further improve adverse event detection

    This initiative focuses on enhancing the detection of adverse events of special interest (AESI) related to vaccines by developing an appropriate machine learning model. By utilizing clinical-oriented language models pre-trained on clinical documents from UCSF, the project aims to refine the identification of AESI phenotypes, ultimately improving the monitoring and safety assessment of vaccines.

  • 58. BEST Platform improves post-market surveillance efforts through the semi-automated detection, validation, and reporting of adverse events.

    The BEST Platform is designed to enhance post-market surveillance of biologics by employing a range of applications and techniques for the semi-automated detection, validation, and reporting of adverse events. By utilizing machine learning (ML) and natural language processing (NLP) technologies, the platform effectively identifies potential adverse events from electronic health records (EHRs) and extracts critical features for clinician validation, thereby improving patient safety and regulatory oversight.

  • 59. Development of Machine Learning Approaches to Population Pharmacokinetic Model Selection and Evaluation of Application to Model-Based Bioequivalence Analysis

    This project focuses on developing advanced machine learning approaches for selecting population pharmacokinetic models, which are essential for understanding drug behavior in different populations. The initiative includes the creation of a deep learning and reinforcement learning framework for model selection, as well as the implementation of a genetic algorithm approach in Python. These methodologies aim to enhance the accuracy and efficiency of model-based bioequivalence analysis, ultimately supporting better drug development and regulatory decisions.
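
    A toy sketch of the genetic-algorithm idea follows: each genome is a bit vector selecting which candidate model components (e.g., compartments, covariates) to include, and an invented AIC-like score stands in for the project's pharmacokinetic objective.

    ```python
    # Toy genetic algorithm for model-structure selection; the fitness
    # function is a hypothetical stand-in, not the project's PK objective.
    import random

    N_COMPONENTS, POP, GENS = 8, 30, 40
    TRUE_MODEL = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical best structure

    def fitness(genome):
        # Stand-in score: reward matching the "true" structure, penalize size
        fit_term = sum(g == t for g, t in zip(genome, TRUE_MODEL))
        return fit_term - 0.3 * sum(genome)  # complexity penalty, AIC-like

    pop = [[random.randint(0, 1) for _ in range(N_COMPONENTS)] for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: POP // 2]
        children = []
        while len(survivors) + len(children) < POP:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_COMPONENTS)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N_COMPONENTS)       # point mutation
            child[i] ^= random.random() < 0.1
            children.append(child)
        pop = survivors + children

    print("selected model structure:", max(pop, key=fitness))
    ```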

  • 60. Machine-Learning based Heterogeneous Treatment Effect Models for Prioritizing Product-Specific Guidance Development

    This project aims to develop and implement a novel machine learning algorithm designed to estimate heterogeneous treatment effects, which will help prioritize the development of product-specific guidance (PSG). The initiative involves three key tasks: first, addressing the challenge of confounding variables in observational data by utilizing a variational autoencoder to simultaneously estimate hidden confounders and treatment effects; second, evaluating the model on synthetic datasets and established benchmarks to assess its interpretability; and third, validating the model with real-world PSG data in collaboration with the FDA team. The project will utilize publicly available datasets, such as the Orange Book and FDA PSGs, as well as internal data, to ensure comprehensive validation and applicability of the model.

  • 61. Developing Tools based on Text Analysis and Machine Learning to Enhance PSG Review Efficiency

    This initiative focuses on enhancing the efficiency of product-specific guidance (PSG) reviews through the development of advanced tools based on text analysis and machine learning. The project includes creating a novel neural summarization model that integrates an information retrieval system, utilizing dual attention mechanisms for both sentence-level and word-level outputs. The new model will be evaluated using PSG data and the large CNN/Daily Mail dataset to ensure its effectiveness. Additionally, an open-source software package will be developed to facilitate the implementation of the text summarization model and the information retrieval system, promoting accessibility and collaboration in PSG review processes.

  • 62. BEAM (Bioequivalence Assessment Mate) - a Data/Text Analytics Tool to Enhance Quality and Efficiency of Bioequivalence Assessment

    The BEAM (Bioequivalence Assessment Mate) project aims to create a data and text analytics tool designed to enhance the quality and efficiency of bioequivalence assessments. By utilizing verified data analytics packages, text mining techniques, and artificial intelligence (AI) toolsets, including machine learning (ML), the initiative seeks to streamline the labor-intensive processes involved in bioequivalence evaluations, ultimately facilitating more efficient and high-quality regulatory assessments.

  • 63. Application of Statistical Modeling and Natural Language Processing for Adverse Event Analysis

    This project focuses on developing new tools and methods for monitoring drug-induced adverse events (AEs) to enhance early signal detection and safety assessment of marketed drugs. By employing natural language processing (NLP) and data mining (DM) techniques, the initiative aims to extract relevant information from approved drug labeling for statistical modeling. This analysis will help determine when specific AEs are typically labeled (either pre- or post-market) and identify detection patterns, including predictive factors, within the first three years of a drug’s marketing. The project seeks to improve understanding of the timing and early detection of AEs, facilitating targeted monitoring of novel drugs. Funding will also support an ORISE fellow to contribute to this research.

  • 64. Centers of Excellence in Regulatory Science and Innovation (CERSI) project - Leveraging AI for improving remote interactions.

    The Centers of Excellence in Regulatory Science and Innovation (CERSI) project focuses on leveraging artificial intelligence to enhance remote interactions in four key areas identified by the FDA: transcription, translation, document and evidence management, and collaborative workspaces. The project utilizes advanced automatic speech recognition technology, specifically a transformer-based sequence-to-sequence (seq2seq) model, which is trained to generate accurate transcripts. Given the challenges of using pre-trained models that may not accommodate various accents or specialized terminology, researchers will manually transcribe a selection of video/audio materials to fine-tune the model for better performance in the regulatory context. Additionally, the project aims to develop a comprehensive system for managing documents and evidence, incorporating a document classifier, a video/audio classifier, and an interactive middleware to facilitate seamless access and sharing of documents among participants.

  • 65. Opioid Data Warehouse Term Identification and Novel Synthetic Opioid Detection and Evaluation Analytics

    This project focuses on identifying novel synthetic opioids (NSOs) by analyzing publicly available social media and forensic chemistry data. By utilizing the FastText library, the initiative creates vector models for known NSO-related terms within a large corpus of social media text. The system provides users with similarity scores and expected prevalence estimates for various terms, thereby enhancing future data collection efforts and improving the understanding of emerging drug products in social media discourse.
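
    A minimal sketch of the term-similarity step with gensim's FastText; the tokenized corpus below is invented and far smaller than the project's social media data.

    ```python
    # Train FastText vectors on a tiny tokenized corpus and query neighbors
    # of a seed term; subword vectors let FastText score misspellings and
    # novel slang variants.
    from gensim.models import FastText

    corpus = [
        ["sold", "some", "fent", "pressed", "pills", "last", "night"],
        ["be", "careful", "those", "pressed", "bars", "test", "positive"],
        ["new", "batch", "of", "fentanyl", "analogues", "hitting", "the", "market"],
        ["harm", "reduction", "kits", "include", "fentanyl", "test", "strips"],
    ]

    model = FastText(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

    for term, score in model.wv.most_similar("fentanyl", topn=3):
        print(f"{term}: {score:.2f}")  # similarity scores for candidate terms
    ```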

  • 66. Artificial Intelligence-based Deduplication Algorithm for Classification of Duplicate Reports in the FDA Adverse Event Reporting System (FAERS)

    This initiative involves the development of an artificial intelligence-based deduplication algorithm designed to identify duplicate individual case safety reports (ICSRs) within the FDA Adverse Event Reporting System (FAERS). By processing unstructured data from free-text narratives using natural language processing (NLP), the algorithm extracts relevant clinical features. It employs a probabilistic record linkage approach that combines both structured and unstructured data to effectively identify duplicates. This optimization allows for comprehensive processing of the entire FAERS database, facilitating enhanced data mining and analysis of adverse event reports.

  • 67. Information Visualization Platform (InfoViP) to support analysis of individual case safety reports

    The Information Visualization Platform (InfoViP) is designed to enhance post-market safety surveillance by improving the review and evaluation process of Individual Case Safety Reports (ICSRs). By incorporating artificial intelligence and advanced visualization techniques, InfoViP facilitates the detection of duplicate ICSRs, generates temporal data visualizations, and classifies ICSRs for better usability. This platform aims to increase the efficiency and scientific rigor of safety assessments, ultimately supporting more effective monitoring of drug safety.

  • 68. Using Unsupervised Learning to Generate Code Mapping Algorithms to Harmonize Data Across Data Systems

    This project aims to explore the use of unsupervised learning techniques to develop code mapping algorithms that can harmonize data across different healthcare systems within the Sentinel framework. By employing data-driven statistical methods, the initiative seeks to identify and reduce coding discrepancies, facilitating the transfer of knowledge and best practices between sites. The ultimate goal is to create scalable and automated solutions for harmonizing electronic health records (EHR) data, improving interoperability and data consistency across systems.

  • 69. Augmenting date and cause of death ascertainment in observational data sources

    This project focuses on enhancing the ascertainment of date and cause of death through the development of algorithms that probabilistically link alternative data sources with electronic health records (EHRs). By creating generalizable approaches to improve mortality assessment, the initiative aims to enhance the validity of Sentinel investigations that utilize mortality as an endpoint. The project outlines two specific aims: first, to leverage publicly available online data to determine the date of death for patients from two healthcare systems; and second, to augment cause of death data by analyzing healthcare system narrative text and administrative codes to generate probabilistic estimates for common causes of death.

  • 70. Scalable automated NLP-assisted chart abstraction and feature extraction tool

    This study aims to develop a scalable, automated tool that utilizes natural language processing (NLP) to assist in chart abstraction and feature extraction from electronic medical records (EMRs). By leveraging claims and EHR data—encompassing structured, semi-structured, and unstructured formats—the project seeks to demonstrate the usability and value of these data sources in a pharmacoepidemiology context. The study will utilize real-world longitudinal data from Cerner Enviza EHRs linked to claims, applying NLP techniques to identify and contextualize pre-exposure confounding variables, integrate unstructured EHR data for confounding adjustment, and ascertain outcomes. A specific use case will investigate the relationship between montelukast use in asthma patients and neuropsychiatric events.

  • 71. MASTER PLAN Y4

    The MASTER PLAN Y4 outlines the mission of the Innovation Center to integrate longitudinal patient-level electronic health record (EHR) data into the Sentinel System. This integration aims to facilitate in-depth investigations of medication outcomes using more comprehensive clinical data than what is typically available through insurance claims. The Master Plan presents a five-year roadmap for achieving this vision, focusing on four strategic areas: (1) enhancing data infrastructure; (2) advancing feature engineering; (3) improving causal inference methodologies; and (4) developing detection analytics. The initiative emphasizes the use of emerging technologies, including natural language processing, advanced analytics, and data interoperability, to enhance the capabilities of the Sentinel System.

  • 72. Creating a development network

    The Creating a Development Network project aims to establish a framework for converting structured data from electronic health records (EHRs) and linked claims into the Sentinel Common Data Model (SCDM) at participating sites. The project has two specific aims: first, to ensure that structured data is consistently transformed into the SCDM format; and second, to develop a standardized process for storing free text notes at each site. This includes creating procedures for routine metadata extraction from these notes, enabling direct access for investigators and facilitating timely execution of future tasks within the Sentinel system.

  • 73. Empirical evaluation of EHR-based signal detection approaches

    The project focuses on empirically evaluating signal detection approaches based on electronic health record (EHR) data. It aims to develop methodologies for abstracting and integrating both structured and unstructured EHR data, enhancing the ability to identify signals related to health outcomes that can only be detected through EHR data. This includes leveraging natural language processing (NLP) and laboratory values to improve the accuracy and comprehensiveness of signal detection in healthcare settings.

  • 74. Label comparison tool to support identification of safety-related changes in drug labeling

    This project involves the development of an AI-powered label comparison tool designed to assist reviewers in identifying safety-related changes in drug labeling over time. By analyzing drug labels in PDF format, the tool utilizes BERT-based natural language processing to detect and highlight newly added safety issues. This capability supports the FDA’s efforts to update drug labeling based on postmarket data, ensuring that safety information is accurately reflected and communicated.
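
    A simplified sketch of the comparison step, using a generic sentence-embedding model as a stand-in for the project's BERT-based approach; the label sentences and novelty threshold are illustrative.

    ```python
    # Embed sentences from two label versions and flag new-label sentences
    # with no close match in the old label.
    from sentence_transformers import SentenceTransformer, util

    old_label = ["Dizziness was reported in clinical trials.",
                 "Do not exceed the recommended dose."]
    new_label = ["Dizziness was reported in clinical trials.",
                 "Cases of severe hepatic injury have been reported postmarketing.",
                 "Do not exceed the recommended dose."]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    old_emb = model.encode(old_label, convert_to_tensor=True)
    new_emb = model.encode(new_label, convert_to_tensor=True)

    sim = util.cos_sim(new_emb, old_emb)  # new sentences vs. old sentences
    for i, sent in enumerate(new_label):
        if sim[i].max() < 0.8:  # hypothetical novelty threshold
            print("possible new safety text:", sent)
    ```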

  • 75. Artificial Intelligence (AI) Supported Annotation of FAERS Reports

    This initiative aims to create a prototype software application that enhances the review process of the FDA Adverse Event Reporting System (FAERS) data. By developing computational algorithms, the application will semi-automatically categorize FAERS reports into meaningful medication error categories based on free text narratives. The project leverages existing annotated reports and collaborates with subject matter experts to refine initial natural language processing (NLP) algorithms. An active learning approach will be employed to continuously improve the accuracy of report categorization, ultimately supporting better medication safety monitoring.

  • 76. Community Level Opioid Use Dynamics Modeling and Simulation

    The Community Level Opioid Use Dynamics Modeling and Simulation project utilizes artificial intelligence, particularly Agent-Based Modeling (ABM), to analyze opioid use dynamics within communities. By integrating various datasets, the project investigates how geographical and social factors influence opioid use patterns. Additionally, machine learning techniques, such as classification, are employed to identify data entry types, enhancing the training data for modeling purposes. This comprehensive approach aims to provide insights into the factors driving opioid use and inform public health interventions.
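
    A toy sketch of the agent-based idea: agents on a social grid adopt opioid use with a probability that rises with peer prevalence. All parameters and dynamics are invented for illustration, not the project's model.

    ```python
    # Minimal hand-rolled ABM: social influence on a toroidal grid.
    import random

    random.seed(0)
    SIZE, STEPS, PEER_WEIGHT, BASE_RATE = 20, 50, 0.15, 0.02
    grid = [[random.random() < 0.05 for _ in range(SIZE)] for _ in range(SIZE)]

    def neighbors(i, j):
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if (di, dj) != (0, 0):
                    yield grid[(i + di) % SIZE][(j + dj) % SIZE]

    for _ in range(STEPS):
        new = [[False] * SIZE for _ in range(SIZE)]
        for i in range(SIZE):
            for j in range(SIZE):
                peer_share = sum(neighbors(i, j)) / 8
                p = BASE_RATE + PEER_WEIGHT * peer_share  # social influence term
                new[i][j] = grid[i][j] or random.random() < p
        grid = new

    prevalence = sum(map(sum, grid)) / SIZE**2
    print(f"simulated prevalence after {STEPS} steps: {prevalence:.1%}")
    ```
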
  • 77. Automatic Recognition of Individuals by Pharmacokinetic Profiles to Identify Data Anomalies

    This project aims to enhance the detection of data anomalies in pharmacokinetic (PK) profiles related to Abbreviated New Drug Applications (ANDA). The Office of Biostatistics has developed an R Shiny application called DABERS (Data Anomalies in BioEquivalence R Shiny) to support the Office of Scientific Investigations (OSI) and the Office of Generic Drugs (OGD). The project addresses the complexity of PK and pharmacodynamic data, which cannot be adequately described by a single statistic. By employing advanced statistical methods, including machine learning and data augmentation, the initiative seeks to identify potential data manipulations and anomalies. The project has two main objectives: to provide a data-driven method for modeling complex PK patterns from a regulatory perspective, and to enhance understanding of drug response variability for public health research and drug development, ultimately guiding patient subgroup targeting and optimal dosing strategies.

  • 78. CluePoints CRADA

    The CluePoints CRADA project employs unsupervised machine learning techniques to detect and identify data anomalies within clinical trial data across various levels, including site, country, and subject. By considering multiple use cases, the project aims to enhance data quality and integrity, facilitate site selection for inspections, and assist reviewers in identifying potentially problematic sites for further sensitivity analyses. This initiative is crucial for ensuring the reliability and validity of clinical trial data.

  • 79. Clinical Study Data Auto-transcribing Platform (AI Analyst) for Generating Evidence to Support Drug Labeling

    The Clinical Study Data Auto-transcribing Platform, known as AI Analyst, is designed to autonomously generate clinical study reports from source data, thereby assessing the strength and robustness of analytical evidence for drug labeling. The platform transcribes Study Data Tabulation Model (SDTM) datasets from phase I/II studies into comprehensive clinical study reports with minimal human intervention. The underlying AI algorithm emulates the thought processes of subject matter experts, such as clinicians and statisticians, to accurately interpret study designs and results. The platform incorporates multiple layers of data pattern recognition to address the complexities of clinical study assessments, including diverse study designs and reporting formats. It has been trained on hundreds of New Drug Application (NDA) and Biologics License Application (BLA) submissions, as well as over 1500 clinical trials. The AI Analyst is compatible with various study types, including those related to drug interactions, renal/hepatic impairment, and bioequivalence. In 2022, the Office of Clinical Pharmacology initiated the RealTime Analysis Depot (RAD) project to routinely utilize this AI platform for reviewing New Molecular Entity (NME), 505(b)(2), and 351K submissions.

  • 80. Data Infrastructure Backbone for AI applications

    The Data Infrastructure Backbone for AI Applications project involves the creation of a data lake, referred to as the WILEE knowledgebase, which will integrate and ingest data from various sources to enhance advanced analytics and support risk-based decision-making. The data sources include internal stakeholder submissions, scientific literature from PubMed and NIH, CFSAN-generated data, news articles, and food sales data, among others. The design of the data lake allows for automated data ingestion while also permitting manual curation when necessary. It is structured to facilitate the identification and integration of new data sources as they become available. This centralized data repository will enhance insights into CFSAN-regulated products, food additives, and other relevant substances, ultimately improving knowledge discovery during the review of premarket submissions and post-market monitoring of the U.S. food supply.

  • 81. AI Engine for Knowledge discovery, Post-market Surveillance and Signal Detection

    The AI Engine for Knowledge Discovery, Post-Market Surveillance, and Signal Detection project aims to enhance the CFSAN’s capabilities in identifying potential issues related to commodities under its jurisdiction. By leveraging artificial intelligence, the project focuses on investigating chronic exposure risks associated with food additives, color additives, food contact substances, and contaminants, as well as the long-term use of cosmetics. The OFAS Warp Intelligent Learning Engine (WILEE) serves as an intelligent knowledge discovery and analytic agent, providing a horizon-scanning solution that analyzes data from the WILEE knowledgebase. This enables the Office to adopt a proactive approach, forecast industry trends, and prepare for potential operational risks, such as changes in USDA regulations. WILEE will facilitate risk-based decision-making by integrating diverse data sources and generating timely reports with actionable insights, significantly improving response times and overall effectiveness.

  • 82. Emerging Chemical Hazard Intelligence Platform (ECHIP) - completed

    The Emerging Chemical Hazard Intelligence Platform (ECHIP) is an AI-driven solution developed to identify potential chemical hazards and emerging concerns related to substances of interest for CFSAN. By utilizing data from news sources, social media, and scientific literature, ECHIP enables CFSAN to proactively address stakeholder concerns and potential hazards. Prior to ECHIP, the signal identification and verification process could take 2-4 weeks, depending on the number of scientists involved in reviewing relevant literature. Pilot studies have shown that ECHIP can reduce this process to approximately 2 hours by automatically ingesting, analyzing, and presenting data from multiple sources, thereby streamlining the signal detection and verification workflow.

  • 83. OSCAR

    OSCAR, the Office of Science Customer Assistance Response chatbot, is designed to provide 24/7 support to users seeking assistance from the Customer Service Center. It features a user-friendly interface that allows users to input questions and access previous responses. Additionally, OSCAR includes a dashboard for administrative users, providing key metrics to monitor usage and performance, thereby enhancing customer service efficiency.

  • 84. SSTAT

    The Self-Service Text Analytics Tool (SSTAT) enables users to analyze and explore topics within a collection of documents. Users can submit documents to the tool, which then generates a list of topics and associated keywords. SSTAT automatically produces a visual representation of the documents and their related topics, providing users with a quick overview and facilitating efficient document analysis.

  • 85. ASSIST4TOBACCO

    ASSIST4Tobacco is a semantic search system designed to assist stakeholders in the Center for Tobacco Products (CTP) in locating tobacco authorization applications with greater accuracy and efficiency. By leveraging advanced search capabilities, the system enhances the ability of users to find relevant applications, thereby streamlining the review process and improving regulatory oversight.

  • 86. Using XGBoost Machine Learning Method to Predict Antimicrobial Resistance from WGS data

    This project utilizes genomic data and artificial intelligence/machine learning (AI/ML) techniques to investigate antimicrobial resistance (AMR) in pathogens such as Salmonella, E. coli, Campylobacter, and Enterococcus, sourced from retail meats, humans, and food-producing animals. The XGBoost machine learning model is employed to enhance predictions of antimicrobial susceptibility by estimating Minimum Inhibitory Concentrations (MICs) based on whole genome sequencing (WGS) data, thereby improving the understanding and management of AMR.
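
    A hedged sketch of the modeling step follows, fitting an XGBoost regressor to synthetic binary genomic features to predict log2 MIC values; the data, features, and evaluation threshold are placeholders for the WGS-derived inputs.

    ```python
    # Predict log2(MIC) from binary marker presence/absence with XGBoost.
    import numpy as np
    from xgboost import XGBRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    X = rng.integers(0, 2, size=(400, 200)).astype(float)  # 200 synthetic markers
    # Synthetic signal: a handful of resistance markers shift the MIC upward
    y = 1.0 + 3.0 * X[:, 5] + 2.0 * X[:, 42] + rng.normal(0, 0.3, 400)  # log2(MIC)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1).fit(X_tr, y_tr)

    pred = model.predict(X_te)
    # Common AMR metric: share of predictions within one two-fold dilution
    within_1 = np.mean(np.abs(pred - y_te) <= 1.0)
    print(f"predictions within one dilution step: {within_1:.1%}")
    ```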

  • 87. Development of virtual animal models to simulate animal study results using Artificial Intelligence (AI)

    This project focuses on developing virtual animal models using artificial intelligence (AI) to simulate results from animal studies, which are critical for evaluating the safety of chemicals. As regulatory agencies, including the FDA, move towards the 3Rs principle (reduction, refinement, and replacement) of animal testing, the project proposes an AI-based generative adversarial network (GAN) architecture to learn from existing animal study data. This approach aims to generate relevant data for new and untested chemicals without the need for additional animal experiments. The FDA’s guidelines and frameworks, such as the Predictive Toxicology Roadmap, support the modernization of toxicity assessments through alternative methods, ultimately enhancing the FDA’s predictive capabilities and facilitating drug development while minimizing animal testing.
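
    A compact sketch of the adversarial setup on synthetic tabular data: a generator learns to produce rows resembling study endpoints while a discriminator learns to tell real rows from generated ones. Architectures, sizes, and the "real" data are illustrative only.

    ```python
    # Minimal GAN on synthetic tabular endpoints (PyTorch).
    import torch
    from torch import nn

    torch.manual_seed(0)
    real = torch.randn(256, 4) * torch.tensor([1.0, 0.5, 2.0, 0.2]) + 3.0  # stand-in data

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
    D = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(2000):
        fake = G(torch.randn(256, 8))
        # Discriminator: push real toward 1, generated toward 0
        d_loss = bce(D(real), torch.ones(256, 1)) + bce(D(fake.detach()), torch.zeros(256, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator: fool the discriminator into scoring fakes as real
        g_loss = bce(D(fake), torch.ones(256, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    print("generated endpoint means:", G(torch.randn(1000, 8)).mean(0).tolist())
    ```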

  • 88. Assessing and mitigating bias in applying Artificial Intelligence (AI) based natural language processing (NLP) of drug labeling documents

    This proposal addresses the growing concerns about bias in artificial intelligence (AI) systems used in biomedical sciences, particularly in the context of natural language processing (NLP) applied to drug labeling documents. The project aims to conduct a comprehensive study to assess potential biases that may arise when AI models trained on diverse datasets are applied to new domains. By understanding these biases, the initiative seeks to develop strategies to mitigate them, ensuring that AI applications in document analysis for FDA reviews are fair and accurate, ultimately enhancing the integrity of regulatory processes.

  • 89. Identify sex disparities in opioid drug safety signals in the FDA Adverse Event Reporting System (FAERS) and social media (Twitter) to improve women’s health

    This project focuses on identifying sex disparities in opioid drug safety signals by analyzing data from the FDA Adverse Event Reporting System (FAERS) and social media platforms like Twitter. The initiative aims to address the Office of Women’s Health (OWH) 2023 priority area by examining differences in adverse events related to opioid drugs between sexes. By comparing findings from FAERS and Twitter, the project seeks to determine whether social media can serve as an early warning system for opioid-related issues affecting women. The insights gained from this analysis could contribute to improving women’s health outcomes in the context of opioid use.

  • 90. Prediction of adverse events from drug - endogenous ligand - target networks generated using 3D-similarity and machine learning methods.

    This project aims to predict adverse events associated with drug interactions by utilizing drug-endogenous ligand-target networks and advanced machine learning methods. Molecular similarity has been a valuable tool in various fields, including virtual screening and toxicology, but predicting toxicological responses remains complex due to the involvement of multiple pathways and protein targets. The project focuses on developing a universal molecular modeling approach that employs unique three-dimensional fingerprints to capture the steric and electrostatic interactions between ligands and receptors. By quantifying both structural and functional similarities, this approach aims to enhance the prediction of adverse events from AI-generated networks, potentially revealing new insights into mechanisms of toxicity.
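
    As a simplified stand-in for the project's three-dimensional steric and electrostatic fingerprints, the sketch below compares 2D Morgan fingerprints with Tanimoto similarity in RDKit; the molecules are illustrative.

    ```python
    # Compare a drug against endogenous ligands by fingerprint similarity;
    # high similarity suggests shared protein targets.
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    drug = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")  # acetaminophen
    ligands = {
        "serotonin": "NCCc1c[nH]c2ccc(O)cc12",
        "dopamine": "NCCc1ccc(O)c(O)c1",
    }

    fp_drug = AllChem.GetMorganFingerprintAsBitVect(drug, 2, nBits=2048)
    for name, smi in ligands.items():
        fp = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smi), 2, nBits=2048)
        sim = DataStructs.TanimotoSimilarity(fp_drug, fp)
        print(f"{name}: Tanimoto {sim:.2f}")
    ```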

  • 91. Predictive toxicology models of drug placental permeability using 3D-fingerprints and machine learning

    This project focuses on developing predictive toxicology models to assess drug placental permeability, which is crucial for ensuring fetal safety during pregnancy. The human placenta facilitates the transfer of various substances through mechanisms such as passive diffusion and active transport. The project aims to utilize three-dimensional molecular similarities of endogenous placental transporter ligands to known drug substrates to identify the most likely mode of drug transportation. By building predictive models that link molecular characteristics to placental permeability, the initiative seeks to enhance the understanding of how drugs interact with placental transporters. Data will be gathered from literature mining, CDER databases, and empirical assessments using in vitro models, with validation conducted through blind test sets and small-scale studies of drugs with unknown permeabilities.

  • 92. Opioid agonists/antagonists knowledgebase (OAK) to assist review and development of analgesic products for pain management and opioid use disorder treatment

    The Opioid Agonists/Antagonists Knowledgebase (OAK) project aims to address the rising opioid overdose deaths in the United States by supporting the development of abuse-deterrent analgesic products and innovative treatments for opioid use disorder (OUD). The project will curate experimental data on opioid agonist and antagonist activities from public sources and conduct functional opioid receptor assays on approximately 2800 drugs using a quantitative high-throughput screening (qHTS) platform. Additionally, the initiative will develop and validate in silico models to predict opioid activity. The OAK knowledgebase will serve as a valuable resource for FDA reviewers, providing access to experimental data and protocols, and enabling read-across methods for estimating activity in chemicals lacking experimental data. This comprehensive approach aims to inform regulatory reviews and facilitate the development of safer analgesics and treatments for OUD.

  • 93. Development of a Comprehensive Open Access Molecules with Androgenic Activity Resource (MAAR) to Facilitate Assessment of Chemicals

    The project aims to create a comprehensive open-access resource called the Molecules with Androgenic Activity Resource (MAAR) to facilitate the assessment of chemicals for androgenic activity. The androgen receptor (AR) is crucial for evaluating drug safety and chemical risk, as it can be both a target and an off-target for various substances. Currently, existing data on androgenic activity is scattered across multiple sources and formats, hindering its usability. MAAR will consolidate this data and provide predictive models that adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). This resource will enhance research capabilities and support regulatory decision-making regarding the efficacy and safety of FDA-regulated products.

  • 94. Artificial Intelligence (AI)-based Natural Language Processing (NLP) for FDA labeling documents

    This project focuses on leveraging artificial intelligence (AI) and natural language processing (NLP) to analyze FDA labeling documents, which are often unstructured and lack standardization. The study aims to utilize advanced language models, such as BERT and BioBERT, to extract meaningful information from over 120,000 FDA drug labeling documents. Key areas of investigation include interpreting and classifying drug properties (safety and efficacy), summarizing text to highlight important sections, conducting automatic anomaly analysis for signal identification, and enhancing information retrieval through a question-and-answer format. The project will compare AI-based NLP approaches with traditional MedDRA methods to improve drug safety and efficacy assessments. Ultimately, the findings will establish benchmarks for applying public language models to FDA documents and support the future development of the FDA Label tool used in the Center for Drug Evaluation and Research (CDER) review process.
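
    As a rough illustration of the kind of pipeline involved, the sketch below loads a public BioBERT checkpoint from Hugging Face and scores a labeling sentence with a (still untrained) two-class head, e.g., safety- versus efficacy-related text. The model name, label scheme, and example sentence are assumptions for demonstration, not the project's actual configuration.

    ```python
    # Minimal sketch: scoring a labeling passage with a pretrained biomedical
    # language model via Hugging Face transformers. The label scheme and the
    # example sentence are illustrative assumptions.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "dmis-lab/biobert-base-cased-v1.1"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=2)  # e.g., 0 = efficacy-related, 1 = safety-related

    text = "Hepatotoxicity has been reported in patients receiving high doses."
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Note: the classification head is randomly initialized here; the
    # probabilities are meaningless until the model is fine-tuned.
    print(torch.softmax(logits, dim=-1))
    ```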

  • 95. Informing selection of drugs for COVID-19 treatment by big data analytics and artificial intelligence

    This project aims to utilize big data analytics and artificial intelligence to inform the selection of drugs for treating COVID-19. Given the global health crisis, with millions infected and significant mortality rates, there is an urgent need to repurpose existing drugs for effective treatment. The project will mine adverse drug event data from various sources, including public databases and social media, to gather safety information on potential repurposed drugs. The ultimate goal is to provide comprehensive adverse event data that will facilitate the safety evaluation of these drugs, helping to identify the most suitable candidates for repurposing and ensuring that the right patients are selected for treatment, thereby enhancing efforts to combat the pandemic.

  • 96. Towards Explainable AI: Advancing Predictive Modeling for Regulatory Use

    This project focuses on advancing the understanding and application of explainable artificial intelligence (AI) in regulatory contexts. As AI technologies become more prevalent, the FDA faces challenges in assessing AI-centric products and implementing AI methods to enhance its operations. A key aspect of this initiative is to explore the interpretability of AI models, which often lacks quantitative metrics and can be subjective. The project will investigate various AI methods, evaluating their performance and interpretability using established benchmark datasets and extending the analysis to clinical and pre-clinical datasets. The findings will provide essential parameters and guidance for developing explainable AI models, ultimately facilitating informed decision-making in regulatory settings.
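
    One widely used family of post-hoc explanation methods such a project could evaluate is SHAP, which attributes each prediction to individual input features. The sketch below computes SHAP values for a random-forest regressor on a public benchmark dataset; the model, dataset, and choice of SHAP are illustrative assumptions, not the project's selected approach.

    ```python
    # Minimal sketch: per-feature attribution with SHAP on a benchmark dataset,
    # one common route to model interpretability (illustrative only).
    import numpy as np
    import shap
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X.iloc[:100])   # (samples, features)

    # Mean absolute SHAP value per feature serves as a global importance proxy
    importance = np.abs(shap_values).mean(axis=0)
    for feature, score in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
        print(f"{feature:10s} {score:.2f}")
    ```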

  • 97. Identification of sex differences in prescription opioid use (POU)-related cardiovascular risks by big data analysis

    This project aims to investigate sex differences in cardiovascular risks associated with prescription opioid use (POU) through big data analysis. POU can lead to various adverse effects across different body systems, and significant sex differences have been noted in cardiac outcomes. The study will develop a novel statistical model to identify safety signals while considering gender as a variable, addressing limitations of existing FDA data mining methods. By analyzing real-world evidence from electronic health records (EHRs) and employing AI tools, the project seeks to uncover sex-dependent risk factors for cardiotoxicity related to POU. This initiative aligns with the FDA’s strategic priorities to reduce addiction crises and enhance women’s health research, ultimately providing valuable insights for drug reviewers and healthcare providers to mitigate cardiovascular risks in women using POU.
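
    A common starting point for this kind of signal detection is a disproportionality statistic computed separately for each sex. The sketch below implements the reporting odds ratio (ROR) with a 95% confidence interval over hypothetical 2x2 report counts; the project's novel statistical model is more sophisticated, and these counts are invented for illustration.

    ```python
    # Minimal sketch: sex-stratified reporting odds ratio (ROR), a standard
    # disproportionality statistic for spontaneous-report data. All counts
    # below are invented, not drawn from FAERS.
    import math

    def ror(a, b, c, d):
        """a: drug+event, b: drug+other events, c: other drugs+event, d: neither."""
        estimate = (a * d) / (b * c)
        se = math.sqrt(1/a + 1/b + 1/c + 1/d)           # SE of log(ROR)
        lo = math.exp(math.log(estimate) - 1.96 * se)   # 95% CI bounds
        hi = math.exp(math.log(estimate) + 1.96 * se)
        return estimate, lo, hi

    # Hypothetical 2x2 counts for a cardiac event, stratified by sex
    for sex, counts in {"female": (40, 960, 200, 48800),
                        "male":   (15, 985, 210, 48790)}.items():
        est, lo, hi = ror(*counts)
        print(f"{sex}: ROR={est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
    ```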

  • 98. NCTR/DBB-CDER/OCS Collaboration on a SafetAI Initiative to Enhance the IND Review Process

    This collaborative project aims to enhance the Investigational New Drug (IND) review process by utilizing artificial intelligence (AI) and machine learning (ML) to develop animal-free models for toxicity assessments. The initiative focuses on identifying safety biomarkers from non-animal assays and predicting safety outcomes based on chemical structure data. Deep learning (DL), a sophisticated subset of ML, will be employed to improve the identification of safety concerns related to drug-induced liver injury (DILI) and carcinogenicity. By leveraging DL’s advanced capabilities, the project seeks to streamline the IND review process and reduce reliance on animal testing, ultimately improving drug safety evaluations.
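
    To make the deep-learning component concrete, the sketch below defines a small feed-forward network that maps a binary structural fingerprint to a probability of drug-induced liver injury. The architecture, layer sizes, and random stand-in fingerprints are assumptions for illustration; they do not describe the initiative's actual models.

    ```python
    # Minimal sketch: a toy network mapping a 2048-bit structural fingerprint
    # to a DILI probability. Architecture and data are illustrative only.
    import torch
    import torch.nn as nn

    class DILINet(nn.Module):
        def __init__(self, n_bits=2048):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_bits, 256), nn.ReLU(), nn.Dropout(0.3),
                nn.Linear(256, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Sigmoid())  # P(drug-induced liver injury)

        def forward(self, x):
            return self.net(x)

    model = DILINet()
    fake_fingerprints = torch.randint(0, 2, (8, 2048)).float()  # stand-in batch
    print(model(fake_fingerprints).squeeze())  # untrained: outputs are arbitrary
    ```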

  • 99. individual Functional Activity Composite Tool (inFACT)

    The individual Functional Activity Composite Tool (inFACT) is designed to support the Social Security Administration (SSA) in the disability determination process. It assists adjudicators by extracting and presenting relevant functional evidence from extensive case records, which can span hundreds or thousands of pages. inFACT organizes and displays information regarding an individual’s overall functional capabilities, derived from free-text medical records, and aligns this data with key business elements, thereby streamlining the review process and enhancing decision-making efficiency.

  • 100. Assisted Referral Tool

    The Assisted Referral Tool is designed to aid in the assignment of relevant scientific areas for grant applications. By streamlining the referral process, the tool ensures that applications are directed to the appropriate scientific domains, enhancing the efficiency and accuracy of grant management and review.

  • 101. NanCI: Connecting Scientists

    NanCI (Connecting Scientists) is an AI-driven platform that helps users discover scientific content aligned with their interests. Users can collect research papers into a folder and use the tool to find similar articles in the literature. The platform allows users to refine recommendations by upvoting or downvoting them, enhancing the personalization of content. Additionally, NanCI facilitates networking among users with shared interests, enabling them to connect and exchange recommendations within a scientific social network.

  • 102. Detection of Implementation Science focus within incoming grant applications

    This project involves the development of a tool that employs natural language processing (NLP) and machine learning to assess incoming grant applications for their focus on Implementation Science (IS). The tool calculates an IS score, which predicts whether a grant proposal aligns with the principles of Implementation Science, a relatively new field. The National Heart, Lung, and Blood Institute (NHLBI) utilizes this IS score to inform decisions regarding the assignment of applications to specific divisions for effective grants management and oversight.
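
    In spirit, the IS score is the output of a text classifier. The sketch below shows a minimal version using TF-IDF features and logistic regression; the toy training abstracts and labels are invented, and the production tool's features and model are not specified here.

    ```python
    # Minimal sketch: turning an application's text into an "IS score",
    # i.e., a predicted probability of Implementation Science focus.
    # Training texts and labels are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_texts = [
        "We study barriers and facilitators to adoption of evidence-based care.",
        "We characterize the crystal structure of a membrane protein.",
        "A hybrid effectiveness-implementation trial of a stroke care bundle.",
        "Mouse models of cardiac hypertrophy under pressure overload.",
    ]
    train_labels = [1, 0, 1, 0]  # 1 = Implementation Science focus

    scorer = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    scorer.fit(train_texts, train_labels)

    new_abstract = "Scaling up an evidence-based hypertension program in rural clinics."
    print(scorer.predict_proba([new_abstract])[0, 1])  # the IS score
    ```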

  • 103. Federal IT Acquisition Reform Act (FITARA) Tool

    The Federal IT Acquisition Reform Act (FITARA) Tool is designed to streamline the identification of IT-related contracts within the National Institute of Allergy and Infectious Diseases (NIAID). By automating this process, the tool enhances efficiency and accuracy in tracking and managing IT acquisitions, ensuring compliance with federal regulations.

  • 104. Division of Allergy, Immunology, and Transplantation (DAIT) AIDS-Related Research Solution

    The DAIT AIDS-Related Research Solution employs natural language processing (NLP) and classification algorithms to analyze incoming grant applications. It predicts the priority level (high, medium, low) and identifies the relevant research area for each application. By ranking applications based on these predictions, the tool helps prioritize higher-ranked applications for review, thereby optimizing the grant evaluation process.

  • 105. Scientific Research Data Management System Natural Language Processing Conflict of Interest Tool

    The Scientific Research Data Management System’s Conflict of Interest Tool utilizes natural language processing (NLP) techniques, such as optical character recognition (OCR) and text extraction, to identify entities within grant applications. This functionality assists the NIAID’s Scientific Review Program team in efficiently detecting potential conflicts of interest (COI) between grant reviewers and applicants, thereby enhancing the integrity of the review process.

  • 106. Tuberculosis (TB) Case Browser Image Text Detection

    The Tuberculosis (TB) Case Browser Image Text Detection tool is designed to identify text within images that may contain Personally Identifiable Information (PII) or Protected Health Information (PHI) in TB-related portals. By detecting such sensitive information, the tool helps ensure compliance with privacy regulations and protects patient confidentiality.

  • 107. Research Area Tracking Tool

    The Research Area Tracking Tool is a dashboard that leverages machine learning algorithms to identify and track projects within designated high-priority research areas. This tool enhances visibility into ongoing research efforts, facilitating better resource allocation and strategic planning within the organization.

  • 108. NIDCR Digital Transformation Initiative (DTI)

    The NIDCR Digital Transformation Initiative (DTI) aims to develop a natural language processing (NLP) chatbot that enhances operational efficiency, transparency, and consistency for employees at the National Institute of Dental and Craniofacial Research (NIDCR). This chatbot will serve as a resource for employees, streamlining communication and information retrieval within the organization.

  • 109. NIDCR Data Bank

    The NIDCR Data Bank project enables intramural research program investigators to transfer large volumes of unstructured data into a scalable cloud archival storage solution. This system is designed to be cost-effective and includes robust metadata management for governance purposes. Additionally, it facilitates secondary and tertiary data analysis opportunities by leveraging advanced cognitive services, including artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) toolsets.

  • 110. Automated approaches for table extraction

    The Automated Approaches for Table Extraction project focuses on creating a model-based automation process to streamline the extraction of data from published tables. Given that data tables often contain rich and critical information, this tool significantly reduces the time and effort required for manual data extraction, enhancing efficiency in data analysis and research.

  • 111. SWIFT Active Screener

    The SWIFT Active Screener employs statistical models to optimize the literature screening process for the Division of Translational Toxicology. By utilizing active learning techniques and incorporating user feedback, the tool automatically prioritizes studies, thereby saving screeners time and effort while enhancing the efficiency of evidence evaluations.

  • 112. Clinical Trial Predictor

    The Clinical Trial Predictor is a sophisticated tool that employs a combination of natural language processing (NLP) and machine learning algorithms to analyze the text of research applications. By examining titles, abstracts, narratives, specific aims, and research strategies, the tool predicts whether the applications are likely to involve clinical trials. This predictive capability aids in the efficient review and categorization of research proposals.

  • 113. JIT Automated Calculator (JAC)

    The JIT Automated Calculator (JAC) is a tool that utilizes natural language processing (NLP) to analyze Just-In-Time (JIT) Other Support forms submitted by principal investigators (PIs). By parsing these forms, the JAC determines the amount of external support that PIs are receiving from sources other than the pending application. This functionality enhances transparency and helps in evaluating the overall funding landscape for research projects.

  • 114. Similarity-based Application and Investigator Matching (SAIM)

    The Similarity-based Application and Investigator Matching (SAIM) system employs natural language processing (NLP) to identify grants awarded to National Institute of General Medical Sciences (NIGMS) Principal Investigators that are funded by non-NIH sources. This tool helps assess whether a new grant application overlaps significantly with existing grants from other agencies, thereby promoting efficient resource allocation and reducing redundancy in funding.
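
    The underlying overlap check can be framed as document similarity. The sketch below compares a new application against previously funded project texts using TF-IDF vectors and cosine similarity; the texts and the flagging threshold are illustrative assumptions rather than SAIM's actual configuration.

    ```python
    # Minimal sketch: flagging potential scientific overlap via cosine
    # similarity between a new application and funded projects.
    # Texts and the 0.6 threshold are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    funded = [
        "Mechanisms of ribosome assembly in yeast.",
        "Chaperone-mediated protein folding under heat stress.",
    ]
    new_application = "Ribosome biogenesis and assembly factors in S. cerevisiae."

    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform(funded + [new_application])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

    for text, score in zip(funded, scores):
        flag = "POSSIBLE OVERLAP" if score > 0.6 else "ok"
        print(f"{score:.2f}  {flag}  {text}")
    ```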

  • 115. Remediate Adobe .pdf documents to be more accessible

    The project focuses on improving the accessibility of Adobe .pdf documents to meet Section 508 standards, which ensure that electronic and information technology is accessible to people with disabilities. The National Library of Medicine (NLM) is exploring the use of artificial intelligence (AI) to remediate existing .pdf files that do not comply with these standards. By enhancing the accessibility of these documents, the initiative aims to better serve individuals who rely on assistive technologies, such as those who are blind or visually impaired.

  • 116. MEDIQA: Biomedical Question Answering

    The MEDIQA project focuses on automating the process of question answering in the biomedical field using artificial intelligence (AI) techniques. By leveraging both traditional and neural machine learning approaches, the project aims to address a diverse array of biomedical information needs. The goal is to enhance user access to National Library of Medicine (NLM) resources through a single entry point, streamlining the retrieval of relevant information for various users.

  • 117. CLARIN: Detecting clinicians' attitudes through clinical notes

    The CLARIN project aims to analyze clinical notes to detect clinicians’ attitudes, emotions, and potential biases. By employing artificial intelligence (AI) techniques, the project seeks to enhance understanding of how clinician sentiments may impact patient care and decision-making. This initiative supports efforts to promote equity and diversity in healthcare while improving the overall quality of care provided to patients.

  • 118. Best Match: New relevance search for PubMed

    The Best Match project introduces a new relevance search algorithm for PubMed, designed to enhance the user experience in finding biomedical literature. As the volume of published research continues to grow, retrieving the most relevant papers for a given query has become increasingly difficult. The Best Match algorithm draws on aggregated user search behavior and machine learning techniques to rank results by relevance rather than the traditional date sort order, improving the efficiency of literature searches for millions of users.

  • 119. SingleCite: Improving single citation search in PubMed

    The SingleCite project enhances the single citation search functionality in PubMed, which is crucial for users seeking specific documents in scholarly databases. The automated algorithm developed for SingleCite establishes a mapping between queries and documents by employing a regression function that predicts the likelihood of a retrieved document being the target. This prediction is based on three key variables: the score of the highest-scoring document, the score difference between the top two documents, and the fraction of the query matched by the candidate citation. SingleCite has demonstrated superior performance in benchmarking tests and is particularly effective in rescuing queries that would otherwise fail to retrieve relevant results.
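
    The decision step described above can be sketched as a logistic regression over the three named features. In the toy version below, the feature values and training labels are invented; the published model's actual form and coefficients are not reproduced.

    ```python
    # Minimal sketch of the decision step: logistic regression over three
    # retrieval features predicts whether the top-scoring document is the
    # query's target citation. Training data are invented for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Per query: [top_score, score_gap_top_two, fraction_of_query_matched]
    X_train = np.array([[12.3, 5.1, 0.90],
                        [ 4.2, 0.3, 0.35],
                        [ 9.8, 3.9, 0.80],
                        [ 3.1, 0.1, 0.25]])
    y_train = np.array([1, 0, 1, 0])  # 1 = retrieved document was the target

    model = LogisticRegression().fit(X_train, y_train)
    new_query = np.array([[10.5, 4.0, 0.85]])
    print(model.predict_proba(new_query)[0, 1])  # P(correct citation)
    ```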

  • 120. Computed Author: author name disambiguation for PubMed

    The Computed Author tool addresses the challenge of author name ambiguity in PubMed, where multiple authors may share the same name, leading to irrelevant search results. The National Library of Medicine (NLM) developed a machine learning method that scores features to disambiguate pairs of papers with ambiguous author names. By employing agglomerative clustering, the tool groups all papers belonging to the same authors based on these classifications. The disambiguation process has been validated through manual verification, demonstrating higher accuracy than existing methods. This tool has been integrated into PubMed to enhance the efficiency of author name searches.
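
    The clustering step can be illustrated with scikit-learn's agglomerative clustering over a precomputed distance matrix, where the distances would in practice come from the trained pairwise classifier. The matrix below is invented for illustration (scikit-learn >= 1.2 is assumed for the `metric` argument).

    ```python
    # Minimal sketch of the clustering step: group papers by presumed author
    # from pairwise "same-author" distances (low = likely same author).
    # The distance matrix is invented; in practice it comes from the
    # trained pairwise classifier's scores.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    papers = ["paper A", "paper B", "paper C", "paper D"]
    distances = np.array([[0.00, 0.10, 0.90, 0.80],
                          [0.10, 0.00, 0.85, 0.90],
                          [0.90, 0.85, 0.00, 0.15],
                          [0.80, 0.90, 0.15, 0.00]])

    clusterer = AgglomerativeClustering(
        n_clusters=None, distance_threshold=0.5,
        metric="precomputed", linkage="average")
    labels = clusterer.fit_predict(distances)
    for paper, label in zip(papers, labels):
        print(paper, "-> author cluster", label)
    ```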

  • 121. NLM-Gene: towards automatic gene indexing in PubMed articles

    NLM-Gene is an innovative tool developed to automate the gene indexing process within PubMed articles, which is currently done manually by expert indexers. This tool utilizes advanced natural language processing (NLP) and deep learning techniques to identify gene names in biomedical literature, significantly reducing the time and resources required for indexing. The performance of NLM-Gene has been evaluated using gold-standard datasets, and it is set to be integrated into the MEDLINE indexing pipeline, enhancing literature retrieval and information access.

  • 122. NLM-Chem: towards automatic chemical indexing in PubMed articles

    NLM-Chem is a tool designed to automate the chemical indexing process for PubMed articles, which is currently a manual task performed by expert indexers. By employing advanced natural language processing (NLP) and deep learning methods, NLM-Chem efficiently identifies chemical names in biomedical literature. Its effectiveness has been validated against gold-standard evaluation datasets, and it is scheduled for integration into the MEDLINE indexing pipeline, thereby improving the efficiency of literature retrieval and access to chemical information.

  • 123. Biomedical Citation Selector (BmCS)

    The Biomedical Citation Selector (BmCS) automates the article selection process for the National Library of Medicine (NLM), enhancing the efficiency and effectiveness of indexing and hosting relevant information for public access. By standardizing the selection process through automation, BmCS significantly reduces the time required to process MEDLINE articles, thereby improving the overall workflow and accessibility of biomedical literature.

  • 124. MTIX

    MTIX is a machine learning-based system designed to automate the indexing of MEDLINE articles with Medical Subject Headings (MeSH) terms. Utilizing a multi-stage neural text ranking approach, MTIX enhances the efficiency of the indexing process, allowing for cost-effective and timely categorization of articles. This automation not only streamlines the indexing workflow but also improves the accessibility of biomedical literature for researchers and the public.

  • 125. ClinicalTrials.gov Protocol Registration and Results System Review Assistant

    The ClinicalTrials.gov Protocol Registration and Results System Review Assistant is a research initiative focused on evaluating the potential of artificial intelligence (AI) to enhance the efficiency and effectiveness of reviewing study records. By exploring AI integration, the project aims to streamline the review process for clinical trial protocols and results, ultimately improving the management and accessibility of clinical trial information.

  • 126. MetaMap

    MetaMap is a powerful tool that connects biomedical text to concepts within the Unified Medical Language System (UMLS) Metathesaurus. By utilizing natural language processing (NLP), MetaMap links the text found in biomedical literature to the underlying knowledge, including synonym relationships, contained in the Metathesaurus. The program offers a flexible architecture for exploring various mapping strategies and their applications, and it is used by the Medical Text Indexer (MTI) to generate potential indexing terms, enhancing the indexing process for biomedical literature.

  • 127. HIV-related grant classifier tool

    The HIV-related grant classifier tool is a user-friendly application designed for scientific staff to input grant information and automatically classify grants related to HIV research. The tool employs an automated algorithm to categorize the grants, and it features interactive data visualizations, including heat maps created with the Plotly Python library. These visualizations display the confidence levels of the predicted classifications, enhancing the analysis and management of HIV-related funding opportunities.
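
    As a flavor of the visualization layer, the sketch below renders a small Plotly heat map of per-grant classification confidence. The grant IDs, categories, and scores are invented for illustration.

    ```python
    # Minimal sketch: a Plotly heat map of classification confidence, the
    # kind of visualization the tool exposes. All values are invented.
    import plotly.graph_objects as go

    grants = ["R01-0001", "R21-0002", "U01-0003"]         # hypothetical IDs
    categories = ["HIV treatment", "HIV prevention", "Not HIV-related"]
    confidence = [[0.92, 0.05, 0.03],
                  [0.10, 0.84, 0.06],
                  [0.07, 0.11, 0.82]]

    fig = go.Figure(go.Heatmap(z=confidence, x=categories, y=grants,
                               colorscale="Viridis", zmin=0, zmax=1))
    fig.update_layout(title="Predicted classification confidence by grant")
    fig.show()
    ```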

  • 128. Automated approaches to analyzing scientific topics

    This project focuses on developing automated methods for analyzing scientific topics using natural language processing (NLP) and artificial intelligence/machine learning (AI/ML) techniques. The approach groups semantically similar documents, such as grants, publications, and patents, and extracts AI-generated labels that accurately represent the scientific focus of each topic. This automated analysis aids in the evaluation and management of the National Institutes of Health (NIH) research portfolio, facilitating better insights into research trends and funding allocations.

  • 129. Identification of emerging areas

    The Identification of Emerging Areas project utilizes artificial intelligence (AI) and machine learning (ML) to analyze the age and rate of progress of various research topics within NIH portfolios. By assessing these metrics, the project can identify emerging areas of research on a large scale, thereby facilitating the acceleration of scientific progress and enabling more strategic research investments.

  • 130. Person-level disambiguation for PubMed authors and NIH grant applicants

    The Person-Level Disambiguation project focuses on accurately attributing grants, articles, and other research outputs to individual researchers, which is essential for conducting high-quality analyses. This enhanced disambiguation method improves the identification of authors in PubMed articles and NIH grant applications, thereby supporting data-driven decision-making processes and ensuring that researchers receive appropriate credit for their work.

  • 131. Prediction of transformative breakthroughs

    The Prediction of Transformative Breakthroughs initiative aims to enhance the pace of scientific discovery by predicting significant breakthroughs in biomedicine. By analyzing co-citation networks, the project has identified a common signature that can forecast breakthroughs more than five years before they are officially published. This predictive capability not only improves the efficiency of research investments but also has led to a patent application (U.S. Patent Application No. 63/257,818) for the methodology used in this approach.

  • 132. Machine learning pipeline for mining citations from full-text scientific articles

    The Machine Learning Pipeline for Mining Citations project, developed by the NIH Office of Portfolio Analysis, automates the identification of freely available scientific articles online that do not require a library subscription. The pipeline processes full-text PDFs, converting them to XML format, and employs a Long Short-Term Memory (LSTM) recurrent neural network to differentiate between reference text and other content within the articles. The references identified by the LSTM model are then processed through the Citation Resolution Service, enhancing the accessibility and usability of scientific literature. For further details, refer to the publication by Hutchins et al. (2019).
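
    The sketch below shows the general shape of such a classifier: a tiny Keras LSTM that labels a line of extracted text as bibliographic-reference text or not. The vocabulary size, dimensions, and random stand-in data are assumptions; the pipeline's actual network and training corpus are not described here.

    ```python
    # Minimal sketch: an LSTM classifier over token sequences, standing in
    # for the pipeline's reference-vs-other-text model. Data are random
    # placeholders, included only to show the shapes involved.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    vocab_size, max_len = 5000, 40
    model = keras.Sequential([
        layers.Embedding(vocab_size, 64),
        layers.LSTM(32),
        layers.Dense(1, activation="sigmoid"),  # P(line is reference text)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    X = np.random.randint(1, vocab_size, size=(128, max_len))  # token IDs
    y = np.random.randint(0, 2, size=(128,))                   # labels
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)
    print(model.predict(X[:3], verbose=0).ravel())
    ```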

  • 133. Machine learning system to predict translational progress in biomedical research

    The Machine Learning System for Predicting Translational Progress in Biomedical Research is designed to assess whether a research paper is likely to be cited in future clinical trials or guidelines. By analyzing early reactions from the scientific community, this system can provide real-time predictions of translational progress in biomedicine. This capability enhances the understanding of how research impacts clinical practice and policy. For more information, see the publication by Hutchins et al. (2019).

  • 134. Research, Condition, and Disease Categorization (RCDC) AI Validation Tool

    The Research, Condition, and Disease Categorization (RCDC) AI Validation Tool is designed to enhance the accuracy and completeness of RCDC categories, which are essential for public reporting of health data. By ensuring that these categories are correctly assigned, the tool supports transparency and reliability in the dissemination of research findings and health information.

  • 135. Internal Referral Module (IRM)

    The Internal Referral Module (IRM) initiative leverages artificial intelligence (AI) and natural language processing (NLP) to automate the prediction of grant applications directed to NIH Institutes and Centers (ICs). By streamlining this manual process, the IRM enhances the ability of Program Officers to make informed decisions regarding grant applications, ultimately improving the efficiency of the review process.

  • 136. NIH Grants Virtual Assistant

    The NIH Grants Virtual Assistant is a chatbot designed to help users navigate and find grant-related information through the Office of Extramural Research (OER) resources. By providing immediate assistance and information, the chatbot enhances user experience and accessibility to vital grant information, facilitating the grant application process.

  • 137. Tool for Natural Gas Procurement Planning

    The Tool for Natural Gas Procurement Planning enables the NIH to develop a strategic procurement plan for natural gas. By utilizing current long-term forecasts, the tool helps set realistic price targets, ensuring that the NIH can effectively manage its energy costs and procurement strategies.

  • 138. NIH Campus Cooling Load Forecaster

    The NIH Campus Cooling Load Forecaster project is designed to predict the chilled water demand for the NIH campus over the next four days. This forecasting capability allows the management of the NIH Central Utilities Plant to effectively plan and optimize the operation and maintenance of the chiller plant, ensuring efficient energy use and reliable cooling for campus facilities.
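
    Many demand forecasters of this kind reduce to regression over lagged observations (plus weather and calendar inputs not shown here). The sketch below fits a gradient-boosted regressor to synthetic hourly demand using recent and day-ago lags; the data, model, and feature choices are all illustrative assumptions, not the plant's actual forecaster.

    ```python
    # Minimal sketch: lag-feature regression for hourly chilled-water demand.
    # The demand series is synthetic (daily cycle + noise) for illustration.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    hours = np.arange(24 * 60)                  # ~60 days of hourly history
    demand = (100 + 20 * np.sin(2 * np.pi * hours / 24)
              + rng.normal(0, 2, hours.size))

    lags = [1, 2, 24, 48]                       # recent + same-hour lags
    X = np.column_stack([demand[48 - l:-l] for l in lags])
    y = demand[48:]
    model = GradientBoostingRegressor().fit(X, y)

    latest = np.array([[demand[-1], demand[-2], demand[-24], demand[-48]]])
    print("next-hour forecast:", model.predict(latest)[0])
    ```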

  • 139. NIH Campus Steam Demand Forecaster

    The NIH Campus Steam Demand Forecaster project is designed to predict the steam demand for the NIH campus over the next four days. By providing accurate forecasts, this tool enables stakeholders at the NIH Central Utilities Plant to effectively plan and optimize the operation and maintenance of the steam system, ensuring efficient energy use and reliable service delivery.

  • 140. Chiller Plant Optimization

    The Chiller Plant Optimization project focuses on enhancing the efficiency of the chilled water production process at the NIH campus. By implementing strategies to reduce energy consumption, this initiative aims to lower operational costs and minimize the environmental impact of cooling systems, contributing to more sustainable campus operations.

  • 141. Natural Language Processing Tool for Open Text Analysis

    The Natural Language Processing Tool for Open Text Analysis is designed to enhance facility readiness and minimize downtime by enabling the analysis of previously inaccessible data contained in open text formats. By unlocking this data, the tool allows for better decision-making and operational efficiency across various departments.

  • 142. Contracts and Grants Analytics Portal

    The Contracts and Grants Analytics Portal is an AI-driven tool that significantly improves the ability of HHS Office of Inspector General (OIG) staff to access and analyze grants-related data. It allows users to quickly navigate to relevant findings from thousands of audits, discover similar findings, analyze trends, compare data across operating divisions (OPDIVs), and assess potential anomalies among grantees. This enhanced accessibility and analytical capability supports more informed decision-making and oversight.

  • 143. Text Analytics Portal

    The Text Analytics Portal is designed to empower personnel without a background in analytics to efficiently examine text documents. By utilizing a suite of technologies, including search functions, topic modeling, and entity recognition, the portal simplifies the analysis process. The initial implementation focuses on specific use cases relevant to the HHS Office of Inspector General (OIG), enhancing the ability to extract insights from text data.

Conclusion

The integration of AI, ML, and NLP technologies across these projects underscores a transformative shift towards more intelligent and efficient systems. Each use case illustrates how advanced tools can be harnessed to tackle specific challenges, from enhancing food safety to optimizing research grant management and improving drug safety evaluations. The successful implementation of these technologies not only promises to advance regulatory and research capabilities but also sets a precedent for future innovations. As these projects continue to evolve, they offer valuable insights and methodologies that can drive progress and ensure more effective solutions in the ever-changing landscape of scientific and regulatory endeavors.
