The Food and Drug Administration (FDA) is applying artificial intelligence (AI) across a range of initiatives to advance its mission of ensuring public health and safety. According to AlphaSense, “With a total addressable market of nearly $50 billion for AI-enabled drug development, it is estimated that 30% of new drugs will be discovered using AI by 2025.” AI technologies are being used to improve predictive modeling for pest detection, automate the analysis of agricultural and environmental data, and improve the management of food and nutrition information. From predicting invasive pest species at ports of entry to detecting early signs of disease in crops and optimizing resource allocation in agriculture, the use cases below show how the FDA is putting these technologies to work for more effective and efficient operations.
The Food and Drug Administration (FDA) is employing machine learning algorithms to enhance the predictive modeling of invasive pest species at ports of entry. By analyzing inspection data, this initiative aims to improve the detection capabilities for significant invasive and quarantine pests, thereby strengthening biosecurity measures and protecting agricultural resources.
This project focuses on the detection of pre-symptomatic Huanglongbing (HLB) infection in citrus crops by analyzing multispectral and thermal imagery. The goal is to identify specific pixels that exhibit HLB infection signatures, enabling early intervention and management strategies to mitigate the spread of this devastating disease in citrus orchards.
The High Throughput Phenotyping project aims to enhance the monitoring of citrus orchard health by automating the processes of locating, counting, and categorizing citrus trees. This initiative utilizes advanced imaging and data analysis techniques to provide timely insights into orchard conditions, facilitating better management practices and improving overall crop health.
This project focuses on the identification and localization of aquatic weeds using advanced detection techniques. By accurately mapping the presence of these invasive species, the initiative aims to support effective management and control strategies to protect aquatic ecosystems and maintain biodiversity.
The Automated Detection & Mapping of Host Plants project focuses on utilizing ground-level imagery, such as street view photos, to generate detailed maps of target tree species. This initiative aims to enhance the understanding of plant distributions and support ecological studies and management efforts by providing accurate spatial data on host plants.
The Standardization of Cut Flower Business Names project employs natural language processing (NLP) techniques to standardize business names in message set data. The process involves cleaning the data by removing punctuation and calculating cosine similarity to match similar terms. This standardization enhances data consistency and facilitates better data analysis and reporting within the cut flower industry.
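The two steps named above (strip punctuation, then compare names by cosine similarity) can be sketched as follows. This is an illustration only, not the project's code: the character-trigram vectors, the 0.8 threshold, and the sample business names are all assumptions made for the example.

```python
import math
import string
from collections import Counter


def normalize(name: str) -> str:
    """Lowercase, remove punctuation, and collapse repeated whitespace."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (trigrams are an assumed choice)."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def standardize(names, threshold=0.8):
    """Map each raw name to the first earlier name it closely matches."""
    canonical = []  # (normalized name, trigram vector) pairs seen so far
    mapping = {}
    for raw in names:
        norm = normalize(raw)
        vec = char_ngrams(norm)
        for canon, canon_vec in canonical:
            if cosine_similarity(vec, canon_vec) >= threshold:
                mapping[raw] = canon
                break
        else:
            canonical.append((norm, vec))
            mapping[raw] = norm
    return mapping


# Hypothetical business names, invented for this example.
sample = ["Rosa Flora, Inc.", "ROSA FLORA INC", "Tulip Traders LLC"]
for raw, standard in standardize(sample).items():
    print(f"{raw!r} -> {standard!r}")
```

The threshold controls how aggressive the merging is; a real deployment would tune it against names that analysts have already matched by hand.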
The Approximate String or Fuzzy Matching project utilizes an algorithm to compute string similarity metrics, enabling the classification of similar but not identical text in administrative documents. This automation reduces information duplication and minimizes the need for manual error-checking, streamlining document management processes and improving data accuracy.
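The source does not name the similarity metric used, so the sketch below assumes Levenshtein edit distance, one common choice for approximate string matching. The 0.85 threshold and the sample entries are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_duplicates(values, threshold=0.85):
    """Group entries whose similarity to a group's first member meets the threshold."""
    groups = []
    for value in values:
        key = value.strip().lower()
        for group in groups:
            if similarity(key, group[0].strip().lower()) >= threshold:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


# Hypothetical entries from an administrative document.
print(group_duplicates(["Smith Farms LLC", "Smith Farm LLC", "Jones Orchard"]))
```

Grouping near-duplicates this way is what lets a reviewer check one representative per group instead of every entry.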
This project focuses on training machine learning models to automate the reading of file attachments, specifically PDF documents, and extract relevant information into a more user-friendly Excel format. By leveraging artificial intelligence, the initiative aims to save program managers time and effort, as they often receive numerous documents daily and need to extract specific data from them efficiently.
The Artificial Intelligence for Correlative Statistical Analysis project employs various AI-driven statistical techniques to model predictive relationships between different variables. Commonly used methods include random forests, artificial neural networks, k-nearest neighbors, and support vector machines. This initiative enhances the FDA’s ability to make data-driven predictions and informed decisions based on complex datasets.
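As an illustration of one of the listed methods, a k-nearest neighbors prediction fits in a few lines. The source does not describe how the project configures any of these methods, so the features, labels, and the choice of k = 3 below are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training rows.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training rows.
training_rows = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high"),
]
print(knn_predict(training_rows, (1.1, 1.0)))
print(knn_predict(training_rows, (5.0, 5.1)))
```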
The ARS Project Mapping initiative utilizes natural language processing (NLP) to analyze research project plans, performing term analysis and clustering. This approach allows national program leaders to interact with a dashboard that highlights synergies and patterns across various Agricultural Research Service (ARS) research program portfolios, facilitating better collaboration and resource allocation.
The NAL Automated Indexing project employs Cogito software to automate the subject indexing of approximately 500,000 peer-reviewed journal articles annually. By utilizing the National Ag Library Thesaurus (NALT) concept space, the software annotates articles with relevant metadata, enhancing the discoverability of content in the Library’s bibliographic citation databases, including AGRICOLA, PubAg, and Ag Data Commons.
The Democratizing Data project aims to leverage AI tools, machine learning, and natural language processing to analyze how publicly funded data and evidence are utilized to benefit science and society. By understanding these dynamics, the initiative seeks to promote transparency, accessibility, and effective use of data in public health and policy-making.
The Westat project is a competitive initiative aimed at discovering automated methods to effectively link USDA nutrition information to a dataset containing 750,000 food items related to purchases and acquisitions. Competing teams employed various AI techniques, including natural language processing (NLP), random forests, and semantic matching, to develop innovative solutions that enhance the integration of nutrition data with food item information, ultimately supporting better dietary guidance and public health initiatives.
The Retailer Receipt Analysis project is a proof of concept that employs Optical Character Recognition (OCR) technology to analyze a sample of up to 1,000 Food and Nutrition Service (FNS) receipts and invoices. This initiative aims to demonstrate how the current manual review process can be automated, leading to significant time savings for staff, improved accuracy in reviews, and enhanced detection of complex patterns. The ultimate goal is to develop a review system that features an automated workflow capable of learning from analyst feedback, incorporating known Supplemental Nutrition Assistance Program (SNAP) fraud patterns, identifying new patterns, and visualizing alerts related to these patterns on retailer invoices and receipts.
The Nutrition Education & Local Access Dashboard aims to create a comprehensive county-level visualization of nutrition support provided by the Food and Nutrition Service (FNS). This dashboard focuses on nutrition education and local food access, integrating various metrics related to hunger and nutritional health. To enhance usability, the team developed a K-means clustering script that categorizes states into seven different groups based on characteristics such as Farm to School intensity, program activity, ethnicity, fresh food access, school size, and program participation. This clustering enables users to identify similar states, fostering potential partnerships and collaborations that may not have been previously considered.
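The clustering step can be sketched with a small pure-Python K-means. This is not the team's script: the two feature columns and their values are hypothetical, the example uses two clusters rather than the seven described above, and the evenly spaced starting centroids are a simplification chosen so the result is reproducible.

```python
import math


def kmeans(points, k, max_iter=100):
    """Basic K-means: assign each point to its nearest centroid, then re-average.

    Starts from k evenly spaced rows of the input so the run is deterministic;
    production code would more commonly use a randomized start such as k-means++.
    """
    centroids = [points[(i * len(points)) // k] for i in range(k)]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical rows: (program activity score, fresh food access score).
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(rows, k=2)
print(labels)
print(centers)
```

Because K-means relies on Euclidean distance, features measured on very different scales would normally be standardized before clustering.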
The Land Change Analysis Tool (LCAT) utilizes a random forest machine learning classifier to create high-resolution land cover maps derived from aerial and satellite imagery. Training data for the model is generated through a custom-built web application, and the processing tasks are executed on a 192-node Docker cluster to efficiently handle CPU-intensive computations. The results of this analysis are made publicly available through an image service. To date, the project has successfully mapped over 600 million acres and generated more than 700,000 training samples, contributing valuable data for land management and environmental monitoring.
The Ecosystem Management Decision Support System (EMDS) is a spatial decision support tool designed for landscape analysis and planning. It operates as a component of popular geographic information systems (GIS) such as ArcGIS and QGIS. Users can develop tailored applications to address specific challenges, utilizing a combination of four AI engines: logic processing, multi-criteria decision analysis, Bayesian networks, and Prolog-based decision trees. This flexibility allows for comprehensive analysis and informed decision-making in ecosystem management.
The Wildland Urban Interface – Mapping Wildfire Loss project is a proof-of-concept study that explores the application of machine learning techniques, specifically deep learning and convolutional neural networks, along with object-based image classification methods. The goal is to accurately identify buildings, assess building loss, and evaluate defensible space around structures before and after wildfire events in areas where urban and wildland environments intersect. This research aims to enhance understanding of wildfire impacts and inform mitigation strategies.
The CLT Knowledge Database is an information system designed to catalog and provide access to cross-laminated timber (CLT) information. It utilizes data aggregator bots that scour the internet for relevant content, searching for hundreds of keywords and employing machine learning to assess the relevance of the findings. The search engine intelligently locates and updates pertinent CLT references, categorizing information based on common applications and interest areas. As of February 24, 2022, the database has cataloged over 3,600 publications related to various aspects of CLT. This system promotes the growth of mass timber markets by disseminating knowledge, facilitating collaboration among stakeholders, and minimizing duplication of efforts. It benefits manufacturers, researchers, design professionals, code officials, government agencies, and other stakeholders, ultimately supporting the increased use of mass timber and enhancing forest health by raising the economic value of forests.
The RMRS Raster Utility is a .NET object-oriented library designed to streamline data acquisition, raster sampling, and both statistical and spatial modeling. This utility aims to reduce the processing time and storage space required for raster analysis, making it more efficient for users. Additionally, the library incorporates machine learning techniques, enhancing its capabilities for analyzing and interpreting raster data in various applications.
TreeMap 2016 is a comprehensive tree-level model that represents the forests of the contiguous United States. It integrates forest plot data from the Forest Inventory and Analysis (FIA) program and aligns this data with a 30×30 meter grid. The model is utilized in both private and public sectors for various applications, including fuel treatment planning, snag hazard mapping, and estimating terrestrial carbon resources. A random forests machine learning algorithm was employed to impute forest plot data into a set of target rasters provided by the Landscape Fire and Resource Management Planning Tools (LANDFIRE). The model considers various predictor variables, including forest cover percentage, tree height, vegetation type, topography (slope, elevation, aspect), geographical location (latitude and longitude), biophysical factors (photosynthetically active radiation, precipitation, temperature variations, relative humidity, and vapor pressure deficit), and disturbance history (time since disturbance and type of disturbance) for the landscape as of 2016.
The Landscape Change Monitoring System (LCMS), produced by the USDA Forest Service, is a national initiative that uses remote sensing data from Landsat and Sentinel satellites to map and monitor changes in vegetation canopy cover, land cover, and land use. The system combines temporal change classifications with training data in a supervised classification process, which identifies vegetation gain and loss as well as changes in land cover and land use. This monitoring system provides valuable insights for land management and conservation efforts.
The Geospatial and Remote Sensing Training Courses provide education on various software and scripting techniques that facilitate machine learning applications in geospatial analysis. The curriculum is dynamic, covering topics such as introductory and advanced change detection, the eCognition software package, and geospatial scripting for Google Earth Engine. Additionally, some courses include training on using Collect Earth Online, equipping participants with the skills needed to analyze and interpret remote sensing data effectively.
The Forest Health Detection Monitoring project employs machine learning models to enhance the analysis of forest health across the United States. This involves upscaling training data collected from various sources, including Sentinel-2, Landsat, MODIS, and lidar imagery, to effectively map and monitor stages of forest mortality and defoliation. Additionally, the project includes post-processing of raster outputs into vector polygons, facilitating more detailed spatial analysis and visualization of forest health conditions.
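Converting a classified raster into vector features starts by grouping adjacent flagged pixels into regions. The sketch below shows only that grouping step, using 4-connected component labeling on a small hypothetical 0/1 grid, and reports a bounding box per region instead of tracing true polygon outlines. It is an assumption about how such post-processing could begin, not the project's workflow.

```python
from collections import deque


def label_regions(mask):
    """Group 4-connected cells with a truthy value into regions.

    mask is a list of equal-length rows; returns a list of regions,
    each a list of (row, col) cells.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited flagged cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(cells)
    return regions


def bounding_box(cells):
    """Return (min_row, min_col, max_row, max_col) for one region."""
    rs = [r for r, _ in cells]
    cs = [c for _, c in cells]
    return (min(rs), min(cs), max(rs), max(cs))


# Hypothetical mortality mask: 1 marks a pixel flagged by the model.
grid = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
for region in label_regions(grid):
    print(len(region), "cells, bounding box", bounding_box(region))
```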
The Cropland Data Layer (CDL) project utilizes machine learning algorithms to analyze satellite sensor readings and classify the type of crop or agricultural activity present in each 30-meter (30 × 30 m) pixel on the ground. The algorithms are trained using data from the USDA’s Farm Service Agency and other reliable sources to establish “ground truth.” This process not only generates accurate classifications but also allows for the assessment of classification accuracy. The CDL has been produced for national coverage since 2008 and is particularly accurate for major commodities like corn and soybeans. Additional information and background on the CDL can be found in various peer-reviewed research papers and presentations.
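Assessing classification accuracy against ground truth usually means comparing predicted and reference labels pixel by pixel. The sketch below computes overall accuracy plus the per-class producer's and user's accuracy commonly reported for land cover maps. The labels are invented, and the source does not state which accuracy measures the CDL team uses.

```python
from collections import Counter


def accuracy_report(truth, predicted):
    """Summarize agreement between reference labels and predicted labels.

    Producer's accuracy: share of reference pixels of a class that were
    labeled correctly. User's accuracy: share of pixels predicted as a
    class that really belong to it.
    """
    pairs = Counter(zip(truth, predicted))
    classes = sorted(set(truth) | set(predicted))
    correct = sum(pairs[(c, c)] for c in classes)
    report = {"overall": correct / len(truth), "producers": {}, "users": {}}
    for c in classes:
        truth_total = sum(pairs[(c, p)] for p in classes)
        pred_total = sum(pairs[(t, c)] for t in classes)
        report["producers"][c] = pairs[(c, c)] / truth_total if truth_total else None
        report["users"][c] = pairs[(c, c)] / pred_total if pred_total else None
    return report


# Hypothetical reference and predicted labels for six pixels.
reference = ["corn", "corn", "corn", "soy", "soy", "wheat"]
classified = ["corn", "corn", "soy", "soy", "soy", "corn"]
print(accuracy_report(reference, classified))
```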
The List Frame Deadwood Identification project employs boosted regression trees to analyze various inputs, including administrative linkage data, frame data, and historical response information. The model generates a propensity score that indicates the relative likelihood of a farm operation being out of business. By identifying common tree splits and integrating expert knowledge, the project aims to establish a systematic process for identifying and addressing deadwood in agricultural operations, thereby improving data accuracy and resource allocation.
The Census of Agriculture Response Propensity Scores project utilizes random forest models to derive scores that predict the likelihood of response to the Census of Agriculture (COA). By analyzing historical data, control data, and other survey information, these scores assist in targeting more effective data collection strategies. This approach enhances the efficiency of the census process by focusing efforts on areas with higher predicted response rates.
The Climate Change Classification NLP project employs natural language processing techniques to classify projects funded by the National Institute of Food and Agriculture (NIFA) as either related to climate change or not. The model analyzes input features such as project titles, non-technical summaries, objectives, and keywords. The classification results in a binary outcome, indicating whether each project is associated with climate change, thereby aiding in the assessment of funding priorities and research focus areas.
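The source does not say which model produces the binary label, so the sketch below uses a multinomial naive Bayes classifier as a stand-in. The training texts are invented, and a real system would combine titles, summaries, objectives, and keywords from many more projects.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = set().union(*self.word_counts.values())
        return self

    def predict(self, document):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            # Start from the log prior, then add log likelihoods of known words.
            score = math.log(self.doc_counts[c] / total_docs)
            class_total = sum(self.word_counts[c].values())
            for word in tokenize(document):
                if word in self.vocabulary:
                    count = self.word_counts[c][word] + 1
                    score += math.log(count / (class_total + len(self.vocabulary)))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical project descriptions; True means climate-related.
texts = [
    "Drought resilience of wheat under climate warming",
    "Greenhouse gas emissions and carbon sequestration in soils",
    "Consumer preferences for organic dairy labeling",
    "Poultry vaccine efficacy trial",
]
is_climate = [True, True, False, False]
model = NaiveBayesTextClassifier().fit(texts, is_climate)
print(model.predict("Carbon emissions from warming soils"))
```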
The Operational Water Supply Forecasting project supports water management for western U.S. rivers, which relies on forecasts of spring-summer river flow volumes generated by operational hydrologic models. The USDA Natural Resources Conservation Service (NRCS) National Water and Climate Center (NWCC) operates the largest regional forecasting system of this kind, continuing a nearly century-old tradition. The NWCC recently developed a next-generation prototype, the multi-model machine-learning metasystem (M4), which integrates several AI and data science technologies tailored to specific user needs. The system requires snow and precipitation inputs from the SNOTEL environmental monitoring network of the NRCS Snow Survey and Water Supply Forecast program, although it is adaptable. In hindcasting tests across diverse environments in the western U.S. and Alaska, M4 showed significant improvements in out-of-sample accuracy over existing benchmarks. Key technical features, including multi-model ensemble modeling, autonomous machine learning (AutoML), hyperparameter pre-calibration, and theory-guided data science, enable automated training and operation. Live operational testing at selected sites has confirmed the logistical feasibility of the workflows. It has also produced geophysical explanations of results in terms of known hydroclimatic processes, which addresses concerns about the opacity of machine learning models and supports relatable forecast narratives for NRCS customers.
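Two of the ideas mentioned above, multi-model ensembles and out-of-sample hindcast testing, can be illustrated with a toy example. This is not the M4 system: the two member models (a least-squares line and a climatological mean), the leave-one-year-out scheme, and the snow and flow numbers are all assumptions chosen to keep the sketch short.

```python
import math
from statistics import mean


def fit_linear(xs, ys):
    """Least-squares line y = a + b * x, returned as a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_climatology(xs, ys):
    """Baseline member that always predicts the historical mean flow."""
    m = mean(ys)
    return lambda x: m


def ensemble_hindcast(xs, ys, fitters):
    """Leave-one-year-out hindcast: predict each year from models trained on the rest."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        members = [fit(train_x, train_y)(xs[i]) for fit in fitters]
        predictions.append(mean(members))  # equal-weight ensemble mean
    return predictions


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


# Hypothetical yearly snow index (input) and seasonal flow volume (target).
snow = [10.0, 20.0, 30.0, 40.0, 50.0]
flow = [27.0, 44.0, 66.0, 83.0, 106.0]
hindcast = ensemble_hindcast(snow, flow, [fit_linear, fit_climatology])
print("out-of-sample RMSE:", rmse(hindcast, flow))
```

Scoring each year with models that never saw it is what makes the error estimate out-of-sample; comparing that score against a benchmark model is the kind of test the paragraph above describes.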
The Ecological Site Descriptions project involves the analysis of over 20 million records related to soil data and 20,000 text documents containing information on ecological states and transitions. This extensive analysis aims to enhance understanding of ecological conditions and inform land management practices by leveraging machine learning techniques to extract insights from the vast dataset.
The Conservation Effects Assessment Project aims to predict the conservation benefits achieved at the field level through various agricultural practices. The model integrates data from farmer surveys, APEX modeling results, and environmental data to assess the effectiveness of conservation efforts. This predictive capability supports informed decision-making and resource allocation for conservation initiatives.
The Digital Imagery (No-Change) project for the National Resources Inventory (NRI) program employs neural networks and other artificial intelligence technologies to identify areas in digital imagery that have not changed over time. This capability is crucial for monitoring land use and resource conditions, enabling efficient management of natural resources and supporting the goals of the NRI program.
The Artificial Intelligence SPAM Mitigation Project uses Robotic Process Automation (RPA) together with artificial intelligence and machine learning (AI/ML) models to automatically identify and remove spam and marketing emails from civil rights complaint email channels. Because a substantial portion of incoming email to the Office of the Assistant Secretary for Civil Rights (OASCR) consists of spam, marketing, and phishing attempts, this solution aims to improve the efficiency and security of email management within the department.
The Acquisition Approval Request Compliance Tool employs a natural language processing (NLP) model to analyze procurement header and line descriptions within the USDA’s Integrated Acquisition System (IAS). This model assesses the likelihood that a given award is related to information technology (IT) and may therefore require an Acquisition Approval Request (AAR). By examining the text characteristics of awards with existing AAR numbers, the model calculates the probability for procurements lacking an AAR number, thereby streamlining compliance processes and enhancing procurement accuracy.
The Intelligent Ticket Routing project automates the process of routing BMC Remedy tickets to the appropriate work groups. Utilizing a combination of technologies, including Python, JupyterHub, Scikit-learn, GitLab, Flask, Gunicorn, Nginx, and ERMS, this system enhances operational efficiency by ensuring that tickets are directed to the right teams without manual intervention, thereby improving response times and service quality.
The Predictive Maintenance Impacts project focuses on forecasting the effects of maintenance activities on infrastructure items within the DISC framework. By leveraging tools such as Einblick, MySQL, Python, Linux, and Tableau, the project aims to analyze maintenance data and predict potential impacts, enabling proactive maintenance strategies that enhance infrastructure reliability and performance.
The Video Surveillance System (VSS) is designed to provide comprehensive video management capabilities, integrating various surveillance subsystems, including NVRs, DVRs, encoders, fixed and pan-tilt cameras, network switches, routers, and other necessary hardware. The VSS enables the collection, management, and clear presentation of video feeds from multiple sources and provides a unified configuration platform for both analog and digital video devices. The system normalizes disparate video systems into a cohesive viewing experience, allowing operators to drag and drop cameras into views and to track targets across sequential cameras, which improves overall security management.