GAO: Artificial Intelligence: Generative AI Training, Development, and Deployment Considerations

Why GAO Did This Study

For this technology assessment, GAO was asked to describe commercial development of generative AI technologies. This report is the second in a body of work looking at generative AI. In future reports, GAO plans to assess societal and environmental effects of the use of generative AI and federal research, development, and adoption of generative AI technologies. This report provides an overview of the common practices, limitations, and other factors affecting the responsible development of generative AI technologies and the processes commercial developers follow to collect, use, and store training data for generative AI technologies.

To conduct this assessment, GAO conducted literature reviews and interviewed representatives from several leading companies developing generative AI technologies. GAO selected these companies based on known information about generative AI technologies they have created and the voluntary commitments they made to the White House to manage risks posed by AI. Through these interviews, GAO gathered information regarding common practices these companies use to develop and deploy generative AI technologies. GAO also reviewed relevant publicly available documentation, such as white papers, model cards, and guidance documents to identify further information regarding training data and the types of attacks used to compromise generative AI models.

What GAO Found

Commercial developers use several common practices to facilitate responsible development and deployment of generative artificial intelligence (AI) technologies. For example, they may use benchmark tests to evaluate the accuracy of models, employ multi-disciplinary teams to evaluate models prior to deployment, and conduct red teaming to test the security of their models and identify potential risks, among others. These practices focus on quantitative and qualitative evaluation methods for providing accurate and contextual results, as well as preventing harmful outputs.

In addition, commercial developers face some limitations in responsibly developing and deploying generative AI technologies to ensure that they are safe and trustworthy. Primarily, developers recognize that their models are not fully reliable, and that user judgment should play a role in accepting model outputs. In various white papers, model cards, and other documentation, they have noted that despite the mitigation efforts, their models may produce incorrect outputs, exhibit bias, or be susceptible to attacks. Such attacks include prompt injection attacks, jailbreaks, and data poisoning. Prompt injection attacks and jailbreaks rely on text prompt inputs that may change the behavior of a generative AI model that could be used to conduct misinformation campaigns or transmit malware, among other malicious activities. Data poisoning is a process by which an attacker can change the behavior of a generative AI system through manipulation of its training data or process.

Generative AI typically requires a large dataset for training—ranging from millions to trillions of data points. Training information is used to help models learn about language and how to respond to questions. The quantity of data can vary based on the specific type of model. However, information regarding the specifics of training datasets is not entirely available to the public. The commercial developers GAO met with did not disclose detailed information about their training datasets beyond high-level information identified in model cards and other relevant documentation. For example, many stated that their training data consists of information publicly available on the internet.

Foundation models are especially susceptible to poisoning attacks when training data are scraped from public sources. Commercial developers are taking measures to safeguard sensitive information by undergoing privacy evaluations at various stages of training and development. Before training a model, developers can filter and curate training data to reduce the use of harmful content, such as sites that collect personal information.

Read the report here.

Ad



Not Yet a Premium Partner/Sponsor? Learn more about the OS AI Premium Corporate and Individual Plans here. Plans start at $250 annually.

How useful was this post?

Click on a star to rate it!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

LEAVE A REPLY

Please enter your comment!
Please enter your name here