Notice ID: CAW-AIRD-25
To enable the U.S. economy to harness the full benefits of AI, the National Institute of Standards and Technology (NIST) focuses on fundamental research and improving AI measurement science, technology, standards and related tools.
The U.S. AI Safety Institute (AISI), housed within the National Institute of Standards and Technology, is developing testing, evaluations, and guidelines to help accelerate trustworthy AI innovation in the United States and around the world, with a focus on promoting measurement science for AI capabilities and helping to prevent flaws or misuses of AI technology that could undermine public safety or national security.
As part of this work, AISI is conducting testing, evaluation, validation, and verification (TEVV) on high-impact frontier models’ capabilities. In doing so, AISI seeks to ensure that its projects, evaluations, and tools reflect the best available science, and to coordinate closely with a diverse set of AI stakeholders who are developing and conducting evaluations to assess capabilities, functionality, and risks.
NIST is performing market research to identify potential sources for an anticipated contract to assist in developing evaluations and benchmarks of AI models’ relevant software engineering and AI research and development capabilities, functionality, and risks.
The Contractor must provide or develop resources for various aspects of assessing the capability of frontier AI models to assist in software engineering and AI research and development, including by assessing the quality or functionality of AI-generated outputs in these domains and any corresponding risks …
Contractors must provide or develop resources for one or more of the following:
- Developing benchmarks and scoring mechanisms for automated evaluation of AI models’ relevant capabilities (see the illustrative sketch after this list);
- Developing tasks for automated evaluation of AI models’ relevant capabilities with accompanying data on human baseline performance (e.g., how long the tasks take human experts to complete);
- Designing and implementing protocols or methods for evaluating AI models’ relevant capabilities …
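Purely as an illustration (not part of the notice), the sketch below shows one way an automated evaluation task of the kind described above might be structured, pairing a scoring mechanism with human-baseline metadata. `EvalTask`, `run_benchmark`, and `score_bugfix` are hypothetical names introduced here for clarity; nothing in the NIST/AISI notice specifies this design.

```python
# Minimal, hypothetical sketch of an automated evaluation task with a
# scoring mechanism and human-baseline metadata. Names and structure are
# illustrative only; they are not specified by the NIST/AISI notice.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    task_id: str
    instructions: str                    # prompt given to the model under test
    score: Callable[[str], float]        # maps a model output to a score in [0, 1]
    human_baseline_minutes: float        # e.g., how long human experts took on the task

def run_benchmark(tasks: list[EvalTask], generate: Callable[[str], str]) -> dict[str, float]:
    """Run each task through a model (the `generate` callable) and score its output."""
    return {t.task_id: t.score(generate(t.instructions)) for t in tasks}

# Toy bug-fixing task: the score checks whether the model's patched function works.
# (A real harness would sandbox untrusted model-generated code rather than exec it.)
def score_bugfix(output: str) -> float:
    namespace: dict = {}
    try:
        exec(output, namespace)
        return 1.0 if namespace["add"](2, 3) == 5 else 0.0
    except Exception:
        return 0.0

tasks = [
    EvalTask(
        task_id="bugfix-001",
        instructions="Fix this function so it adds its arguments:\ndef add(a, b):\n    return a - b",
        score=score_bugfix,
        human_baseline_minutes=2.0,
    )
]

# Stand-in "model" for demonstration; a real evaluation would call the system under test.
print(run_benchmark(tasks, generate=lambda prompt: "def add(a, b):\n    return a + b"))
```

Human baseline data of the kind the notice mentions (e.g., expert completion times) would sit alongside each task so that model performance can be compared against expert effort.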
Relevant frontier model capabilities to elicit, evaluate, and benchmark include:
- Capabilities that enable a model to assist with or automate software development activities such as designing and implementing projects based on specifications, identifying and correcting bugs, updating and refactoring code, or deploying code.
- Capabilities that enable a model to assist with or automate research activities associated with frontier AI model development, such as the ability to generate and test hypotheses relating to the design of AI models or to perform iterative experimentation …