NIST Sources Sought: Development of Evaluations and Benchmarks for Assessment of AI Models Cyber Capabilities

January 10, 2025

Notice ID: CAW-AISI-0002

NIST is performing market research to identify potential sources for an anticipated contract to assist in developing evaluations and benchmarks of AI models’ relevant cyber capabilities and risks.

The Contractor must provide or develop resources for various aspects of assessing frontier AI model cyber capabilities and risks. The Contractor would be responsible for conducting one or more of the tasks in list A in order to assess one or more of the capabilities in list B.

Contractor Tasks:

LIST A – Contractors must provide or develop resources for one or more of the following.

Developing benchmarks and scoring mechanisms for automated evaluation of AI models’ relevant cyber capabilities based on real or realistic offensive cyber tasks or workflows.
Developing tasks for automated evaluation of AI models’ relevant cyber capabilities with accompanying data on human baseline performance (e.g., how long the tasks take human experts to complete) …

LIST B – Relevant frontier model capabilities to elicit, evaluate, and benchmark include:

Capabilities that enable a model to discover vulnerabilities in real or realistic code bases, web resources, or networks;
Capabilities that enable a model to develop working exploits for discovered or known vulnerabilities in real or realistic code bases, web resources, or networks…
