NIST Sources Sought: Development of Evaluations and Benchmarks for Assessment of AI Models Cyber Capabilities

Notice ID: CAW-AISI-0002

NIST is performing market research to identify potential sources for an anticipated contract to assist in developing evaluations and benchmarks of AI models’ relevant cyber capabilities and risks.

The Contractor must provide or develop resources for various aspects of assessing frontier AI model cyber capabilities and risks. The Contractor would be responsible for conducting one or more of the tasks in List A in order to assess one or more of the capabilities in List B.

Contractor Tasks:

LIST A – Contractors must provide or develop resources for one or more of the following.

  1. Developing benchmarks and scoring mechanisms for automated evaluation of AI models’ relevant cyber capabilities based on real or realistic offensive cyber tasks or workflows.
  2. Developing tasks for automated evaluation of AI models’ relevant cyber capabilities with accompanying data on human baseline performance (e.g., how long the tasks take human experts to complete) …

LIST B – Relevant frontier model capabilities to elicit, evaluate, and benchmark include:

  1. Capabilities that enable a model to discover vulnerabilities in real or realistic code bases, web resources, or networks;
  2. Capabilities that enable a model to develop working exploits for discovered or known vulnerabilities in real or realistic code bases, web resources, or networks…

Read more here.

