Response to the NIST RFI on Auditing, Evaluating, and Red-Teaming AI Systems

IAPS provided a response to the National Institute of Standards and Technology (NIST) request for information related to NIST's assignments under Sections 4.1, 4.5, and 11 of the Executive Order Concerning Artificial Intelligence (EO 14110). IAPS' comments outlined specific guidelines and practices that could help AI actors better manage and mitigate risks from AI systems, particularly dual-use foundation models (DUFMs).

Our recommendations were:

  • External actors conducting AI auditing, evaluation, and red-teaming (collectively referred to here as “external scrutiny”) should be given sufficient access, independence, expertise, and resources to perform effective scrutiny of models, particularly DUFMs. Additionally, guidelines for AI system evaluations should focus on desired outcomes, rather than specific technical methods.

  • NIST should recommend that generative AI developers working on DUFMs maintain incident response plans, and thresholds for triggering incident response, for dangerous model capabilities, including CBRN risks, cyber risks, and risks from model autonomy.

  • For DUFMs and other generative AI systems as appropriate, AI developers should play a large role in adopting a "shift left" approach to AI risk management by emphasizing safety and security activities earlier in the development cycle.

  • NIST AI 100-1 (the AI Risk Management Framework, or AI RMF) should recommend a strong defense-in-depth approach for DUFMs and other generative AI systems as appropriate, by identifying multiple measures with independent failure mechanisms for important categories of activity in the AI RMF, so that common-cause failures do not overcome multiple defensive layers at once.

  • NIST should issue guidance to define and distinguish different types of AI red-teaming, as AI practitioners currently use the term "red-teaming" to refer to many distinct assurance techniques.

  • NIST should provide guidance on threat modeling, and highlight it as an essential activity to guide the prioritization of red-teaming efforts and inform the development of new model evaluations of DUFMs.

  • NIST guidelines on red-teaming should include guidance on conducting "adversary simulation," a realistic simulation of well-resourced, persistent, and highly motivated threat actors, as an example of good practice for identifying risks from catastrophic misuse.
