Systems online

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

This research is currently only available at its source.

You can find the research at the below link.
Feel free to contact Gray Swan with any questions or comments.