Frontier Research That Has Defined the AI Security Field

Years of cutting-edge research power Gray Swan’s most advanced protection for your AI systems.

First to Discover, First to Defend

Evaluation

  • MMLU: The most-cited, industry-standard benchmark for evaluating LLM general knowledge and reasoning. [ICLR]
  • WMDP: The first benchmark for assessing hazardous knowledge in LLMs, with a focus on weapons of mass destruction. [TIME]
  • CyBench: A widely adopted framework for measuring cybersecurity capabilities and risks in language models. [ICLR]
  • HarmBench: The leading benchmark for systematically evaluating harmful model outputs across sensitive domains. [ICML]
  • AgentHarm: Among the first benchmarks to measure agentic risks and emergent harmful behaviors in autonomous systems. [ICLR]

Reliability and Control

  • GCG: The first fully automated method for jailbreaking large language models, setting the standard for robustness testing. [NYT]
  • Circuit Breakers: The first adversarially robust alignment technique, designed to halt unsafe outputs before they occur. [Forbes]
  • RepE: A pioneering top-down approach to monitor and steer LLM cognitive processes through representation engineering. [Fox]
  • Agent Red Teaming: The largest-scale competition to date for stress-testing prompt injection and adversarial agent risks. [NeurIPS]
  • Safety Pretraining: A novel set of interventions during data curation and pretraining to instill safer model behavior from the start. [NeurIPS]

Gray Swan Research Areas

Alignment & Control

Rigorous safeguards to ensure AI does not veer off course.

Monitoring & Evaluation

Identifying what can go wrong before it causes problems.

Robustness & Security

Enhancing reliability against external threats.

All Research

Explore our published research to learn how the latest advances in AI safety and security give Gray Swan the edge against evolving threats.

Want to learn more about how Gray Swan can help you with custom research?

D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models

monitoring

Sep 2025

Satyapriya Krishna, Andy Zou, Rahul Gupta, Eliot Krzysztof Jones, Nick Winter, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson, Spyros Matsoukas

The safety and alignment of Large Language Models (LLMs) are critical for their responsible deployment. Current evaluation methods predominantly focus on identifying and preventing overtly harmful outputs. However, they often fail to address a more insidious failure mode: models that produce benign-appearing outputs while operating on malicious or deceptive internal reasoning.

Adversarial Attacks on Robotic Vision Language Action Models

robustness

Jun 2025

Eliot Krzysztof Jones, Alexander Robey, Andy Zou, Zachary Ravichandran, George J. Pappas, Hamed Hassani, Matt Fredrikson, J. Zico Kolter

The emergence of vision-language-action models (VLAs) for end-to-end control is reshaping the field of robotics by enabling the fusion of multimodal sensory inputs at the billion-parameter scale. The capabilities of VLAs stem primarily from their architectures, which are often based on frontier large language models (LLMs). However, LLMs are known to be susceptible to adversarial misuse, and given the significant physical risks inherent to robotics, questions remain regarding the extent to which VLAs inherit these vulnerabilities.

Improving Alignment and Robustness with Circuit Breakers

robustness

Jun 2024

Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

To address the urgent concerns raised by our July 2023 attack and the numerous jailbreaks that followed, we introduce Circuit Breakers, a novel approach inspired by representation engineering and designed to robustly prevent AI systems from generating harmful content by directly altering harmful model representations. The family of circuit-breaking methods provides an alternative to refusal and adversarial training, protecting both LLMs and multimodal models from strong, unseen adversarial attacks without compromising model capability.
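
For intuition, here is a minimal sketch of the representation-rerouting idea in Python: hidden states on harmful inputs are pushed away from those of the original model, while hidden states on benign inputs are kept close to it. The loss weights, tensor shapes, and random example tensors are illustrative assumptions, not the paper's training recipe.

```python
# Minimal sketch of the "representation rerouting" idea behind circuit
# breakers (illustrative only): push hidden states on harmful inputs away
# from those of the original (frozen) model, while keeping hidden states
# on benign inputs close to the original. All weights and shapes here are
# placeholder assumptions, not the paper's recipe.
import torch
import torch.nn.functional as F

def circuit_breaker_loss(h_harmful, h_harmful_ref, h_benign, h_benign_ref,
                         alpha=1.0, beta=1.0):
    """h_* are hidden states [batch, seq, dim] from the model being tuned;
    h_*_ref are the corresponding states from a frozen copy of the original."""
    # Rerouting term: penalize remaining alignment between the tuned model's
    # harmful-input representations and the original ones.
    reroute = F.relu(F.cosine_similarity(h_harmful, h_harmful_ref, dim=-1)).mean()
    # Retain term: keep benign-input representations near the original so
    # that ordinary capability is preserved.
    retain = (h_benign - h_benign_ref).norm(dim=-1).mean()
    return alpha * reroute + beta * retain

# Example call with random tensors standing in for real hidden states.
B, S, D = 2, 8, 16
loss = circuit_breaker_loss(torch.randn(B, S, D), torch.randn(B, S, D),
                            torch.randn(B, S, D), torch.randn(B, S, D))
print(loss.item())
```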

Representation Engineering: A Top-Down Approach to AI Transparency

alignment

Oct 2023

Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

Building on our initial findings, we ventured into the realm of AI interpretability and control with the introduction of Representation Engineering (RepE). Drawing inspiration from cognitive neuroscience, we developed techniques that enable researchers to 'read' and 'control' the 'minds' of AI models. This approach represented a monumental advancement in demystifying the inner workings of AI, making it possible to tackle issues such as truthfulness and power-seeking behaviors head-on.
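
As a rough illustration of representation reading and control, the sketch below extracts a concept direction from contrastive prompts and adds it back into the residual stream during generation. The model (gpt2), layer index, prompts, and steering scale are placeholders chosen for brevity, not the settings used in the paper.

```python
# Minimal sketch of representation reading and steering in the spirit of
# RepE. Model name ("gpt2"), layer index, contrastive prompts, and the
# steering scale are illustrative placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; RepE experiments target larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

LAYER = 6  # which hidden layer to read and steer (hypothetical choice)

def hidden_at_last_token(prompt: str) -> torch.Tensor:
    """Hidden state of the final token at LAYER ('reading' the representation)."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1, :]

# Contrastive prompts that differ only in the concept of interest (honesty)
# are used to estimate a direction in representation space.
positive = ["Pretend you are an honest person and describe your day."]
negative = ["Pretend you are a dishonest person and describe your day."]
direction = torch.stack([hidden_at_last_token(p) for p in positive]).mean(0) \
          - torch.stack([hidden_at_last_token(n) for n in negative]).mean(0)
direction = direction / direction.norm()

# 'Control': add the scaled direction to the layer's output during generation.
def steering_hook(module, inputs, output):
    if isinstance(output, tuple):
        return (output[0] + 4.0 * direction,) + output[1:]
    return output + 4.0 * direction

# model.transformer.h is GPT-2-specific; other architectures name layers differently.
handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tok("Tell me about yourself.", return_tensors="pt")
    print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
finally:
    handle.remove()
```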

Adversarial Attacks on Aligned Language Models

monitoring

Jul 2023

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

In July 2023, we published the first automated jailbreaking method against large language models (LLMs), exposing their susceptibility to adversarial attacks. By demonstrating that specific character sequences could bypass sophisticated safeguards, we highlighted a significant vulnerability with urgent implications for widely used AI systems. In its wake, adversarial robustness garnered renewed attention, sparking a gold rush of research dedicated to both jailbreaking and defense.
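
The sketch below illustrates the flavor of this attack: a gradient-guided search over an adversarial suffix that drives the model toward a chosen target completion. It is a toy simplification (single swaps per step, a small placeholder model, and a benign target string), not the paper's optimized implementation.

```python
# Simplified sketch of a GCG-style adversarial suffix search: use the
# gradient of the target loss with respect to one-hot suffix tokens to
# propose substitutions, then keep a swap only if it lowers the loss.
# The model ("gpt2"), prompt, target string, suffix length, and step count
# are toy placeholders; the real attack batches many candidate swaps per step.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder small model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():          # freeze weights; only the suffix is optimized
    p.requires_grad_(False)
embed = model.get_input_embeddings().weight  # [vocab, dim]

prompt_ids = tok("Please continue:", return_tensors="pt").input_ids[0]
target_ids = tok(" I am happy to help.", return_tensors="pt").input_ids[0]
suffix_ids = torch.full((10,), tok.encode("!")[0])  # start from "! ! ! ..."

def target_loss(suffix_one_hot: torch.Tensor) -> torch.Tensor:
    """Cross-entropy of the target tokens given prompt + (soft) suffix."""
    inp = torch.cat([
        embed[prompt_ids],
        suffix_one_hot @ embed,   # differentiable suffix embeddings
        embed[target_ids],
    ]).unsqueeze(0)
    logits = model(inputs_embeds=inp).logits[0]
    start = len(prompt_ids) + suffix_one_hot.shape[0]
    # Position i predicts token i+1, so shift back by one.
    pred = logits[start - 1 : start - 1 + len(target_ids)]
    return F.cross_entropy(pred, target_ids)

for step in range(20):
    one_hot = F.one_hot(suffix_ids, embed.shape[0]).float()
    one_hot.requires_grad_(True)
    loss = target_loss(one_hot)
    loss.backward()
    # Candidate replacements per position: largest negative-gradient tokens.
    candidates = (-one_hot.grad).topk(8, dim=1).indices
    # Try one random (position, candidate) swap and keep it if it helps.
    pos = torch.randint(len(suffix_ids), (1,)).item()
    cand = candidates[pos][torch.randint(8, (1,)).item()]
    trial = suffix_ids.clone()
    trial[pos] = cand
    with torch.no_grad():
        if target_loss(F.one_hot(trial, embed.shape[0]).float()) < loss:
            suffix_ids = trial
    print(f"step {step}: target loss {loss.item():.3f}")

print("adversarial suffix:", tok.decode(suffix_ids))
```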

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

alignment

Mar 2024

Nathaniel Li*, Alexander Pan*, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang**, Dan Hendrycks**

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

monitoring

Feb 2024

Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

monitoring

Jun 2023

Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

robustness

Jun 2023

Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

monitoring

Dec 2022

Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu

Forecasting Future World Events with Neural Networks

monitoring

Jun 2022

Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks

Scaling Out-of-Distribution Detection for Real-World Settings

monitoring

May 2022

Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

robustness

Feb 2022

Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures

robustness

Dec 2021

Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, Jacob Steinhardt

Globally-Robust Neural Networks

robustness

Jul 2021

Klas Leino, Zifan Wang, Matt Fredrikson

APPS: Measuring Coding Challenge Competence With APPS

monitoring

May 2021

Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt

MMLU: Measuring Massive Multitask Language Understanding

monitoring

Jan 2021

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt

Aligning AI With Shared Human Values

robustness

Aug 2020

Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

monitoring

Jun 2020

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer

Pretrained Transformers Improve Out-of-Distribution Robustness

robustness

Apr 2020

Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Song

Overfitting in Adversarially Robust Deep Learning

robustness

Mar 2020

Leslie Rice, Eric Wong, J. Zico Kolter

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

robustness

Feb 2020

Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan

Fast Is Better Than Free: Revisiting Adversarial Training

robustness

Jan 2020

Eric Wong, Leslie Rice, J. Zico Kolter

Natural Adversarial Examples

monitoring

Jul 2019

Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

robustness

Jun 2019

Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song

Randomized Smoothing: Certified Adversarial Robustness via Randomized Smoothing

robustness

Jun 2019

Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter

Using Pre-Training Can Improve Model Robustness and Uncertainty

robustness

May 2019

Dan Hendrycks, Kimin Lee, Mantas Mazeika

ImageNet-C: Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

monitoring

Mar 2019

Dan Hendrycks, Thomas Dietterich

Deep Anomaly Detection with Outlier Exposure

alignment

Jan 2019

Dan Hendrycks, Mantas Mazeika, Thomas Dietterich

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

alignment

Oct 2018

Dan Hendrycks, Kevin Gimpel

Provable Defenses Against Adversarial Examples via the Convex Outer Adversarial Polytope

robustness

Jun 2018

Eric Wong, J. Zico Kolter

Work With Us

Get in touch to discuss your custom research needs.

Join our newsletter

Keep up to date on all things Gray Swan AI and AI Security.
