Publications

Conference Proceedings

  1. Z. Li, P.-Y. Chen, and T.-Y. Ho, “Retention Score: Quantifying Jailbreak Risks for Vision Language Models,” in AAAI 2025.
  2. Z. Li, P.-Y. Chen, and T.-Y. Ho, “GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models,” in NeurIPS 2024.

Projects

Robustness Evaluation of LLMs

  • Developed Retention Score, a framework for quantifying jailbreak risks in vision-language models (AAAI 2025).
  • Proposed GREAT Score, a generative-model-based approach to global adversarial robustness evaluation (NeurIPS 2024).

Collaborations

  • Stanford University: Adversarial Machine Learning
  • ETH Zurich: Trustworthy AI