Publications
Conference Proceedings
- Z. Li, P.-Y. Chen, and T.-Y. Ho, “Retention Score: Quantifying Jailbreak Risks for Vision Language Models,” in AAAI 2025.
- Z. Li, P.-Y. Chen, and T.-Y. Ho, “GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models,” in NeurIPS 2024.
Projects
Robustness Evaluation of LLMs
- Developed the Retention Score framework to quantify jailbreak risks in vision-language models.
- Proposed GREAT Score, a generative-model-based approach to global adversarial robustness evaluation.
Collaborations
- Stanford University: Adversarial Machine Learning
- ETH Zurich: Trustworthy AI