Large Language Models (LLMs) increasingly serve as tooling platforms through structured output APIs, but the grammar-guided decoding that powers this feature opens a critical control-plane attack surface orthogonal to traditional data-plane vulnerabilities.
We introduce the Constrained Decoding Attack (CDA), a new jailbreak class that targets the LLM control plane. CDA is best characterized as a control-to-semantic pipeline: (1) schema-enforced logit masking injects a malicious prefix into the generation trajectory, and (2) the model itself completes the harmful intent. Unlike data-plane jailbreaks that rely on bypassing alignment with visible inputs, CDA acts on the decoding process itself, so internal safety alignment alone cannot stop it.
We instantiate CDA with EnumAttack (hides malicious content in enum fields) and the more evasive DictAttack (decouples the payload across a benign prompt and a dictionary-based grammar). Across 13 proprietary/open-weight models and five standard benchmarks, DictAttack achieves 94.3–99.5% ASR on flagship models including gpt-5, gemini-2.5-pro, deepseek-r1, and gpt-oss-120b. While basic grammar auditing mitigates EnumAttack, DictAttack still sustains 75.8% ASR against state-of-the-art jailbreak guardrails, exposing a "semantic gap" that demands cross-plane defenses.
94–99%
DictAttack ASR on flagship models
(gpt-5, gemini-2.5-pro,deepseek-r1, gpt-oss-120b)
75.8%
DictAttack ASR even under SOTA
cross-plane Dual-Plane Guard
0.98
StrongREJECT composite score ongemini-2.5-pro — near-perfect
useful harmful answers
13
proprietary & open-weight LLMs
evaluated across 5 standard
jailbreak benchmarks
An attacker who can supply the grammar (JSON Schema, regular expression, or context-free grammar) used by an LLM API for structured output — e.g. a malicious MCP tool, a compromised agent backend, or an attacker-controlled tool definition in a multi-tenant pipeline — can hide malicious intent inside the schema while the user-visible prompt stays benign. The constrained-decoding engine then deterministically forces the target model to commit to the malicious trajectory before any data-plane guardrail or alignment-side defense ever sees the assembled intent.
We release a sanitized public proof-of-concept repository (EnumAttack & DictAttack illustrations + Algorithm 1 reference implementation) at github.com/zhangshuoming990105/ConstrainedDecodingAttack.
The full evaluation harness — multi-model batch runners, audit pipelines (LlamaGuard / OpenAI Moderation / SelfDefend), the Circuit Breaker harness used in §5.5, and per-model run logs across all 13 models and five benchmarks — is provided as a separate academic-only gated artifact. Verified researchers may request access from the corresponding author with their institutional affiliation and intended use.
Disclosure. We disclosed the underlying vulnerability to OpenAI and Google (Gemini) in early 2025; the embargo has since passed. The maintainers of xgrammar were also notified and acknowledged the issue. Mitigations deployed inside closed-source provider stacks are not visible to us; do not assume any particular endpoint is patched.
@inproceedings{zhang2026cda,
title = {When Grammar Guides the Attack: Uncovering Control-Plane
Vulnerabilities in {LLM}s with Structured Output},
author = {Zhang, Shuoming and Zhao, Jiacheng and Dong, Hanyuan and
Xu, Ruiyuan and Li, Zhicheng and Zhang, Yangyu and
Li, Shuaijiang and Wen, Yuan and Xia, Chunwei and
Wang, Zheng and Feng, Xiaobing and Cui, Huimin},
booktitle = {Proceedings of the 2026 ACM SIGSAC Conference on Computer and
Communications Security (CCS '26)},
year = {2026},
publisher = {ACM},
}