When Grammar Guides the Attack:
Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output

Shuoming Zhang¹, Jiacheng Zhao¹*, Hanyuan Dong¹, Ruiyuan Xu¹, Zhicheng Li¹, Yangyu Zhang¹,
Shuaijiang Li¹, Yuan Wen², Chunwei Xia³, Zheng Wang³, Xiaobing Feng¹, Huimin Cui¹

¹SKLP, ICT, CAS & UCAS ²University of Aberdeen ³University of Leeds
*Corresponding author

ACM CCS 2026

Abstract

Large Language Models (LLMs) increasingly serve as tooling platforms through structured output APIs, but the grammar-guided decoding that powers this feature opens a critical control-plane attack surface orthogonal to traditional data-plane vulnerabilities.

We introduce the Constrained Decoding Attack (CDA), a new jailbreak class that targets the LLM control plane. CDA is best characterized as a control-to-semantic pipeline: (1) schema-enforced logit masking injects a malicious prefix into the generation trajectory, and (2) the model itself completes the harmful intent. Unlike data-plane jailbreaks that rely on bypassing alignment with visible inputs, CDA acts on the decoding process itself, so internal safety alignment alone cannot stop it.
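The logit-masking step can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1: the vocabulary, the logits, and the helper names `masked_argmax` and `constrained_decode` are all hypothetical, and the per-step logits are fixed rather than conditioned on earlier tokens as they would be in a real model. It shows only that when the grammar admits a single token at a step, the emitted token no longer depends on the model's preference.

```python
import math

# Hypothetical toy vocabulary, chosen for illustration only.
VOCAB = ["no", "yes", "maybe", "<eos>"]

def masked_argmax(logits, allowed):
    """Greedy pick after setting every grammar-disallowed token's logit to -inf."""
    masked = [
        logit if token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]
    return VOCAB[masked.index(max(masked))]

def constrained_decode(logits_per_step, allowed_per_step):
    """Decode one token per step under the per-step allowed-token sets."""
    return [
        masked_argmax(logits, allowed)
        for logits, allowed in zip(logits_per_step, allowed_per_step)
    ]

# Assumed logits: the toy "model" strongly prefers "no" at step 0.
logits = [
    [9.0, 0.1, 0.5, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]

# No constraint: every token is allowed at every step.
unconstrained = constrained_decode(logits, [set(VOCAB)] * 2)

# Hypothetical grammar that admits exactly one token per step.
forced = constrained_decode(logits, [{"yes"}, {"<eos>"}])

print(unconstrained)  # ['no', '<eos>']
print(forced)         # ['yes', '<eos>']
```

The mask is applied after the model has produced its logits, so the model's preference for "no" is never consulted at step 0. This is the sense in which the abstract describes the attack as acting on the decoding process rather than on visible inputs.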

We instantiate CDA as two attacks: EnumAttack, which hides malicious content in enum fields, and the more evasive DictAttack, which decouples the payload across a benign prompt and a dictionary-based grammar. Across 13 proprietary and open-weight models and five standard benchmarks, DictAttack achieves 94.3–99.5% ASR on flagship models including gpt-5, gemini-2.5-pro, deepseek-r1, and gpt-oss-120b. While basic grammar auditing mitigates EnumAttack, DictAttack still sustains 75.8% ASR against state-of-the-art jailbreak guardrails, exposing a "semantic gap" that demands cross-plane defenses.
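On the defensive side, one way to picture "basic grammar auditing" is a pass that collects every string a JSON Schema pins via `enum` or `const` and hands each to a content check. The sketch below is an assumption about what such an audit could look like, not the audit pipeline used in the paper; `iter_schema_literals`, `audit_schema`, and the `is_flagged` callback (a stand-in for a real moderation classifier) are hypothetical names, and the sample strings are placeholders.

```python
from typing import Any, Callable, Iterator, List

def iter_schema_literals(node: Any) -> Iterator[str]:
    """Yield every string fixed by `enum` or `const` anywhere in a JSON Schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "enum" and isinstance(value, list):
                # Strings listed in an enum can be forced into the output.
                yield from (v for v in value if isinstance(v, str))
            elif key == "const" and isinstance(value, str):
                yield value
            else:
                # Recurse into nested subschemas (properties, items, ...).
                yield from iter_schema_literals(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_schema_literals(item)

def audit_schema(schema: Any, is_flagged: Callable[[str], bool]) -> List[str]:
    """Return the schema literals that the supplied checker flags."""
    return [lit for lit in iter_schema_literals(schema) if is_flagged(lit)]

# Placeholder schema; the flagged string is a harmless marker.
schema = {
    "type": "object",
    "properties": {
        "color": {"type": "string", "enum": ["red", "green"]},
        "note": {"const": "PLACEHOLDER_FLAGGED_TEXT"},
    },
}

# Hypothetical checker: a real deployment would call a moderation model here.
flagged = audit_schema(schema, lambda s: "FLAGGED" in s)
print(flagged)  # ['PLACEHOLDER_FLAGGED_TEXT']
```

A check of this shape inspects each literal in isolation. That is consistent with the abstract's report that such auditing mitigates EnumAttack but not DictAttack, where the payload is split between the prompt and the grammar.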

Headline numbers

94–99%

DictAttack ASR on flagship models
(gpt-5, gemini-2.5-pro,
deepseek-r1, gpt-oss-120b)

75.8%

DictAttack ASR even under SOTA
cross-plane Dual-Plane Guard

0.98

StrongREJECT composite score on
gemini-2.5-pro — near-perfect
useful harmful answers

13

proprietary & open-weight LLMs
evaluated across 5 standard
jailbreak benchmarks

Threat model in one sentence

An attacker who can supply the grammar (JSON Schema, regular expression, or context-free grammar) used by an LLM API for structured output — e.g. a malicious MCP tool, a compromised agent backend, or an attacker-controlled tool definition in a multi-tenant pipeline — can hide malicious intent inside the schema while the user-visible prompt stays benign. The constrained-decoding engine then deterministically forces the target model to commit to the malicious trajectory before any data-plane guardrail or alignment-side defense ever sees the assembled intent.

Code & artifact

We release a sanitized public proof-of-concept repository (EnumAttack & DictAttack illustrations + Algorithm 1 reference implementation) at github.com/zhangshuoming990105/ConstrainedDecodingAttack.

The full evaluation harness — multi-model batch runners, audit pipelines (LlamaGuard / OpenAI Moderation / SelfDefend), the Circuit Breaker harness used in §5.5, and per-model run logs across all 13 models and five benchmarks — is provided as a separate academic-only gated artifact. Verified researchers may request access from the corresponding author with their institutional affiliation and intended use.

Disclosure. We disclosed the underlying vulnerability to OpenAI and Google (Gemini) in early 2025; the embargo has since passed. The maintainers of xgrammar were also notified and acknowledged the issue. Mitigations deployed inside closed-source provider stacks are not visible to us; do not assume any particular endpoint is patched.

BibTeX

@inproceedings{zhang2026cda,
  title     = {When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in {LLM}s with Structured Output},
  author    = {Zhang, Shuoming and Zhao, Jiacheng and Dong, Hanyuan and Xu, Ruiyuan and Li, Zhicheng and Zhang, Yangyu and Li, Shuaijiang and Wen, Yuan and Xia, Chunwei and Wang, Zheng and Feng, Xiaobing and Cui, Huimin},
  booktitle = {Proceedings of the 2026 ACM SIGSAC Conference on Computer and Communications Security (CCS '26)},
  year      = {2026},
  publisher = {ACM},
}
