PatchSiren

CVE-2026-53923 vllm-project CVE debrief

CVE-2026-53923 is a vulnerability in vLLM, an inference and serving engine for large language models (LLMs). It affects versions from 0.5.5 up to, but not including, 0.23.1rc0. Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels causes partial tensor processing: the output tensor is allocated at full size, but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor therefore retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, which constitutes information disclosure. The vulnerability has a CVSS score of 5.3 and is classified as MEDIUM severity. The CVE was published on June 22, 2026, and modified on June 24, 2026.

Vendor: vllm-project
Product: vllm
CVSS: MEDIUM 5.3
CISA KEV: Not listed in stored evidence
Original CVE published: 2026-06-22
Original CVE updated: 2026-06-24
Advisory published: 2026-06-22
Advisory updated: 2026-06-24

Who should care

Organizations that use vLLM to serve large language models should be aware of this vulnerability. Multi-tenant inference deployments are the main concern, because residual GPU memory there may hold other users' data. Anyone running an affected version (0.5.5 up to, but not including, 0.23.1rc0) should take action to mitigate it.

Technical summary

The vulnerability is caused by integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu). The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. This results in the unfilled portion of the output tensor retaining whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. The vulnerability is fixed in version 0.23.1rc0.
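The effect described above can be illustrated with a small host-side model. This is a sketch under stated assumptions, not vLLM's kernel code: it assumes the truncation is a narrowing from a 64-bit element count to a signed 32-bit integer, and the names truncate_signed and simulate_partial_fill are hypothetical helpers introduced only for illustration. A Python list prefilled with a marker value stands in for the uninitialized buffer that torch::empty returns.

```python
def truncate_signed(n: int, bits: int = 32) -> int:
    """Mimic a C-style narrowing conversion to a signed integer of `bits` width."""
    mask = (1 << bits) - 1
    n &= mask
    # Reinterpret the top bit as the sign bit (two's complement).
    return n - (1 << bits) if n >= (1 << (bits - 1)) else n


def simulate_partial_fill(n_elements: int, bits: int, stale: int = 0xAA) -> list:
    """Model an uninitialized output buffer that a kernel only partly overwrites."""
    # Stand-in for an uninitialized allocation: the buffer starts with old data.
    out = [stale] * n_elements
    # Assumption: the kernel receives the truncated count, not the real one.
    processed = max(0, min(truncate_signed(n_elements, bits), n_elements))
    for i in range(processed):
        out[i] = 0  # placeholder for a freshly dequantized value
    return out


# Count-only view at 32 bits: a tensor with 2**32 + 256 elements.
total = 2**32 + 256
processed = truncate_signed(total)
print(total - processed)  # number of elements left holding stale memory

# Scaled-down view at 8 bits so the buffer is small enough to inspect.
buf = simulate_partial_fill(300, bits=8)
print(buf.count(0), buf.count(0xAA))  # written elements vs. stale elements
```

The 8-bit run is only a scaled-down stand-in for the 32-bit case: the count wraps, the loop writes a short prefix, and everything after that prefix keeps its earlier contents.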

Defensive priority

Treat this vulnerability as medium priority, with particular attention to multi-tenant inference deployments. Organizations should assess their exposure and upgrade to version 0.23.1rc0 or later as soon as possible.

Recommended defensive actions

  • Assess exposure to this vulnerability in multi-tenant inference deployments.
  • Upgrade to version 0.23.1rc0 or later, which contains the fix.
  • Inventory deployments that run affected vLLM versions (0.5.5 up to, but not including, 0.23.1rc0).
  • Monitor for potential information disclosure incidents.
  • Implement compensating controls to detect and prevent exploitation.
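
The inventory step in the list above can be partly automated. The following is a minimal sketch that checks a version string against the affected range stated in this debrief. The helpers version_key, is_affected, and installed_vllm_version are hypothetical names, and the parser is deliberately simplified: it handles plain releases and a/b/rc pre-releases, drops a local suffix such as "+cu118", and rejects anything else. A full PEP 440 parser would be more robust.

```python
import re
from importlib import metadata

# Ordering of pre-release tags; a final release sorts after all of them.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}


def version_key(text: str):
    """Turn a version such as '0.23.1rc0' into a sortable tuple (simplified)."""
    core = text.strip().split("+")[0]  # drop a local suffix like '+cu118'
    m = re.fullmatch(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?", core)
    if m is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = tuple(int(part) for part in m.group(1).split("."))
    release += (0,) * (4 - len(release))  # pad so '0.6' equals '0.6.0'
    if m.group(2):
        return release, _PRE_RANK[m.group(2)], int(m.group(3))
    return release, 3, 0


# Range taken from this debrief: 0.5.5 <= version < 0.23.1rc0.
FIRST_AFFECTED = version_key("0.5.5")
FIRST_FIXED = version_key("0.23.1rc0")


def is_affected(text: str) -> bool:
    return FIRST_AFFECTED <= version_key(text) < FIRST_FIXED


def installed_vllm_version():
    """Return the installed vllm version string, or None if it is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


for candidate in ("0.5.4", "0.5.5", "0.23.0", "0.23.1rc0", "0.23.1"):
    print(candidate, is_affected(candidate))
```

On a host, installed_vllm_version() can feed is_affected() directly. A None result means the package is not installed in that environment, which says nothing about other environments or container images on the same machine.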

Evidence notes

The vulnerability was discovered and reported through the CVE program. The CVE record and NVD detail provide official information about the vulnerability. Additional details are available from the source item URL and mitigation or vendor references.

Official resources

This article is AI-assisted and based on the supplied source corpus.