PatchSiren cyber security CVE debrief

CVE-2026-53923 vllm-project CVE debrief

CVE-2026-53923 is an information disclosure vulnerability in vLLM, an inference and serving engine for large language models (LLMs). It affects versions from 0.5.5 up to, but not including, 0.23.1rc0. Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels causes only part of an output tensor to be processed. The output tensor is allocated at full size, but the dequantize CUDA kernel writes only a truncated number of elements, so the unfilled portion keeps whatever was previously in GPU memory. In multi-tenant inference deployments, that residual memory may hold tensor data from other users' inference requests. The vulnerability has a CVSS score of 5.3 (MEDIUM). It was published on June 22, 2026, and last modified on June 24, 2026.
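The arithmetic behind the truncation can be sketched in a few lines. This is a hypothetical illustration: the source does not say which integer type the dimensions were narrowed to, so the sketch assumes an element count computed in 64 bits and then cast to a 32-bit unsigned integer. The tensor shape is made up to keep the numbers easy to follow.

```python
# Hypothetical illustration of 32-bit narrowing of a tensor element count.
# Assumption: the count is computed in 64 bits and then cast to a 32-bit
# unsigned integer; the source does not state which type vLLM actually used.

UINT32_MASK = 0xFFFFFFFF


def narrowed_count(rows: int, cols: int) -> int:
    """Element count after a C-style cast to uint32_t (low 32 bits kept)."""
    full = rows * cols  # Python ints do not overflow, like a 64-bit count here
    return full & UINT32_MASK


def unfilled_elements(rows: int, cols: int) -> int:
    """Elements of a full-size output that a truncated kernel never writes."""
    return rows * cols - narrowed_count(rows, cols)


if __name__ == "__main__":
    rows, cols = 65_536, 65_537  # hypothetical shape just above 2**32 elements
    print(rows * cols)                    # 4295032832 elements allocated
    print(narrowed_count(rows, cols))     # 65536 elements processed
    print(unfilled_elements(rows, cols))  # 4294967296 elements left stale
```

Shapes whose element count fits in 32 bits are unaffected in this model; the gap only appears once the count crosses 2**32.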

Vendor: vllm-project
Product: vllm
CVSS: MEDIUM 5.3
CISA KEV: Not listed in stored evidence
Original CVE published: 2026-06-22
Original CVE updated: 2026-06-24
Advisory published: 2026-06-22
Advisory updated: 2026-06-24

Who should care

Organizations that use vLLM to serve large language models should review this vulnerability, especially those running multi-tenant inference deployments, where it can disclose one user's data to another. Because the flaw is in the GGUF dequantize kernels, deployments that serve GGUF-quantized models are the most directly relevant. Anyone running a version from 0.5.5 up to, but not including, 0.23.1rc0 should take action.

Technical summary

The vulnerability is caused by integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu). The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. This results in the unfilled portion of the output tensor retaining whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. The vulnerability is fixed in version 0.23.1rc0.
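To see why a truncated write leaks data, consider a toy model of a caching allocator that, like torch::empty, hands back previously used memory without clearing it. Everything here (ToyAllocator, toy_dequantize, the tenant values) is hypothetical plain Python. It models the failure mode described above, not vLLM's CUDA code or PyTorch's allocator.

```python
from __future__ import annotations


class ToyAllocator:
    """Caching allocator model: freed buffers are reused without clearing."""

    def __init__(self) -> None:
        self._free: list[list[float]] = []

    def empty(self, n: int) -> list[float]:
        # Like torch::empty: return recycled memory with its old contents intact.
        for i, buf in enumerate(self._free):
            if len(buf) == n:
                return self._free.pop(i)
        return [0.0] * n  # brand-new memory (zeroed only to keep the toy simple)

    def free(self, buf: list[float]) -> None:
        self._free.append(buf)  # contents are NOT scrubbed


def toy_dequantize(out: list[float], quantized: list[int], scale: float, count: int) -> None:
    """Hypothetical kernel that writes only the first `count` outputs."""
    for i in range(count):
        out[i] = quantized[i] * scale


def demo(count: int) -> list[float]:
    alloc = ToyAllocator()

    # Tenant A's request leaves activations in a buffer, which is then freed.
    tenant_a = alloc.empty(8)
    tenant_a[:] = [0.5, 0.25, 0.75, 1.0, 0.125, 0.875, 0.375, 0.625]
    alloc.free(tenant_a)

    # Tenant B's dequantize output receives the same buffer, uninitialized.
    out = alloc.empty(8)
    toy_dequantize(out, [1, 2, 3, 4, 5, 6, 7, 8], scale=10.0, count=count)
    return out


if __name__ == "__main__":
    print(demo(count=8))  # full write: only tenant B's values
    print(demo(count=3))  # truncated write: the tail still holds tenant A's data
```

In this model, zero-initializing the output (the analogue of torch::zeros instead of torch::empty) stops the leak at the cost of an extra write, while making the kernel cover every element removes the root cause. The source does not describe which approach the upstream fix takes.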

Defensive priority

Treat this as a medium-priority fix, and move it up the queue for multi-tenant inference deployments, where residual GPU memory can expose other users' data. Assess exposure and upgrade to vLLM 0.23.1rc0 or later as soon as practical.

Recommended defensive actions

Assess exposure in multi-tenant inference deployments, particularly those serving GGUF-quantized models.
Upgrade to vLLM 0.23.1rc0 or later, which contains the fix.
Inventory deployed vLLM versions and flag any from 0.5.5 up to, but not including, 0.23.1rc0.
Monitor for potential information disclosure incidents.
Where an upgrade must wait, put compensating controls in place, for example isolating tenants onto separate deployments.
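For the inventory step, a small helper can flag versions in the affected range (0.5.5 up to, but not including, 0.23.1rc0). This is a sketch with a hand-rolled parser for simple release strings. It assumes your installed versions follow that shape and deliberately raises ValueError for anything else (for example .dev builds) rather than guessing. If the third-party packaging library is available, packaging.version.Version is a more complete alternative.

```python
from __future__ import annotations

import re

# Affected range per the advisory: >= 0.5.5 and < 0.23.1rc0.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?(?:\.post(\d+))?$")
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after all of its pre-releases


def version_key(version: str) -> tuple[int, ...]:
    """Build a sortable key for strings like 0.6.3, 0.23.1rc0 or 0.6.3.post1."""
    public = version.strip().split("+", 1)[0]  # drop a local tag such as +cu118
    match = _VERSION_RE.match(public)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, pre, pre_num, post = match.groups()
    return (
        int(major),
        int(minor),
        int(patch),
        _PRE_RANK[pre] if pre else _FINAL_RANK,
        int(pre_num or 0),
        int(post or 0),
    )


FIRST_AFFECTED = version_key("0.5.5")
FIRST_FIXED = version_key("0.23.1rc0")


def is_affected(version: str) -> bool:
    """True if the version falls in [0.5.5, 0.23.1rc0)."""
    return FIRST_AFFECTED <= version_key(version) < FIRST_FIXED


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        print(installed, "affected" if is_affected(installed) else "not in affected range")
```

Comparing tuples keeps the ordering rules explicit: a release candidate such as 0.23.1rc0 sorts before 0.23.1, so 0.23.0 is flagged while 0.23.1rc0 and later are not.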

Evidence notes

The vulnerability is tracked through the CVE program. The CVE record and the NVD entry are the official sources of information about it; the source item URL and the mitigation and vendor references listed below provide additional detail.

Official resources

CVE-2026-53923 CVE record
CVE.org
CVE-2026-53923 NVD detail
NVD
Source item URL
nvd_modified
Mitigation or vendor reference
Patch
Source reference
Issue tracking
Mitigation or vendor reference
Third-party advisory

This article is AI-assisted and based on the supplied source corpus.