PatchSiren cyber security CVE debrief
CVE-2026-45973 Linux CVE debrief
A race condition in the Linux kernel's RDMA/mlx5 driver can cause an indefinite hang during device unload when a firmware reset occurs in LAG (Link Aggregation Group) mode. The vulnerability stems from incomplete error event propagation: in LAG mode, the bond device is registered only on the master, so it never sees sys_error events raised by the slave. During firmware reset, this causes waits for UMR (User-Mode Memory Registration) completions to hang forever: the slave device is dead, but the master has not yet entered error state, so UMR posts succeed while their completions never arrive. The fix adds a sys_error notifier that is registered before MLX5_IB_STAGE_IB_REG and remains active until after ib_unregister_device(), ensuring error events reach the bond device throughout teardown.
- Vendor: Linux
- Product: Unknown
- CVSS: Unknown
- CISA KEV: Not listed in stored evidence
- Original CVE published: 2026-05-27
- Original CVE updated: 2026-05-27
- Advisory published: 2026-05-27
- Advisory updated: 2026-05-27
Who should care
Organizations running Linux systems with Mellanox ConnectX adapters configured in LAG mode, particularly those performing firmware updates or reset operations. Cloud providers and HPC environments using RDMA over Converged Ethernet (RoCE) with bonded interfaces are most affected.
Technical summary
The vulnerability exists in the RDMA/mlx5 driver's handling of LAG (Link Aggregation Group) mode during firmware reset. In LAG configurations, bond device registration is asymmetric: only the master registers the bond device, so sys_error events raised by a slave are never delivered to it. When a firmware reset triggers, a race condition emerges: the slave device becomes unresponsive, but the master has not yet transitioned to error state. UMR (User-Mode Memory Registration) operations are therefore accepted at submission but never complete, and the wait for their completions hangs indefinitely in __mlx5_ib_dereg_mr during device unload. The fix implements a sys_error notifier whose lifetime spans from before MLX5_IB_STAGE_IB_REG until after ib_unregister_device() returns, ensuring continuous error event delivery to the bond device during the entire teardown sequence.
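The race can be illustrated with a small toy model. The sketch below is not driver code: `ToyDevice`, `deliver_sys_error`, `wait_for_umr`, and `simulate` are hypothetical names, and the model assumes that a waiter gives up as soon as it observes the error state. The finite `give_up_after` bound exists only so the demonstration terminates; the source describes the real wait as indefinite.

```python
import threading


class ToyDevice:
    """Toy stand-in for a bonded RDMA device (hypothetical, not the mlx5 driver)."""

    def __init__(self, notifier_registered):
        self.notifier_registered = notifier_registered
        self.in_error = threading.Event()    # set once a sys_error event is delivered
        self.completion = threading.Event()  # set once a UMR completion arrives

    def deliver_sys_error(self):
        # The event reaches the device only if a notifier is registered for it.
        if self.notifier_registered:
            self.in_error.set()

    def wait_for_umr(self, poll_interval=0.01, give_up_after=0.2):
        """Return 'completed', 'aborted', or 'hung'."""
        waited = 0.0
        while waited < give_up_after:
            if self.completion.wait(poll_interval):
                return "completed"
            if self.in_error.is_set():
                # Assumption of this model: the waiter bails out in error state.
                return "aborted"
            waited += poll_interval
        # Bound added for the demo only; the reported hang has no timeout.
        return "hung"


def simulate(notifier_registered):
    dev = ToyDevice(notifier_registered)
    # Firmware reset: the slave is dead, so no completion is ever signalled,
    # and a sys_error event is raised.
    dev.deliver_sys_error()
    return dev.wait_for_umr()


if __name__ == "__main__":
    print("without notifier:", simulate(False))
    print("with notifier:   ", simulate(True))
```

Without the notifier the waiter never learns that the device is dead and runs out the demo bound; with it, the wait is abandoned on the first poll.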
Defensive priority
medium
Recommended defensive actions
- Apply the relevant stable kernel patch (commits 613f5d4139b6, 6d838873da9c, c8fb5c965ac7, or ebc2164a4cd4) based on your kernel version
- For systems using Mellanox ConnectX adapters in LAG mode, prioritize kernel updates to prevent potential hangs during firmware reset scenarios
- Monitor for kernel hang stack traces containing __mlx5_ib_dereg_mr or schedule_preempt_disabled in mlx5_ib contexts
- If you run an affected kernel with mlx5 hardware in LAG mode and firmware resets are frequent, consider temporarily disabling LAG until patches can be applied
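The monitoring action above could be sketched as a simple log scan. This is a minimal illustration, not a vetted detection rule: `find_hang_indicators` is a hypothetical helper, and `SAMPLE_LOG` is made-up text (including the task name and offsets) that only mimics the general shape of a kernel hung-task report.

```python
PRIMARY_SYMBOL = "__mlx5_ib_dereg_mr"
SECONDARY_SYMBOL = "schedule_preempt_disabled"


def find_hang_indicators(log_text):
    """Return (line_number, line) pairs that mention the hang symbols.

    The secondary symbol is generic scheduler code, so it is reported only
    when the same log also mentions mlx5_ib somewhere.
    """
    lines = log_text.splitlines()
    mlx5_context = any("mlx5_ib" in line for line in lines)
    hits = []
    for number, line in enumerate(lines, start=1):
        if PRIMARY_SYMBOL in line or (mlx5_context and SECONDARY_SYMBOL in line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample; real traces will differ in offsets and surrounding frames.
SAMPLE_LOG = """\
INFO: task rmmod:4242 blocked for more than 120 seconds.
Call Trace:
 schedule_preempt_disabled+0x15/0x30
 __mlx5_ib_dereg_mr+0x2a0/0x5f0 [mlx5_ib]
"""

if __name__ == "__main__":
    for number, line in find_hang_indicators(SAMPLE_LOG):
        print(f"line {number}: {line}")
```

In practice the input would come from the kernel log, for example the captured output of `dmesg` or `journalctl -k`.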
Evidence notes
The CVE description includes a complete kernel call trace showing the hang occurring in __mlx5_ib_dereg_mr during device teardown. Four stable kernel commits are referenced as fixes. The vulnerability is specific to Mellanox mlx5 hardware operating in LAG mode with firmware reset conditions.
Official resources
- CVE-2026-45973 CVE record (CVE.org)
- CVE-2026-45973 NVD detail (NVD)
- Source item URL: nvd_modified
- Source reference: 416baaa9-dc9f-4396-8d5f-8c081fb06d67
The vulnerability was disclosed via the Linux kernel stable tree with patches published on 2026-05-27. The issue was resolved in the kernel source before public CVE assignment.