PatchSiren


CVE-2026-45973 Linux CVE debrief

A race condition in the Linux kernel's RDMA/mlx5 driver can make the driver hang indefinitely during device unload when a firmware reset occurs in LAG (Link Aggregation Group) mode. The root cause is incomplete error event propagation: in LAG mode the bond device is registered only on the master, so sys_error events raised by a slave never reach it. During a firmware reset, waits for UMR (User-Mode Memory Registration) completions can then hang forever: the slave is dead but the master has not yet entered error state, so UMR posts succeed while their completions never arrive. The fix adds a sys_error notifier that is registered before MLX5_IB_STAGE_IB_REG and stays active until after ib_unregister_device(), so error events reach the bond device throughout teardown.

Vendor
Linux
Product
Unknown
CVSS
Unknown
CISA KEV
Not listed in stored evidence
Original CVE published
2026-05-27
Original CVE updated
2026-05-27
Advisory published
2026-05-27
Advisory updated
2026-05-27

Who should care

Organizations running Linux systems with Mellanox ConnectX adapters configured in LAG mode should review this issue, particularly if they perform firmware updates or firmware resets on those hosts. Cloud providers and HPC environments that use RDMA over Converged Ethernet (RoCE) with bonded interfaces are the most exposed.
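As a rough way to find hosts that may fall into this group, the following Python sketch lists network interfaces that are bound to the mlx5_core driver and enslaved to an upper device. It is a heuristic under stated assumptions, not a definitive check: it only inspects the standard sysfs links `device/driver` and `master`, it cannot tell whether the adapter has actually activated hardware LAG, and a `master` link can also point to a bridge or similar device rather than a bond. The function name `mlx5_bond_members` is hypothetical.

```python
import os
from typing import Dict


def mlx5_bond_members(sysfs_net: str = "/sys/class/net") -> Dict[str, str]:
    """Map each enslaved mlx5_core interface to the name of its upper device.

    Heuristic only: an entry means "mlx5 port with a master device", which is
    a hint that LAG may be in use, not proof of it.
    """
    members: Dict[str, str] = {}
    for ifname in sorted(os.listdir(sysfs_net)):
        base = os.path.join(sysfs_net, ifname)
        driver_link = os.path.join(base, "device", "driver")
        master_link = os.path.join(base, "master")
        # Virtual interfaces have no device/driver link; standalone
        # interfaces have no master link. Skip both.
        if not (os.path.islink(driver_link) and os.path.islink(master_link)):
            continue
        driver = os.path.basename(os.readlink(driver_link))
        if driver == "mlx5_core":
            members[ifname] = os.path.basename(os.readlink(master_link))
    return members
```

Calling `mlx5_bond_members()` with no arguments reads the live sysfs tree. An empty result suggests that no mlx5 port on the host is currently bonded.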

Technical summary

The vulnerability lies in how the RDMA/mlx5 driver handles a firmware reset in LAG (Link Aggregation Group) mode. Bond device registration is asymmetric in LAG configurations: only the master registers the bond device, so sys_error events raised by a slave are never delivered to it. When a firmware reset is triggered, a race window opens in which the slave is already unresponsive but the master has not yet transitioned to error state. In that window, UMR (User-Mode Memory Registration) operations are accepted at submission but never complete, and the driver hangs indefinitely in __mlx5_ib_dereg_mr during device unload. The fix adds a sys_error notifier whose lifetime runs from before MLX5_IB_STAGE_IB_REG until after ib_unregister_device() returns, so error events keep reaching the bond device for the whole teardown sequence.
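The race described above can be illustrated with a toy model. The Python sketch below is not driver code: the class names `CoreDev` and `BondDev`, the function names, and the bounded polling loop that stands in for an unbounded wait are all hypothetical. It also assumes that a wait is abandoned once the bond device is marked as being in error state, which the summary implies but does not state outright.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CoreDev:
    """Toy stand-in for one port (master or slave) of a LAG pair."""
    name: str
    listeners: List[Callable[[str], None]] = field(default_factory=list)

    def raise_sys_error(self) -> None:
        # Deliver the error only to whoever subscribed to this port.
        for callback in self.listeners:
            callback(self.name)


@dataclass
class BondDev:
    """Toy stand-in for the bond device that waits on UMR completions."""
    in_error: bool = False

    def on_sys_error(self, source: str) -> None:
        self.in_error = True


def wait_umr(bond: BondDev, completion_arrived: bool, max_polls: int = 5) -> str:
    """Bounded stand-in for the wait; "hung" models waiting forever."""
    for _ in range(max_polls):
        if completion_arrived:
            return "completed"
        if bond.in_error:
            return "aborted"
    return "hung"


def unload_after_slave_reset(notify_from_slave: bool) -> str:
    master, slave = CoreDev("master"), CoreDev("slave")
    bond = BondDev()
    # The bond device is registered only on the master.
    master.listeners.append(bond.on_sys_error)
    if notify_from_slave:
        # Models the added notifier: slave errors now reach the bond device.
        slave.listeners.append(bond.on_sys_error)
    slave.raise_sys_error()  # firmware reset takes the slave down first
    # The post was accepted, but the dead slave never delivers a completion.
    return wait_umr(bond, completion_arrived=False)
```

In this model, `unload_after_slave_reset(False)` ends in the "hung" state because the bond device never learns of the slave's error, while `unload_after_slave_reset(True)` ends in "aborted" because the error reaches it before the wait.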

Defensive priority

medium

Recommended defensive actions

  • Apply the stable kernel fix that matches your kernel version (commit 613f5d4139b6, 6d838873da9c, c8fb5c965ac7, or ebc2164a4cd4)
  • For systems using Mellanox ConnectX adapters in LAG mode, prioritize kernel updates to prevent potential hangs during firmware reset scenarios
  • Monitor for kernel hang stack traces containing __mlx5_ib_dereg_mr or schedule_preempt_disabled in mlx5_ib contexts
  • On affected kernels with mlx5 hardware in LAG mode where firmware resets are frequent, consider temporarily disabling LAG until patches can be applied
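The monitoring step above can be sketched as a small log filter. This Python example is a minimal sketch under assumptions: the function name `find_suspect_lines` and the 20-line context window are hypothetical choices, and the matching relies only on the two symbol names given in the list plus the module name mlx5_ib appearing near a `schedule_preempt_disabled` frame. Treat any hit as a prompt for manual review rather than a confirmed instance of this CVE.

```python
from typing import Iterable, List, Tuple


def find_suspect_lines(log_lines: Iterable[str], window: int = 20) -> List[Tuple[int, str]]:
    """Return (index, line) pairs for kernel log lines that match the hang pattern.

    A line is reported if it mentions __mlx5_ib_dereg_mr, or if it mentions
    schedule_preempt_disabled and some line within `window` lines of it
    mentions mlx5_ib.
    """
    lines = [line.rstrip("\n") for line in log_lines]
    hits: List[Tuple[int, str]] = []
    for i, line in enumerate(lines):
        if "__mlx5_ib_dereg_mr" in line:
            hits.append((i, line))
        elif "schedule_preempt_disabled" in line:
            # This symbol is generic, so require mlx5_ib context nearby.
            nearby = lines[max(0, i - window): i + window + 1]
            if any("mlx5_ib" in other for other in nearby):
                hits.append((i, line))
    return hits
```

The function accepts any iterable of lines, for example an open file holding saved `dmesg` or `journalctl -k` output.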

Evidence notes

The CVE description includes a complete kernel call trace showing the hang in __mlx5_ib_dereg_mr during device teardown. Four stable kernel commits are referenced as fixes. The issue is specific to Mellanox mlx5 hardware operating in LAG mode and is triggered by a firmware reset.

Official resources

The vulnerability was disclosed via the Linux kernel stable tree with patches published on 2026-05-27. The issue was resolved in the kernel source before public CVE assignment.