Hi,

On Tue, Feb 18, 2025 at 09:56:53AM -0500, Noah Meyerhans wrote:
> On Tue, Feb 18, 2025 at 03:11:08PM +0100, Salvatore Bonaccorso wrote:
> > > Microsoft has observed that the 5.10.y kernels in bullseye are
> > > susceptible to crashes due to race conditions in the NVME/PCI
> > > subsystem. See below for a representative kernel log. The problem
> > > appears most frequently in larger systems, e.g. with 4 or more NVME
> > > devices and >= 64 CPUs, but it could potentially occur on smaller
> > > systems as well.
> > >
> > > The issue was fixed with the 5.14 kernel upstream in e4b9852a0
> > > ("nvme-pci: fix multiple races in nvme_setup_io_queues"), so this
> > > only impacts oldstable. I have provided a backport of this commit
> > > upstream in
> > > https://lore.kernel.org/stable/E1tj8vO-00471h-2H@lore/
> > >
> > > I'm requesting that this commit be included in a bullseye kernel
> > > update.
> >
> > AFAICS, this backport has not been accepted back then for 5.10.y. Can
> > you re-ping upstream to make sure it get included in the 5.10.y
> > series? Once this has happened as we follow the 5.10.y series it will
> > be included (or can be included in advance once it has been queued).
>
> Yes, I forgot to reset the date on the commit that I sent upstream,
> which is why it looks like it's been around since 2021. I requested
> that upstream apply the fix to 5.10.y last week, and will ping them in
> another week or two if it hasn't been acknowledged either way...

Noah, thanks!

Regards,
Salvatore