Hello Sathyanarayanan,

On Mon, Aug 04, 2025 at 09:11:27AM -0700, Sathyanarayanan Kuppuswamy wrote:
>
> On 8/4/25 8:35 AM, Breno Leitao wrote:
> > Hello Sathyanarayanan,
> >
> > On Mon, Aug 04, 2025 at 06:50:30AM -0700, Sathyanarayanan Kuppuswamy wrote:
> > > On 8/4/25 2:17 AM, Breno Leitao wrote:
> > > > Similarly to pci_dev_aer_stats_incr(), pci_print_aer() may be called
> > > > when dev->aer_info is NULL. Add a NULL check before proceeding to avoid
> > > > calling aer_ratelimit() with a NULL aer_info pointer, returning 1, which
> > > > does not rate limit, given this is fatal.
> > > Why not add it to pci_print_aer() ?
> > >
> > > > This prevents a kernel crash triggered by dereferencing a NULL pointer
> > > > in aer_ratelimit(), ensuring safer handling of PCI devices that lack
> > > > AER info. This change aligns pci_print_aer() with
> > > > pci_dev_aer_stats_incr() which already performs this NULL check.
> > > Is this happening during the kernel boot ? What is the frequency and steps
> > > to reproduce? I am curious about why pci_print_aer() is called for a PCI
> > > device without aer_info. Not aer_info means, that particular device is
> > > already released or in the process of release (pci_release_dev()). Is this
> > > triggered by using a stale pci_dev pointer?
> > I've reported some of these investigations in here:
> >
> > https://lore.kernel.org/all/buduna6darbvwfg3aogl5kimyxkggu3n4romnmq6sozut6axeu@clnx7sfsy457/
>
> It has some details. But you did not mention details like your environment,
> steps to reproduce and how often you see it. I just want to understand in
> what scenario pci_print_aer() is triggered, when releasing the device. I am
> wondering whether we are missing proper locking some where.

Oh, unfortunately I don't have these details. I have a bunch of machines in
"prod" running 6.16, and they crash from time to time, and then I have the
crashdumps.

I can get anything that a crashdump provides, but I don't have a reproducer
or the exact steps that are triggering it.

If I can get this information from a crashdump, I am more than happy to
investigate. Can we get this information from a crashdump?

Thanks,
--breno