On Tue, 20 May 2025, Bjorn Helgaas wrote:

> On Tue, May 20, 2025 at 02:55:32PM +0300, Ilpo Järvinen wrote:
> > On Mon, 19 May 2025, Bjorn Helgaas wrote:
> > 
> > > From: Jon Pan-Doh <pan...@google.com>
> > > 
> > > Spammy devices can flood kernel logs with AER errors and slow/stall
> > > execution. Add per-device ratelimits for AER correctable and uncorrectable
> > > errors that use the kernel defaults (10 per 5s).
> > > 
> > > There are two AER logging entry points:
> > > 
> > >   - aer_print_error() is used by DPC and native AER
> > > 
> > >   - pci_print_aer() is used by GHES and CXL
> > > 
> > > The native AER aer_print_error() case includes a loop that may log details
> > > from multiple devices. This is ratelimited by the union of ratelimits for
> > > these devices, set by add_error_device(), which collects the devices. If
> > > no such device is found, the Error Source message is ratelimited by the
> > > Root Port or RCEC that received the ERR_* message.
> > > 
> > > The DPC aer_print_error() case is currently not ratelimited.
> > > 
> > > The GHES and CXL pci_print_aer() cases are ratelimited by the Error Source
> > > device.
> > 
> > >  static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
> > >  {
> > > +	/*
> > > +	 * Ratelimit AER log messages. Generally we add the Error Source
> > > +	 * device, but there are is_error_source() cases that can result in
> > > +	 * multiple devices being added here, so we OR them all together.
> > 
> > I can see the code uses OR ;-) but it wasn't helpful because this comment
> > didn't explain why at all. As this ratelimit thing is using reverse logic
> > to begin with, this is a very tricky bit.
> > 
> > Perhaps something less vague like:
> > 
> > 	... we ratelimit if all devices have reached their ratelimit.
> > 
> > Assuming that was the intention here? (I'm not sure.)
> 
> My intention was that if there's any downstream device that has an
> unmasked error logged and it has not reached its ratelimit, we should
> log messages for all devices with errors logged. Does something like
> this help?
> 
>   /*
>    * Ratelimit AER log messages. "dev" is either the source
>    * identified by the root's Error Source ID or it has an unmasked
>    * error logged in its own AER Capability. If any of these devices
>    * has not reached its ratelimit, log messages for all of them.
>    * Messages are emitted when e_info->ratelimit is non-zero.
>    *
>    * Note that e_info->ratelimit was already initialized to 1 for the
>    * ERR_FATAL case.
>    */
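To make the OR semantics concrete, here is a small userspace model; it is a sketch, not kernel code. `struct rl_state`, `rl_allow()`, `struct fake_dev`, `struct fake_err_info`, `add_dev()` and `MAX_DEVS` are hypothetical stand-ins for the kernel's `struct ratelimit_state`/`__ratelimit()`, `struct pci_dev`, `struct aer_err_info`, `add_error_device()` and `AER_MAX_MULTI_ERR_DEVICES`. The window logic is simplified and assumes the kernel defaults of 10 messages per 5 seconds. As with `__ratelimit()`, a return of 1 means "emit" and 0 means "suppress".

```c
/* Hypothetical userspace model, not the kernel implementation. */
#define RL_INTERVAL_MS 5000	/* assumed: kernel default of 5 s */
#define RL_BURST 10		/* assumed: kernel default of 10 messages */
#define MAX_DEVS 5		/* assumed stand-in for AER_MAX_MULTI_ERR_DEVICES */

struct rl_state {
	long begin_ms;		/* start of the current window */
	int printed;		/* messages allowed in the current window */
};

/* Returns 1 if a message may be emitted now, 0 if it should be suppressed. */
static int rl_allow(struct rl_state *rs, long now_ms)
{
	if (now_ms - rs->begin_ms >= RL_INTERVAL_MS) {
		rs->begin_ms = now_ms;	/* start a new window */
		rs->printed = 0;
	}
	if (rs->printed < RL_BURST) {
		rs->printed++;
		return 1;
	}
	return 0;
}

struct fake_dev {
	struct rl_state rl;	/* one limiter per device (per severity in the patch) */
};

struct fake_err_info {
	struct fake_dev *dev[MAX_DEVS];
	int error_dev_num;
	int ratelimit;		/* non-zero: emit messages for all collected devices */
};

static int add_dev(struct fake_err_info *e, struct fake_dev *d, long now_ms)
{
	if (e->error_dev_num >= MAX_DEVS)
		return -1;
	e->dev[e->error_dev_num++] = d;
	/*
	 * Compound |= always evaluates rl_allow(), so each device is charged
	 * against its own budget; logging happens if ANY device is still
	 * under its limit.
	 */
	e->ratelimit |= rl_allow(&d->rl, now_ms);
	return 0;
}
```

Because `|=` always evaluates its right-hand side, every collected device is charged even when an earlier device already enabled logging; a short-circuiting `||` would leave later devices uncharged. Initializing `ratelimit` to 1, as the ERR_FATAL change does, means fatal errors are logged whatever the per-device checks return, since OR can never clear the flag.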
Yes, this makes the intent much clearer, thanks.

> The ERR_FATAL case is from this post-v6 change that I haven't posted
> yet:
> 
>   aer_isr_one_error(...)
>   {
>     ...
>     if (status & PCI_ERR_ROOT_UNCOR_RCV) {
>       int fatal = status & PCI_ERR_ROOT_FATAL_RCV;
>       struct aer_err_info e_info = {
>         ...
>   +     .ratelimit = fatal ? 1 : 0,
> 
> > > +	 */
> > >  	if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
> > >  		e_info->dev[e_info->error_dev_num] = pci_dev_get(dev);
> > > +		e_info->ratelimit |= aer_ratelimit(dev, e_info->severity);
> > >  		e_info->error_dev_num++;
> > >  		return 0;
> > >  	}
> > 
> > > @@ -1147,9 +1183,10 @@ static void aer_recover_work_func(struct work_struct *work)
> > >  		pdev = pci_get_domain_bus_and_slot(entry.domain, entry.bus,
> > >  						   entry.devfn);
> > >  		if (!pdev) {
> > > -			pr_err("no pci_dev for %04x:%02x:%02x.%x\n",
> > > -			       entry.domain, entry.bus,
> > > -			       PCI_SLOT(entry.devfn), PCI_FUNC(entry.devfn));
> > > +			pr_err_ratelimited("%04x:%02x:%02x.%x: no pci_dev found\n",
> > 
> > This case was not mentioned in the changelog.
> 
> Sharp eyes! What do you think of this commit log text?
> 
>   The CXL pci_print_aer() case is ratelimited by the Error Source device.
> 
>   The GHES pci_print_aer() case is via aer_recover_work_func(), which
>   searches for the Error Source device. If the device is not found, there's
>   no per-device ratelimit, so we use a system-wide ratelimit that covers all
>   error types (correctable, non-fatal, and fatal).

Works for me as long as it is mentioned.

> This isn't really ideal because in pci_print_aer(), the struct
> aer_capability_regs has already been filled by firmware and the
> logging doesn't read any registers from the device at all.
> 
> However, pci_print_aer() *does* want the pci_dev for statistics and
> tracing (pci_dev_aer_stats_incr()) and, of course, for the aer_printks
> themselves.
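The fallback described in the proposed commit log text can be modeled the same way. In this sketch (again hypothetical userspace C, not the kernel implementation), a found device has separate limiters for correctable and uncorrectable errors, while every "no pci_dev found" message shares one static limiter, similar to how `pr_err_ratelimited()` keeps a single ratelimit state per call site. `should_log()`, `enum sev`, `struct fake_dev` and the other names are illustrative assumptions; non-fatal and fatal errors both map to `SEV_UNCOR` here.

```c
/* Hypothetical userspace model, not the kernel implementation. */
#define RL_INTERVAL_MS 5000	/* assumed: kernel default of 5 s */
#define RL_BURST 10		/* assumed: kernel default of 10 messages */

struct rl_state {
	long begin_ms;		/* start of the current window */
	int printed;		/* messages allowed in the current window */
};

/* Returns 1 if a message may be emitted now, 0 if it should be suppressed. */
static int rl_allow(struct rl_state *rs, long now_ms)
{
	if (now_ms - rs->begin_ms >= RL_INTERVAL_MS) {
		rs->begin_ms = now_ms;
		rs->printed = 0;
	}
	if (rs->printed < RL_BURST) {
		rs->printed++;
		return 1;
	}
	return 0;
}

enum sev { SEV_COR, SEV_UNCOR, SEV_NR };

/* Per-device state: one limiter per severity class. */
struct fake_dev {
	struct rl_state rl[SEV_NR];
};

/* Single system-wide limiter shared by all severities when no device is found. */
static struct rl_state no_dev_rl;

static int should_log(struct fake_dev *dev, enum sev s, long now_ms)
{
	if (dev)
		return rl_allow(&dev->rl[s], now_ms);
	return rl_allow(&no_dev_rl, now_ms);	/* severity ignored on purpose */
}
```

The trade-off is visible in the model: without a device, a burst of correctable errors consumes the same budget that a later fatal error would need, which is why the commit log calls out that the fallback covers all error types.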
While not a perfect solution, this looks like yet another case where it would
help to create a dummy pci_dev struct with minimal setup, which would allow
calling functions that take a pci_dev as input.

That solution is not perfect because it arms a trap. Downstream functions
could get changed, and if the developer assumes they have a full pci_dev at
hand, it could cause issues with the dummy pci_dev. How likely that is to
happen is debatable, but for many cases where the call chain isn't overly
complex, such as here, a dummy pci_dev seems helpful.

> We could leave this pr_err() completely alone; hopefully it's a rare
> case. I think the CXL path just silently skips pci_print_aer() if
> this happens.
> 
> Eventually I would really like the native AER path to start by doing
> whatever firmware is doing, e.g., fill in struct aer_capability_regs,
> so the core of the AER handling could be identical between native AER
> and GHES/CXL. If we could do that, maybe we could figure out a
> cleaner way to handle this corner case.

-- 
 i.