"Brian Conway" <bcon...@rcesoftware.com> writes:

> Greetings. I am soliciting feedback on a patch to detect and mitigate
> uncontrolled ACPI GPE interrupt storms.
>
> Rationale: There have been a number of threads in the recent past on bugs@
> and misc@ with acpi0 spinning a CPU at 100% [1][2][3][4]. The immediate cause
> is likely a buggy BIOS and its ACPI implementation. However, this type of bug
> is not exclusive to no-name hardware from China, nor is it specific to a
> particular hardware vendor, BIOS vendor, or GPE pin. Hardware that is or was
> affected can include Intel [5], Lenovo [6], HP [7], ASUS [8], and Apple [9].
>
> I have been testing with a half-dozen ACPI-equipped systems in various
> states: storming and behaving, booting and resuming, 7.2 and -current, SMALL
> and not. The attached diff uses a minimum 5-second evaluation window, driven
> by the firing of ACPI GPE interrupts (no additional accounting thread, etc).
> An uncontrolled GPE storm will be logged as such (real number):
>

How about zzz and ZZZ cycles? I think you might be using the wrong
clock, so I hypothesize this breaks. See getuptime(9).

> Feb 17 22:57:06 acpitest3 /bsd: uncontrolled GPE storm 7242/s, disabling GPE
> 06
>
> Alternatively, if this is still too close to papering over the problem,
> perhaps a smaller diff that only logs the problem, allowing a user see what
> the storm is and report it to their BIOS/hardware vendor?
>
> Thank you for your time.
>
> Brian Conway
>
> [1] https://marc.info/?t=166422981800001
> [2] https://marc.info/?t=166497726600004
> [3] https://marc.info/?t=167356490500003
> [4] https://marc.info/?t=167614389600001
> [5] https://community.intel.com/t5/Intel-NUCs/APCI-GPE-0x6F-Interrupt-Storm-under-OpenBSD/m-p/1426755
> [6] https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T/T480s-ACPI-bug/m-p/4057604
> [7] https://h30434.www3.hp.com/t5/Gaming-Notebooks/High-CPU-Usage-System-ACPI-sys-GPE-L6F-Storm-Omen-15-17/td-p/7169255
> [8] https://answers.microsoft.com/en-us/windows/forum/all/stopping-a-gpe-event-acpi-system-interrupts/cec51e6c-1ed4-4369-9e6f-108c4d6333a6
> [9] https://bugzilla.kernel.org/show_bug.cgi?id=117481
>
> diff --git sys/dev/acpi/acpi.c sys/dev/acpi/acpi.c
> index 853bad1ab..26a5c1702 100644
> --- sys/dev/acpi/acpi.c
> +++ sys/dev/acpi/acpi.c
> @@ -52,6 +52,9 @@
>  #define APMDEV_NORMAL	0
>  #define APMDEV_CTL	8
>
> +#define GPE_RATE_MIN_CYCLE	5 /* seconds */
> +#define GPE_RATE_MAX	2000 /* per second */
> +
>  #include "wd.h"
>
>  #ifdef ACPI_DEBUG
> @@ -98,6 +101,8 @@ void acpi_disable_allgpes(struct acpi_softc *);
>  struct gpe_block *acpi_find_gpe(struct acpi_softc *, int);
>  void acpi_enable_onegpe(struct acpi_softc *, int);
>  int acpi_gpe(struct acpi_softc *, int, void *);
> +void acpi_init_gpe_rate(struct acpi_softc *, int);
> +int acpi_gpe_rate(struct acpi_softc *, int);
>
>  void acpi_enable_rungpes(struct acpi_softc *);
>
> @@ -2229,6 +2234,7 @@ acpi_enable_onegpe(struct acpi_softc *sc, int gpe)
>  	dnprintf(50, "enabling GPE %.2x (current: %sabled) %.2x\n",
>  	    gpe, (en & mask) ? "en" : "dis", en);
>  	acpi_write_pmreg(sc, ACPIREG_GPE_EN, gpe>>3, en | mask);
> +	acpi_init_gpe_rate(sc, gpe);
>  }
>
>  /* Clear all GPEs */
> @@ -2307,7 +2313,40 @@ acpi_gpe(struct acpi_softc *sc, int gpe, void *arg)
>  	if (sc->gpe_table[gpe].flags & GPE_LEVEL)
>  		acpi_write_pmreg(sc, ACPIREG_GPE_STS, gpe>>3, mask);
>  	en = acpi_read_pmreg(sc, ACPIREG_GPE_EN, gpe>>3);
> -	acpi_write_pmreg(sc, ACPIREG_GPE_EN, gpe>>3, en | mask);
> +	/* Re-enable if GPE rate passes, otherwise leave disabled */
> +	if (!acpi_gpe_rate(sc, gpe))
> +		acpi_write_pmreg(sc, ACPIREG_GPE_EN, gpe>>3, en | mask);
> +	return (0);
> +}
> +
> +void
> +acpi_init_gpe_rate(struct acpi_softc *sc, int gpe)
> +{
> +	sc->gpe_table[gpe].rate_start = getuptime();
> +	sc->gpe_table[gpe].rate_count = 0;
> +}
> +
> +int
> +acpi_gpe_rate(struct acpi_softc *sc, int gpe)
> +{
> +	struct gpe_block *pgpe = &sc->gpe_table[gpe];
> +	time_t cycle;
> +
> +	pgpe->rate_count++;
> +	dnprintf(10, "rate GPE %.2x start %lld elapsed %lld count %zu\n", gpe,
> +	    pgpe->rate_start, getuptime() - pgpe->rate_start, pgpe->rate_count);
> +
> +	cycle = getuptime() - pgpe->rate_start;
> +	if (cycle >= GPE_RATE_MIN_CYCLE) {
> +		if (pgpe->rate_count > (GPE_RATE_MAX * cycle)) {
> +			printf("uncontrolled GPE storm %lld/s, disabling GPE %.2x\n",
> +			    pgpe->rate_count / cycle, gpe);
> +			return (1);
> +		}
> +
> +		/* Reset and start a new cycle */
> +		acpi_init_gpe_rate(sc, gpe);
> +	}
>  	return (0);
>  }
>
> diff --git sys/dev/acpi/acpivar.h sys/dev/acpi/acpivar.h
> index a9b4a2ae9..4e2f47053 100644
> --- sys/dev/acpi/acpivar.h
> +++ sys/dev/acpi/acpivar.h
> @@ -185,6 +185,9 @@ struct gpe_block {
>  	void *arg;
>  	int active;
>  	int flags;
> +
> +	time_t rate_start;
> +	size_t rate_count;
>  };
>
>  struct acpi_devlist {
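
To make the clock question concrete, here is a minimal userland sketch of
the window check from the diff, with the clock passed in as a parameter so
a jump across a zzz/ZZZ cycle can be simulated. The names gpe_rate,
gpe_rate_init, gpe_rate_check and gpe_feed are hypothetical stand-ins for
illustration, not kernel code, and whether getuptime() actually jumps like
this on resume is exactly the open question, so treat the jump as an
assumption.

```c
#include <assert.h>

#define GPE_RATE_MIN_CYCLE	5	/* seconds, as in the diff */
#define GPE_RATE_MAX		2000	/* per second, as in the diff */

/* Hypothetical stand-in for the two fields the diff adds to gpe_block. */
struct gpe_rate {
	long long rate_start;	/* clock value when the window opened */
	long long rate_count;	/* interrupts counted in this window */
};

static void
gpe_rate_init(struct gpe_rate *r, long long now)
{
	r->rate_start = now;
	r->rate_count = 0;
}

/*
 * Same logic as acpi_gpe_rate() in the diff, but "now" is supplied by
 * the caller instead of read from getuptime().
 * Returns 1 if the GPE should stay disabled, 0 otherwise.
 */
static int
gpe_rate_check(struct gpe_rate *r, long long now)
{
	long long cycle;

	assert(now >= r->rate_start);	/* the clock must not go backwards */

	r->rate_count++;
	cycle = now - r->rate_start;
	if (cycle >= GPE_RATE_MIN_CYCLE) {
		if (r->rate_count > GPE_RATE_MAX * cycle)
			return 1;
		/* Rate was acceptable: reset and start a new window. */
		gpe_rate_init(r, now);
	}
	return 0;
}

/* Deliver n interrupts at one clock value; 1 if any of them trips. */
static int
gpe_feed(struct gpe_rate *r, long long now, long long n)
{
	while (n-- > 0)
		if (gpe_rate_check(r, now))
			return 1;
	return 0;
}
```

With the window opened at 0, feeding 10000 interrupts at second 4 and one
more at second 5 trips the check (10001 > 2000 * 5). If the same 10000
interrupts are instead followed by one at an assumed post-resume clock value
of 3600, the threshold becomes 2000 * 3600, the check passes, and the window
is reset, so the burst before the suspend is forgiven. Which of these the
kernel sees depends on whether the chosen clock counts suspended time.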