On Mon, Feb 12, 2018 at 11:19 AM, Sam Bobroff <sam.bobr...@au1.ibm.com> wrote:
> Currently if the kernel receives a memory hot-unplug event early
> enough, it may get stuck in an infinite loop in
> dissolve_free_huge_pages(). This appears as a stall just after:
>
> pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY
>
> It appears to be caused by "minimum_order" being uninitialized, due to
> init_ras_IRQ() executing before hugetlb_init().
>
> To correct this, extract the part of init_ras_IRQ() that enables
> hotplug event processing and place it in the machine_late_initcall
> phase, which is guaranteed to be after hugetlb_init() is called.
>
> Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/ras.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 81d8614e7379..ba284949af06 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -66,6 +66,26 @@ static int __init init_ras_IRQ(void)
>  		of_node_put(np);
>  	}
>
> +	/* EPOW Events */
> +	np = of_find_node_by_path("/event-sources/epow-events");
> +	if (np != NULL) {
> +		request_event_sources_irqs(np, ras_epow_interrupt, "RAS_EPOW");
> +		of_node_put(np);
> +	}
> +
> +	return 0;
> +}
> +machine_subsys_initcall(pseries, init_ras_IRQ);
> +
> +/*
> + * Enable the hotplug interrupt late because processing them may touch other
> + * devices or systems (e.g. hugepages) that have not been initialized at the
> + * subsys stage.
> + */
> +int __init init_ras_hotplug_IRQ(void)
> +{
> +	struct device_node *np;
> +
>  	/* Hotplug Events */
>  	np = of_find_node_by_path("/event-sources/hot-plug-events");
>  	if (np != NULL) {
> @@ -75,16 +95,9 @@ static int __init init_ras_IRQ(void)
>  		of_node_put(np);
>  	}
>
> -	/* EPOW Events */
> -	np = of_find_node_by_path("/event-sources/epow-events");
> -	if (np != NULL) {
> -		request_event_sources_irqs(np, ras_epow_interrupt, "RAS_EPOW");
> -		of_node_put(np);
> -	}
> -
>  	return 0;
>  }
> -machine_subsys_initcall(pseries, init_ras_IRQ);
> +machine_late_initcall(pseries, init_ras_hotplug_IRQ);
>

Seems reasonable to me. The other RAS events (internal error and EPOW)
seem to be in the right place.

Acked-by: Balbir Singh <bsinghar...@gmail.com>
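
For anyone following the ordering argument, here is a rough user-space
sketch of why moving the hotplug setup to the late level helps. It is not
kernel code: the level enum, the table, and the model_* functions are
hypothetical stand-ins, and it assumes hugetlb_init() is registered at the
subsys level and can end up running after init_ras_IRQ() within that level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for two initcall levels (not the real kernel macros). */
enum initcall_level { LEVEL_SUBSYS, LEVEL_LATE, LEVEL_COUNT };

struct initcall {
	enum initcall_level level;
	int (*fn)(void);
};

/* Hypothetical model state. */
static bool hugetlb_ready;		/* set by the hugetlb_init() stand-in */
static bool hotplug_enabled_safely;	/* what the hotplug init observed */

static int model_hugetlb_init(void)
{
	hugetlb_ready = true;
	return 0;
}

static int model_init_ras_hotplug_IRQ(void)
{
	/*
	 * Once the IRQ is requested, a hot-unplug event could be processed,
	 * so record whether hugetlb state was ready at that moment.
	 */
	hotplug_enabled_safely = hugetlb_ready;
	return 0;
}

/* Run all calls level by level; table order models order within a level. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	for (int level = 0; level < LEVEL_COUNT; level++) {
		for (size_t i = 0; i < n; i++) {
			assert(calls[i].fn != NULL);
			if (calls[i].level == (enum initcall_level)level)
				calls[i].fn();
		}
	}
}

/* Returns true if hotplug was enabled only after the hugetlb stand-in ran. */
static bool hotplug_safe_at(enum initcall_level hotplug_level)
{
	/* Hotplug is listed first to model the unlucky same-level ordering. */
	const struct initcall calls[] = {
		{ hotplug_level, model_init_ras_hotplug_IRQ },
		{ LEVEL_SUBSYS, model_hugetlb_init },
	};

	hugetlb_ready = false;
	hotplug_enabled_safely = false;
	run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
	return hotplug_enabled_safely;
}
```

In this model, hotplug_safe_at(LEVEL_SUBSYS) returns false because the
hotplug stand-in runs before the hugetlb one, while
hotplug_safe_at(LEVEL_LATE) returns true regardless of table order, which
is the guarantee the patch relies on.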