On 09/19/2018 11:38 PM, Michael Ellerman wrote:
> Nathan Fontenot <nf...@linux.vnet.ibm.com> writes:
>
>> When removing memory we need to remove the memory from the node
>> it was added to instead of looking up the node it should be in
>> in the device tree.
>>
>> During testing we have seen scenarios where the affinity for a
>> LMB changes due to a partition migration or PRRN event. In these
>> cases the node the LMB exists in may not match the node the device
>> tree indicates it belongs in. This can lead to a system crash
>> when trying to DLAPR remove the LMB after a migration or PRRN
>> event. The current code looks up the node in the device tree to
>> remove the LMB from, the crash occurs when we try to offline this
>> node and it does not have any data, i.e. node_data[nid] == NULL.
>
> This isn't building for 32-bit etc:
>
> arch/powerpc/mm/drmem.c: In function 'init_drmem_v1_lmbs':
> arch/powerpc/mm/drmem.c:371:14: error: implicit declaration of function
> 'memory_add_physaddr_to_nid' [-Werror=implicit-function-declaration]
> lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
> ^
> cc1: all warnings being treated as errors
> scripts/Makefile.build:317: recipe for target 'arch/powerpc/mm/drmem.o' failed
>
> See the failed checks here:
> https://patchwork.ozlabs.org/patch/969150/
>
>
> Probably drmem.c should only be compiled for 64-bit NUMA etc.

Looks like the root cause is that memory hotplug relies on sparsemem which
is not supported on 32-bit.

This patch is also going to need a refresh to apply cleanly due to other
patches that have gone in. I'll re-submit after looking at the build break
issues more.

-Nathan

>
> cheers
>
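
For illustration only, here is a minimal sketch of one way the implicit declaration above could be avoided: wrap the nid lookup in a small helper that compiles to a no-op when memory hotplug is not configured. This is an assumption about a possible shape for the fix, not the resubmitted patch. CONFIG_MEMORY_HOTPLUG and memory_add_physaddr_to_nid() are the existing kernel names; struct drmem_lmb_sketch and lmb_set_nid() are hypothetical stand-ins so the snippet builds on its own in userspace.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the LMB record in drmem. */
struct drmem_lmb_sketch {
	uint64_t base_addr;
	int nid;
};

#ifdef CONFIG_MEMORY_HOTPLUG
/* Assumed to be provided elsewhere when memory hotplug is built in. */
int memory_add_physaddr_to_nid(uint64_t start);

/* Record the node the LMB's memory was actually added to. */
static inline void lmb_set_nid(struct drmem_lmb_sketch *lmb)
{
	lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
}
#else
/*
 * No memory hotplug (e.g. a 32-bit build): the helper is never
 * referenced, so there is no implicit declaration and nid is left
 * untouched.
 */
static inline void lmb_set_nid(struct drmem_lmb_sketch *lmb)
{
	(void)lmb;
}
#endif
```

With a helper like this, callers such as init_drmem_v1_lmbs could call lmb_set_nid(lmb) unconditionally and the config check would live in one place. Restricting drmem.c to 64-bit NUMA builds, as suggested in the quoted text, would be an alternative.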