On 02/20/2017 07:02 PM, Michael Ellerman wrote: > Nathan Fontenot <nf...@linux.vnet.ibm.com> writes: > >> On 02/15/2017 10:34 PM, Michael Ellerman wrote: >>> Nathan Fontenot <nf...@linux.vnet.ibm.com> writes: >>> >>>> Revert the patch patch to auto-online hotplugged memory, commit >>>> id ec999072442a. Using the auto-online acpability does online added >>>> memory but does not update the associated device struct to >>>> indicate that the memory is online. The result of this is that >>>> memoryXX/online file in sysfs still reports the memory as being offline. >>> >>> Isn't that just a bug in the auto-online code? >> >> After digging through the code some more and reading some of the email >> chain when the auto-online feature was submitted I can't decide if this >> is a bug or if this is by design. The fact that they only other users >> of this appear to be balloon drivers (hv and xen) makes me think this >> may be by design. > > Have we asked the original authors? I don't see them on Cc?
Not yet. I'm working on a patch to use device_online() for doing auto online of memory. I will send this out as an RFC with an explanation of what I'm seeing and ask why it was done the way it was. > >> Changing the auto-online capability to call device_offline() instead >> would appear to also require changes to the hv and xen balloon >> drivers for the new behavior. > > OK, if that's the case then that's going to make life tricky. Yep. I'll ask about this with the RFC I send out. -Nathan > >>> If I'm reading it right it's calling online_memory_block(). If that >>> doesn't cause the memory_block to be online that would puzzle me. >> >> The memory is online and usuable when the dlpar operation completes. I >> was mistaken in my original note though, the state file in sysfs does report >> the memory as being online. The underlying issue is that the device struct >> does not get updated (dev->offline) when using the auto-online capability. >> The result is that trying to remove a LMB a second time fails when we call >> device_offline() which checks the dev->offline flag and returns failure. > > That still sounds like a bug to me. We asked the core to "auto online" > the added memory, but the dev is still offline? But maybe there's some > subtlety. > >> I think reverting the patch to use the auto-online capability may be the >> way to go. This would restore the code so that we call device_online and >> device_offline for add and remove respectively, and not rely on what the >> auto-online code is doing. >> >> Thoughts? > > It's not great, but given we need to backport it to v4.8, yeah I think > we'll have to go with a revert. > > But we should also pursue fixing the auto online logic. > >>> Also commit ec999072442a went into v4.8, so is memory hotplug broken >>> since then? If so we need to backport this or whatever fix we come up. >> >> Yes, we need to backport whatever fix we do. > > Right. I'll queue it up. > > cheers >