On 04/11/2017 02:00 AM, Michael Ellerman wrote:
> Tyrel Datwyler <tyr...@linux.vnet.ibm.com> writes:
>
>> On 04/06/2017 09:04 PM, Michael Ellerman wrote:
>>> Tyrel Datwyler <tyr...@linux.vnet.ibm.com> writes:
>>>
>>>> On 04/06/2017 03:27 AM, Sachin Sant wrote:
>>>>> On a POWER8 LPAR running 4.11.0-rc5, a hot unplug operation on
>>>>> any I/O adapter results in the following warning
>>>>>
>>>>> This problem has been in the code for some time now. I had first seen
>>>>> this in -next tree.
>>>>>
>>
>> <snip>
>>
>>>>> Have attached the dmesg log from the system. Let me know if any additional
>>>>> information is required to help debug this problem.
>>>>
>>>> I remember you mentioning this when the issue was brought up for CPUs. I
>>>> assume the case is the same here where the issue is only seen with
>>>> adapters that were hot-added after boot (ie. hot-remove of adapter
>>>> present at boot doesn't trip the warning)?
>>>
>>> So who's fixing this?
>>
>> I started looking at it when Bharata submitted a patch trying to fix the
>> issue for CPUs, but got side tracked by other things. I suspect that
>> this underflow has actually been an issue for quite some time, and we
>> are just now becoming aware of it thanks to the recount_t patchset being
>> merged.
>
> Yes I agree. Which means it might be broken in existing distros.

Definitely. I did some profiling last night, and I understand the
hotplug case. It turns out to be as I suggested in the original thread
about CPUs. When the devicetree code was reworked to move the tree out
of proc and into sysfs, the sysfs detach code added an of_node_put to
remove the original of_init reference. pSeries being the sole original
*dynamic* device tree user, we had always issued an of_node_put in our
dlpar specific detach function to achieve that end. So, this should be
a pretty straightforward, trivial fix.

However, for the case where devices are present at boot it appears we
are leaking a lot of references, resulting in the device nodes never
actually being released/freed after a dlpar remove. In the CPU case
after boot I count 8 more references taken than in the hotplug case,
and the corresponding of_node_put's are not called at dlpar remove time
either. That will take some time to track down, review and clean up.

-Tyrel

>
>> I'll look into it again this week.
>
> Thanks.
>
> cheers
>
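
For anyone following along, the double put in the hotplug case can be
sketched in userspace roughly as below. This is only a toy model, not
the kernel code: struct toy_node and the toy_* functions are
hypothetical names made up for illustration, and the assumption is
simply that a hot-added node holds just its initial reference when it
is removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree node; not struct device_node. */
struct toy_node {
    int refcount;     /* plain int so the extra put is easy to observe */
    bool released;    /* set once the count reaches zero */
    bool underflowed; /* set when a put arrives after release */
};

/* Models the single initial reference a node gets at init time. */
static void toy_node_init(struct toy_node *n)
{
    assert(n != NULL);
    n->refcount = 1;
    n->released = false;
    n->underflowed = false;
}

/* Drops one reference and records a put on an already released node. */
static void toy_node_put(struct toy_node *n)
{
    assert(n != NULL);
    if (n->released) {
        /* Roughly where a checked counter would warn about underflow. */
        n->underflowed = true;
        return;
    }
    if (--n->refcount == 0)
        n->released = true;
}

/* Models the generic detach path dropping the initial reference. */
static void toy_generic_detach(struct toy_node *n)
{
    toy_node_put(n);
}

/*
 * Models a hot-remove of a hot-added node. When platform_also_puts is
 * true, the platform specific detach drops the initial reference a
 * second time, which is the double put described above.
 */
struct toy_node toy_hot_remove(bool platform_also_puts)
{
    struct toy_node n;

    toy_node_init(&n);
    toy_generic_detach(&n);
    if (platform_also_puts)
        toy_node_put(&n);
    return n;
}
```

Calling toy_hot_remove(true) leaves the node released with the
underflow flag set, while toy_hot_remove(false) releases it cleanly.
The boot-time leak is the opposite situation (extra gets with no
matching puts) and is not modeled here.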