Tyrel Datwyler <tyr...@linux.vnet.ibm.com> writes: > On 07/29/2018 06:11 AM, Michael Bringmann wrote: >> During LPAR migration, the content of the device tree/sysfs may >> be updated including deletion and replacement of nodes in the >> tree. When nodes are added to the internal node structures, they >> are appended in FIFO order to a list of nodes maintained by the >> OF code APIs. When nodes are removed from the device tree, they >> are marked OF_DETACHED, but not actually deleted from the system >> to allow for pointers cached elsewhere in the kernel. The order >> and content of the entries in the list of nodes is not altered, >> though. >> >> During LPAR migration some common nodes are deleted and re-added >> e.g. "ibm,platform-facilities". If a node is re-added to the OF >> node lists, the of_attach_node function checks to make sure that >> the name + ibm,phandle of the to-be-added data is unique. As the >> previous copy of a re-added node is not modified beyond the addition >> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >> notice to the console, (3) renames the to-be-added node to avoid >> filename collisions within a directory, and (3) adds entries to >> the sysfs/kernfs. > > So, this patch actually just band aids over the real problem. This is > a long standing problem with several PFO drivers leaking references. > The issue here is that, during the device tree update that follows a > migration. the update of the ibm,platform-facilities node and friends > below are always deleted and re-added on the destination lpar and > subsequently the leaked references prevent the devices nodes from > every actually being properly cleaned up after detach. Thus, leading > to the issue you are observing.
Leaking references shouldn't affect the node being detached from the tree though. See of_detach_node() calling __of_detach_node(), none of that depends on the refcount. It's only the actual freeing of the node, in of_node_release() that is prevented by leaked reference counts. So I agree we need to do a better job with the reference counting, but I don't see how it is causing the problem here. cheers