On Thu, Aug 15, 2019 at 08:18:06AM +0000, Schmid, Carsten wrote:
>>> When a resource is freed and has children, the childrens are
>>
>> s/childrens/children/
>>
> oh, missed that. Too many children ... ;-)
>
>>> +	__release_child_resources(tmp, warn);
>>
>> This function will release all the children.
>>
>> Is this what Linus suggested?
>>
>> From his code snippet, I just see the siblings' parent being set to
>> NULL. I may have missed some point?
>>
> At the point we are here, there should be no children, and no children
> of children at all ...
> So they are all more or less lost in the wild.
> That was why I didn't copy Linus' code 1:1 but reused an already
> existing function that does a similar thing.
> It's anyway worth thinking about.
>
> What I have in mind here (example):
> Parent:  iomem map 0x1000..0x1FFF
>   Child1:  iomem map 0x1000..0x17FF
>     Child11: iomem map 0x1000..0x13FF
>     Child12: iomem map 0x1400..0x17FF
>   Child2:  iomem map 0x1800..0x1FFF
>     Child21: iomem map 0x1800..0x1BFF
>     Child22: iomem map 0x1C00..0x1FFF
>
> When releasing the parent, how can children 11, 12, 21 and 22 still be
> valid? They don't know that their grandfather died ...
> Looking at __release_child_resources, I found that it invalidates/
> releases all children in exactly the way Linus did for the parent's
> children list.
> Doesn't it make sense to do the same for all?
>
> Please comment.
>
>>> +static void check_children(struct resource *parent)
>>> +{
>>> +	if (parent->child) {
>>> +		/* warn and release all children */
>>> +		WARN_ONCE(1, "%s: %s has child %s, release all children\n",
>>> +			  __func__, parent->name, parent->child->name);
>>> +		write_lock(&resource_lock);
>>
>> In the previous version, the lock was taken before parent->child is
>> checked.
>>
>> Not sure why you changed the order?
>>
> To hold the lock for as short a time as possible.
> But yes, you are right, this could lead to problems if the release of
> the children is done in a parallel thread on a multicore ...
> I'll change that to cover the whole resource access within the lock.
> Not a big thing ...
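>
> To make the first point concrete: the recursive detach I have in mind
> would look roughly like this (untested sketch, not the patch code;
> release_children_sketch is a hypothetical, simplified stand-in for
> __release_child_resources, minus the warn handling seen in the
> __release_child_resources(tmp, warn) call quoted above):
>
> /*
>  * Untested sketch: detach every descendant so none of them keeps a
>  * stale pointer into a freed ancestor.  Caller must hold
>  * resource_lock for writing.
>  */
> static void release_children_sketch(struct resource *r)
> {
> 	struct resource *tmp, *p;
>
> 	p = r->child;
> 	r->child = NULL;
> 	while (p) {
> 		tmp = p;
> 		p = p->sibling;
>
> 		tmp->parent = NULL;
> 		tmp->sibling = NULL;
>
> 		/* grandchildren (Child11, Child12, ...) go the same way */
> 		release_children_sketch(tmp);
> 	}
> }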
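>
> And for the lock ordering, the reworked check would then be roughly
> this (again an untested sketch, with the hypothetical helper from
> above standing in for __release_child_resources):
>
> static void check_children(struct resource *parent)
> {
> 	write_lock(&resource_lock);
> 	if (parent->child) {
> 		/* warn and release all children, all under the lock */
> 		WARN_ONCE(1, "%s: %s has child %s, release all children\n",
> 			  __func__, parent->name, parent->child->name);
> 		release_children_sketch(parent);
> 	}
> 	write_unlock(&resource_lock);
> }
>
> That makes the check and the release one critical section, so a
> parallel release on another core can't slip in between the check and
> the detach.
>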
My gut feeling is that this is a problem caused by a malfunctioning
driver, e.g. xhci-hcd. We do our best to protect the core kernel from
it rather than do the cleanup for it. So my suggestion is to look into
why xhci-hcd behaves like this and fix that.

> Best regards
> Carsten

--
Wei Yang
Help you, Help me