On Thu, Aug 15, 2019 at 08:18:06AM +0000, Schmid, Carsten wrote:
>>>When a resource is freed and has children, the childrens are
>> 
>> s/childrens/children/
>>
>oh, missed that. Too many children ... ;-)
> 
>>>+            __release_child_resources(tmp, warn);
>> 
>> This function will release all the children.
>> 
>> Is this what Linus suggested?
>> 
>> From his code snippet, I only see that the siblings' parent is set to NULL.
>> Am I missing something?
>>
>At the point we are here, there should be no children at all, let alone
>children of children.
>So they are all more or less lost in the wild.
>That was why I didn't copy Linus' code 1:1 but reused an existing
>function that does a similar thing.
>It is worth thinking about anyway.
>
>What I have in mind here (example):
>Parent: iomem map 0x1000..0x1FFF
>  Child1: iomem map 0x1000..0x17FF
>    Child11: iomem map 0x1000..0x13FF
>    Child12: iomem map 0x1400..0x17FF
>  Child2: iomem map 0x1800..0x1FFF
>    Child21: iomem map 0x1800..0x1BFF
>    Child22: iomem map 0x1C00..0x1FFF
>
>When releasing the parent, how can children 11, 12, 21 and 22 still be valid?
>They don't know that their grandfather died ...
>Looking at __release_child_resources, I found that it invalidates/releases
>all children in exactly the way Linus did for the parent's children list.
>Doesn't it make sense to do the same for all?
>
>Please comment.
>
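
To make the example above concrete, here is a minimal user-space sketch of
the depth-first detach that __release_child_resources performs, as I
understand it. struct node, add_child(), release_children() and
demo_release() are hypothetical stand-ins, not kernel code; address ranges,
locking and the debug print are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct resource: tree links only. */
struct node {
	const char *name;
	struct node *parent, *sibling, *child;
};

/* Link child as the first child of parent. */
static void add_child(struct node *parent, struct node *child)
{
	assert(child->parent == NULL);	/* must not already be in a tree */
	child->parent = parent;
	child->sibling = parent->child;
	parent->child = child;
}

/* Detach every descendant of r, depth first; return how many were detached. */
static int release_children(struct node *r)
{
	struct node *p = r->child;
	int released = 0;

	r->child = NULL;
	while (p) {
		struct node *tmp = p;

		p = p->sibling;
		tmp->parent = NULL;
		tmp->sibling = NULL;
		released += 1 + release_children(tmp);	/* grandchildren too */
	}
	return released;
}

/* Build the Parent/Child1/Child11/... tree from the example and release it. */
int demo_release(void)
{
	struct node parent = { .name = "Parent" };
	struct node c1 = { .name = "Child1" }, c2 = { .name = "Child2" };
	struct node c11 = { .name = "Child11" }, c12 = { .name = "Child12" };
	struct node c21 = { .name = "Child21" }, c22 = { .name = "Child22" };

	add_child(&parent, &c1);
	add_child(&parent, &c2);
	add_child(&c1, &c11);
	add_child(&c1, &c12);
	add_child(&c2, &c21);
	add_child(&c2, &c22);

	return release_children(&parent);
}
```

With this shape, releasing the parent also unlinks Child11..Child22, so no
grandchild is left pointing at a freed ancestor.
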
>> >+static void check_children(struct resource *parent)
>> >+{
>> >+   if (parent->child) {
>> >+           /* warn and release all children */
>> >+           WARN_ONCE(1, "%s: %s has child %s, release all children\n",
>> >+                           __func__, parent->name, parent->child->name);
>> >+           write_lock(&resource_lock);
>> 
>> In the previous version, the lock is taken before parent->child is checked.
>> 
>> Not sure why you changed the order?
>> 
>To hold the lock for as short a time as possible.
>But yes, you are right, this could lead to problems if the children are
>released in a parallel thread on a multicore ...
>I'll change that so the whole resource access is covered by the lock.
>Not a big thing ...
>
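
For the ordering question, a minimal user-space sketch of "check under the
lock", assuming a pthread mutex in place of resource_lock. struct node,
tree_lock and check_children_locked() are hypothetical names, and clearing
the pointer only stands in for the real release.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct resource: only the child link. */
struct node {
	struct node *child;
};

/* Stand-in for resource_lock. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return 1 if a leftover child was found and dropped, 0 otherwise. */
int check_children_locked(struct node *parent)
{
	int had_child = 0;

	assert(parent != NULL);

	pthread_mutex_lock(&tree_lock);
	if (parent->child) {		/* the test itself is made under the lock */
		parent->child = NULL;	/* stand-in for releasing the children */
		had_child = 1;
	}
	pthread_mutex_unlock(&tree_lock);

	return had_child;
}
```

Because the test and the update sit in one critical section, another thread
cannot change parent->child between the two.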

My gut feeling is that this problem comes from a malfunctioning driver, e.g.
xhci-hcd. We should do our best to protect the core kernel from it rather
than do the cleanup for it.

So my suggestion is to look into why xhci-hcd behaves like this and fix that.

>Best regards
>Carsten

-- 
Wei Yang
Help you, Help me
