On 09/08/2020 17:12, Nikolay Aleksandrov wrote:
> On 09/08/2020 16:49, Hillf Danton wrote:
>>
>> On Fri, 7 Aug 2020 08:03:32 -0700 Stephen Hemminger wrote:
>>> On Fri, 7 Aug 2020 10:03:59 +0200
>>> Rasmus Villemoes <rasmus.villem...@prevas.dk> wrote:
>>>
>>>> On 07/08/2020 05.39, Stephen Hemminger wrote:
>>>>> On Thu, 6 Aug 2020 12:46:43 +0300
>>>>> Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:
>>>>>   
>>>>>> On 06/08/2020 12:17, Rasmus Villemoes wrote:  
>>>>>>> On 06/08/2020 01.34, Stephen Hemminger wrote:    
>>>>>>>> On Wed, 5 Aug 2020 16:25:23 +0200  
>>>>
>>>>>>
>>>>>> Hi Rasmus,
>>>>>> I haven't tested anything but git history (and some grepping)
>>>>>> points to deadlocks when sysfs entries are being changed under
>>>>>> rtnl.
>>>>>> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and
>>>>>> 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
>>>>>> This is a common usage pattern throughout net/, the bridge is not
>>>>>> the only case and there are more commits which talk about
>>>>>> deadlocks.
>>>>>> Again I haven't verified anything but it seems on device delete
>>>>>> (w/ rtnl held) -> sysfs delete would wait for current readers, but
>>>>>> current readers might be stuck waiting on rtnl and we can deadlock.
>>>>>>  
>>>>>
>>>>> I was referring to AB BA lock inversion problems.  
>>>>
>>>> Ah, so lock inversion, not priority inversion.
>>
>> Hi folks,
>>
>> Is it likely that kworker helps work around that deadlock, by
>> acquiring the rtnl lock in the case that the current fails to
>> trylock it?
>>
>> Hillf
> 
> You know it's a user writing to a file expecting config change, right?
> There are numerous problems with deferring it (e.g. error handling).
> 
> Thanks,
>  Nik

OK, admittedly I spoke too soon about the error handling. :)
But I still think it suffers from the same problem if the sysfs files
are going to be destroyed under rtnl while you're writing to one. Their
users are "drained", so it will again wait forever, because neither
will rtnl be released nor will the writer finish.
And it may become even more interesting if we're trying to remove the
bridge module at that time.
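
To make the scenario above concrete, here is a rough userspace model in
Python. It is not kernel code: the Model class, store(), drain() and
simulate() are made-up names, a plain lock stands in for rtnl, a counter
stands in for the sysfs users that removal has to drain, and the drain
timeout stands in for "wait forever". It only sketches why a writer that
sleeps on the lock keeps the drain from finishing, while a writer that
only tries the lock and backs out lets removal proceed.

```python
import threading


class Model:
    """Userspace stand-in for rtnl plus draining of sysfs users."""

    def __init__(self):
        self.rtnl = threading.Lock()       # plays the role of the rtnl mutex
        self.cond = threading.Condition()  # guards the active-writer count
        self.active = 0                    # writers currently inside a handler
        self.entered = threading.Event()   # set once a writer is inside

    def store(self, use_trylock):
        """A sysfs write handler that needs the lock to apply a change."""
        with self.cond:
            self.active += 1
        self.entered.set()
        try:
            if use_trylock:
                if not self.rtnl.acquire(blocking=False):
                    return "restart"       # back out so the drain can finish
            else:
                self.rtnl.acquire()        # sleeps while removal holds the lock
            self.rtnl.release()
            return "done"
        finally:
            with self.cond:
                self.active -= 1
                self.cond.notify_all()

    def drain(self, timeout):
        """Wait for all writers to leave; False means we gave up waiting."""
        with self.cond:
            return self.cond.wait_for(lambda: self.active == 0, timeout)


def simulate(use_trylock, timeout=0.3):
    """Run one writer against one removal that holds the lock."""
    m = Model()
    out = {}
    lock_held = threading.Event()

    def writer():
        lock_held.wait()                   # start only once removal has the lock
        out["writer"] = m.store(use_trylock)

    t = threading.Thread(target=writer)
    t.start()
    with m.rtnl:                           # removal path runs under the lock
        lock_held.set()
        m.entered.wait()                   # the writer is now inside its handler
        out["drained"] = m.drain(timeout)  # a real drain would have no timeout
    t.join()
    return out


if __name__ == "__main__":
    print("blocking writer:", simulate(use_trylock=False))
    print("trylock writer: ", simulate(use_trylock=True))
```

In the blocking case the drain gives up, which is the model's way of
saying it would have waited forever; the writer only completes because
the model then drops the lock, which the real removal path would not do.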


