On 04.12.2020 12:51, Julien Grall wrote:
> 
> 
> On 04/12/2020 11:48, Jan Beulich wrote:
>> On 04.12.2020 12:28, Julien Grall wrote:
>>> Hi Jan,
>>>
>>> On 03/12/2020 10:09, Jan Beulich wrote:
>>>> On 02.12.2020 22:10, Julien Grall wrote:
>>>>> On 23/11/2020 13:30, Jan Beulich wrote:
>>>>>> While there don't look to be any problems with this right now, the lock
>>>>>> order implications from holding the lock can be very difficult to follow
>>>>>> (and may be easy to violate unknowingly). The present callbacks don't
>>>>>> (and no such callback should) have any need for the lock to be held.
>>>>>>
>>>>>> However, vm_event_disable() frees the structures used by respective
>>>>>> callbacks and isn't otherwise synchronized with invocations of these
>>>>>> callbacks, so maintain a count of in-progress calls, for evtchn_close()
>>>>>> to wait to drop to zero before freeing the port (and dropping the lock).
>>>>>
>>>>> AFAICT, this callback is not the only place where the synchronization is
>>>>> missing in the VM event code.
>>>>>
>>>>> For instance, vm_event_put_request() can also race against
>>>>> vm_event_disable().
>>>>>
>>>>> So shouldn't we handle this issue properly in VM event?
>>>>
>>>> I suppose that's a question to the VM event folks rather than me?
>>>
>>> Yes. From my understanding of Tamas's e-mail, they are relying on the
>>> monitoring software to do the right thing.
>>>
>>> I will refrain to comment on this approach. However, given the race is
>>> much wider than the event channel, I would recommend to not add more
>>> code in the event channel to deal with such problem.
>>>
>>> Instead, this should be fixed in the VM event code when someone has time
>>> to harden the subsystem.
>>
>> Are effectively saying I should now undo the addition of the
>> refcounting, which was added in response to feedback from you?
> 
> Please point out where I made the request to use the refcounting...

You didn't ask for this directly, sure, but ...

> I pointed out there was an issue with the VM event code.

... this has ultimately led to the decision to use refcounting
(iirc there was one alternative that I had proposed, besides
the option of doing nothing).

> This was latter 
> analysed as a wider issue. The VM event folks doesn't seem to be very 
> concerned on the race, so I don't see the reason to try to fix it in the 
> event channel code.

And you won't need the refcount for vpl011 then? I can certainly
drop it again, but it feels odd to go back to an earlier version
under the circumstances ...

Jan

Reply via email to