On 09/01/2021 15:41, Julien Grall wrote:
Hi Jan,
On 05/01/2021 13:09, Jan Beulich wrote:
> The local domain's lock is needed for the port allocation, but for the
> remote side the per-channel lock is sufficient. The per-channel locks
> then need to be acquired slightly earlier, though.
I was expecting a little bit more information in the commit message,
because this patch introduces a few changes in behavior:
1) AFAICT, evtchn_allocate_port() relies on rchn->state being protected
by rd->event_lock. Now that you dropped rd->event_lock, rchn->state may
be read while it is updated in evtchn_bind_interdomain(). The port
cannot go back to ECS_FREE here, but I think the access needs to be
switched to {read,write}_atomic() or ACCESS_ONCE() (see the sketch
after this list).
2) xsm_evtchn_interdomain() is now going to be called without
rd->event_lock held. Can you confirm that the lock is not needed by
XSM?
Actually, I think there is a bigger issue. evtchn_close() will check
chn1->state with just d1->event_lock held (IOW, chn1->lock is not held
there).
If the remote domain happens to close the unbound port at the same time
the local domain binds it, then you may end up in the following
situation:
evtchn_bind_interdomain()      | evtchn_close()
                               |
                               | switch ( chn1->state )
                               | case ECS_UNBOUND:
                               |     /* nothing to do */
double_evtchn_lock()           |
rchn->state = ECS_INTERDOMAIN  |
double_evtchn_unlock()         |
                               | evtchn_write_lock(chn1)
                               | evtchn_free(d1, chn1)
                               | evtchn_write_unlock(chn1)
When the local domain later tries to close the port, it will hit the
BUG_ON(chn2->state != ECS_INTERDOMAIN) because the remote port was
already freed.
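
For context, the check that fires is in evtchn_close()'s
ECS_INTERDOMAIN case; heavily abridged from memory (so not the verbatim
code), the path looks roughly like:

    case ECS_INTERDOMAIN:
        port2 = chn1->u.interdomain.remote_port;
        chn2  = evtchn_from_port(d2, port2);

        /* If d2 freed the port behind our back, as in the race above,
         * chn2->state has gone back to ECS_FREE and this fires: */
        BUG_ON(chn2->state != ECS_INTERDOMAIN);
        BUG_ON(chn2->u.interdomain.remote_dom != d1);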
I think this can be solved by acquiring chn1's per-channel lock earlier
on in evtchn_close(). Although this may become a can of worms, as it
would be more complex to prevent lock inversion between chn1->lock and
chn2->lock.
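
FWIW, the usual way out of the inversion is a fixed global ordering
when taking the pair, which is what double_evtchn_lock() already does
on the bind side IIRC. A minimal sketch (address-based ordering shown
purely for illustration):

static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
{
    /* Always acquire the two channel locks in a fixed (here: address)
     * order, so two paths locking the same pair cannot deadlock. */
    if ( lchn <= rchn )
    {
        evtchn_write_lock(lchn);
        if ( lchn != rchn )
            evtchn_write_lock(rchn);
    }
    else
    {
        evtchn_write_lock(rchn);
        evtchn_write_lock(lchn);
    }
}

Re-using the same helper in evtchn_close() before looking at
chn1->state would keep the ordering consistent with the bind side.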
Cheers,
--
Julien Grall