On Tue, Feb 7, 2017 at 11:48 PM, Razvan Cojocaru <rcojoc...@bitdefender.com>
wrote:

> On 02/08/2017 01:23 AM, Tamas K Lengyel wrote:
> >
> >
> > On Tue, Feb 7, 2017 at 1:41 PM, Razvan Cojocaru
> > <rcojoc...@bitdefender.com> wrote:
> >
> >     On 02/07/2017 10:20 PM, Tamas K Lengyel wrote:
> >     >
> >     >
> >     > On Tue, Feb 7, 2017 at 11:57 AM, Razvan Cojocaru
> >     > <rcojoc...@bitdefender.com> wrote:
> >     >
> >     >     On 02/07/2017 08:39 PM, Andrew Cooper wrote:
> >     >     > On 07/02/17 18:31, Razvan Cojocaru wrote:
> >     >     >> On 02/07/2017 08:15 PM, Tamas K Lengyel wrote:
> >     >     >>>
> >     >     >>> On Tue, Feb 7, 2017 at 9:53 AM, Razvan Cojocaru
> >     >     >>> <rcojoc...@bitdefender.com> wrote:
> >     >     >>>
> >     >     >>>     Hello,
> >     >     >>>
> >     >     >>>     Setting, e.g., 16 VCPUs for an HVM guest ends up
> >     >     >>>     blocking the guest completely when subscribing to
> >     >     >>>     vm_events, apparently because of this code in
> >     >     >>>     xen/common/vm_event.c:
> >     >     >>>
> >     >     >>>     315     /* Give this vCPU a black eye if necessary, on the way out.
> >     >     >>>     316      * See the comments above wake_blocked() for more information
> >     >     >>>     317      * on how this mechanism works to avoid waiting. */
> >     >     >>>     318     avail_req = vm_event_ring_available(ved);
> >     >     >>>     319     if( current->domain == d && avail_req < d->max_vcpus )
> >     >     >>>     320         vm_event_mark_and_pause(current, ved);
> >     >     >>>     It would appear that even if the guest only has 2 online
> >     >     >>>     VCPUs, the "avail_req < d->max_vcpus" condition will pause
> >     >     >>>     current, and we eventually end up with all the VCPUs paused.
> >     >     >>>
> >     >     >>>     An ugly hack ("avail_req < 2") has allowed booting a guest
> >     >     >>>     with many VCPUs (max_vcpus, the guest only brings 2 VCPUs
> >     >     >>>     online), however that's just to prove that that was the
> >     >     >>>     culprit - a real solution to this needs a more in-depth
> >     >     >>>     understanding of the issue and the potential solution.
> >     >     >>>     That's basically very old code (pre-2012 at least) that got
> >     >     >>>     moved around into the current shape of Xen today - please
> >     >     >>>     CC anyone relevant to the discussion that you're aware of.
> >     >     >>>
> >     >     >>>     Thoughts?
> >     >     >>>
> >     >     >>>
> >     >     >>> I think this is a side-effect of the growth of the vm_event
> >     >     >>> structure and the fact that we have a single page ring. The
> >     >     >>> check effectively sets a threshold of having enough space for
> >     >     >>> each vCPU to place at least one more event on the ring, and if
> >     >     >>> that's not the case it gets paused. OTOH I think this would
> >     >     >>> only have an effect on asynchronous events, for all other
> >     >     >>> events the vCPU is already paused. Is that the case you have?
> >     >     >> No, on the contrary, all my events are synchronous (the VCPU
> >     >     >> is paused waiting for the vm_event reply).
> >     >     >>
> >     >     >> I've debugged this a bit, and the problem seems to be that
> >     >     >> vm_event_wake_blocked() breaks here:
> >     >     >>
> >     >     >> 150     /* We remember which vcpu last woke up to avoid scanning always linearly
> >     >     >> 151      * from zero and starving higher-numbered vcpus under high load */
> >     >     >> 152     if ( d->vcpu )
> >     >     >> 153     {
> >     >     >> 154         int i, j, k;
> >     >     >> 155
> >     >     >> 156         for (i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++)
> >     >     >> 157         {
> >     >     >> 158             k = i % d->max_vcpus;
> >     >     >> 159             v = d->vcpu[k];
> >     >     >> 160             if ( !v )
> >     >     >> 161                 continue;
> >     >     >> 162
> >     >     >> 163             if ( !(ved->blocked) || online >= avail_req )
> >     >     >> 164                 break;
> >     >     >> 165
> >     >     >> 166             if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
> >     >     >> 167             {
> >     >     >> 168                 vcpu_unpause(v);
> >     >     >> 169                 online++;
> >     >     >> 170                 ved->blocked--;
> >     >     >> 171                 ved->last_vcpu_wake_up = k;
> >     >     >> 172             }
> >     >     >> 173         }
> >     >     >> 174     }
> >     >     >>
> >     >     >> at "if ( !(ved->blocked) || online >= avail_req )". At this
> >     point,
> >     >     >> nothing ever gets unblocked. It's hard to believe that this
> is
> >     >     desired
> >     >     >> behaviour, as I don't know what could possibly happen for
> that
> >     >     condition
> >     >     >> to become false once all the online VCPUs are stuck
> (especially
> >     >     when the
> >     >     >> guest has just started booting).
> >     >
> >     >
> >     > Ah, I see what happens. During boot vCPU 0 generates an event and
> >     > gets marked blocked because the number of vCPUs is so high. The
> >     > other vCPUs are still unblocked since they are idle, but this test
> >     > here will still be true (online >= avail_req) and thus we can never
> >     > unblock vCPU 0. And then the boot process hangs because vCPU 0
> >     > never resumes. I would argue that this test should be changed to
> >     > check that there is at least 1 spot on the ring and only break if
> >     > that is not the case anymore (i.e. instead of incrementing online
> >     > we should be decrementing avail_req).
> >
> >     That is exactly what happens. And it can't really be fixed just by
> >     increasing the ring buffer (although that definitely helps a lot and
> >     would be a smart move): no matter how large it is, we can always ask
> >     the domain to use more VCPUs than there are slots in the buffer.
> >
> >
> > No, not increasing the ring buffer but fixing the check to unblock when
> > there is at least 1 spot on the ring. So look at this comment...
> >
> > /*
> >  * vm_event_wake_blocked() will wakeup vcpus waiting for room in the
> >  * ring. These vCPUs were paused on their way out after placing an event,
> >  * but need to be resumed where the ring is capable of processing at least
> >  * one event from them.
> >  */
> >
> > .. it seems to me that the check "online >= avail_req" was just wrong
> > from the start.
>
> I've read that to mean that there needs to be more than one available
> slot in the ring: "wake vcpus [plural] [...] at least one event from them".
>

Right, I mean we can unblock all vCPUs as soon as there is 1 spot on the
ring. If they trap again, they will just be placed in the waitqueue instead
of being blocked. I think that would work fine.
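
Something along these lines, maybe (completely untested, just to sketch the
idea against the loop quoted above; avail_req would still come from
vm_event_ring_available(ved) as before, and all the names are the ones from
that snippet):

    /*
     * Sketch only: as long as the ring has at least one free slot, wake
     * every blocked vCPU.  If one traps again while the ring is full it
     * goes on the waitqueue instead of being blocked again.
     */
    if ( avail_req > 0 && ved->blocked && d->vcpu )
    {
        int i;

        for ( i = 0; i < d->max_vcpus; i++ )
        {
            struct vcpu *v = d->vcpu[i];

            if ( !v )
                continue;

            if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
            {
                vcpu_unpause(v);
                ved->blocked--;
            }
        }
    }

In that case the last_vcpu_wake_up round-robin bookkeeping wouldn't really
be needed any more, since everyone gets woken at once.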

Tamas
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
