On Tue, Feb 7, 2017 at 11:48 PM, Razvan Cojocaru <rcojoc...@bitdefender.com> wrote:
> On 02/08/2017 01:23 AM, Tamas K Lengyel wrote:
>> On Tue, Feb 7, 2017 at 1:41 PM, Razvan Cojocaru <rcojoc...@bitdefender.com> wrote:
>>> On 02/07/2017 10:20 PM, Tamas K Lengyel wrote:
>>>> On Tue, Feb 7, 2017 at 11:57 AM, Razvan Cojocaru <rcojoc...@bitdefender.com> wrote:
>>>>> On 02/07/2017 08:39 PM, Andrew Cooper wrote:
>>>>>> On 07/02/17 18:31, Razvan Cojocaru wrote:
>>>>>>> On 02/07/2017 08:15 PM, Tamas K Lengyel wrote:
>>>>>>>> On Tue, Feb 7, 2017 at 9:53 AM, Razvan Cojocaru <rcojoc...@bitdefender.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Setting, e.g. 16 VCPUs for a HVM guest, ends up blocking the guest
>>>>>>>>> completely when subscribing to vm_events, apparently because of this
>>>>>>>>> code in xen/common/vm_event.c:
>>>>>>>>>
>>>>>>>>> 315     /* Give this vCPU a black eye if necessary, on the way out.
>>>>>>>>> 316      * See the comments above wake_blocked() for more information
>>>>>>>>> 317      * on how this mechanism works to avoid waiting. */
>>>>>>>>> 318     avail_req = vm_event_ring_available(ved);
>>>>>>>>> 319     if( current->domain == d && avail_req < d->max_vcpus )
>>>>>>>>> 320         vm_event_mark_and_pause(current, ved);
>>>>>>>>>
>>>>>>>>> It would appear that even if the guest only has 2 online VCPUs, the
>>>>>>>>> "avail_req < d->max_vcpus" condition will pause current, and we
>>>>>>>>> eventually end up with all the VCPUs paused.
>>>>>>>>>
>>>>>>>>> An ugly hack ("avail_req < 2") has allowed booting a guest with many
>>>>>>>>> VCPUs (max_vcpus, the guest only brings 2 VCPUs online), however that's
>>>>>>>>> just to prove that that was the culprit - a real solution needs a more
>>>>>>>>> in-depth understanding of the issue. That's basically very old code
>>>>>>>>> (pre-2012 at least) that got moved around into the current shape of
>>>>>>>>> Xen today - please CC anyone relevant to the discussion that you're
>>>>>>>>> aware of.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> I think this is a side-effect of the growth of the vm_event structure
>>>>>>>> and the fact that we have a single-page ring. The check effectively
>>>>>>>> sets a threshold of having enough space for each vCPU to place at
>>>>>>>> least one more event on the ring, and if that's not the case it gets
>>>>>>>> paused. OTOH I think this would only have an effect on asynchronous
>>>>>>>> events; for all other events the vCPU is already paused. Is that the
>>>>>>>> case you have?
>>>>>>>
>>>>>>> No, on the contrary, all my events are synchronous (the VCPU is paused
>>>>>>> waiting for the vm_event reply).
>>>>>>>
>>>>>>> I've debugged this a bit, and the problem seems to be that
>>>>>>> vm_event_wake_blocked() breaks here:
>>>>>>>
>>>>>>> 150     /* We remember which vcpu last woke up to avoid scanning always linearly
>>>>>>> 151      * from zero and starving higher-numbered vcpus under high load */
>>>>>>> 152     if ( d->vcpu )
>>>>>>> 153     {
>>>>>>> 154         int i, j, k;
>>>>>>> 155
>>>>>>> 156         for (i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++)
>>>>>>> 157         {
>>>>>>> 158             k = i % d->max_vcpus;
>>>>>>> 159             v = d->vcpu[k];
>>>>>>> 160             if ( !v )
>>>>>>> 161                 continue;
>>>>>>> 162
>>>>>>> 163             if ( !(ved->blocked) || online >= avail_req )
>>>>>>> 164                 break;
>>>>>>> 165
>>>>>>> 166             if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
>>>>>>> 167             {
>>>>>>> 168                 vcpu_unpause(v);
>>>>>>> 169                 online++;
>>>>>>> 170                 ved->blocked--;
>>>>>>> 171                 ved->last_vcpu_wake_up = k;
>>>>>>> 172             }
>>>>>>> 173         }
>>>>>>> 174     }
>>>>>>>
>>>>>>> at "if ( !(ved->blocked) || online >= avail_req )". At this point,
>>>>>>> nothing ever gets unblocked. It's hard to believe that this is desired
>>>>>>> behaviour, as I don't know what could possibly happen for that
>>>>>>> condition to become false once all the online VCPUs are stuck
>>>>>>> (especially when the guest has just started booting).
>>>>
>>>> Ah, I see what happens. During boot vCPU 0 generates an event and gets
>>>> marked blocked because the number of vCPUs is so high. The other vCPUs
>>>> are still unblocked since they are idle, but this test here will still
>>>> be true (online >= avail_req) and thus we can never unblock vCPU 0. And
>>>> then the boot process hangs because vCPU 0 never resumes. I would argue
>>>> that this test should be changed to check that there is at least 1 spot
>>>> on the ring, and only break if that is no longer the case (i.e. instead
>>>> of incrementing online we should be decrementing avail_req).
>>>
>>> That is exactly what happens. And it can't really be fixed just by
>>> increasing the ring buffer (although that definitely helps a lot and
>>> would be a smart move): no matter how large it is, we can always ask the
>>> domain to use more VCPUs than there are slots in the buffer.
>>
>> No, not increasing the ring buffer, but fixing the check to unblock when
>> there is at least 1 spot on the ring. So look at this comment...
>>
>> /*
>>  * vm_event_wake_blocked() will wakeup vcpus waiting for room in the
>>  * ring. These vCPUs were paused on their way out after placing an event,
>>  * but need to be resumed where the ring is capable of processing at least
>>  * one event from them.
>>  */
>>
>> ... it seems to me that the check "online >= avail_req" was just wrong
>> from the start.
>
> I've read that to mean that there needs to be more than one available
> slot in the ring: "wake vcpus [plural] [...] at least one event from
> them".

Right, I mean we can unblock all vCPUs as soon as there is 1 spot on the
ring. If they trap again, they will just be placed in the waitqueue
instead of being blocked. I think that would work fine.

Tamas
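A back-of-the-envelope sketch of the threshold arithmetic being discussed.
The ring and request sizes below are assumptions for illustration only (the
thread only establishes a single-page ring and a vm_event_request_t that has
grown); the point is that once the ring holds roughly max_vcpus entries, a
single in-flight event already trips the "avail_req < d->max_vcpus" check and
the posting vCPU is marked blocked:

#include <stdio.h>

/* Assumed sizes, for illustration only -- not taken from the Xen headers. */
#define RING_BYTES  4096u   /* the vm_event ring is a single page          */
#define REQ_BYTES    256u   /* hypothetical sizeof(vm_event_request_t)     */

int main(void)
{
    unsigned int slots     = RING_BYTES / REQ_BYTES; /* total ring capacity    */
    unsigned int max_vcpus = 16;                     /* guest from the report  */
    unsigned int in_flight = 1;                      /* vCPU 0 posts one event */
    unsigned int avail_req = slots - in_flight;      /* free slots remaining   */

    printf("slots=%u avail_req=%u max_vcpus=%u\n", slots, avail_req, max_vcpus);

    /* The check quoted from vm_event.c line 319: pause the posting vCPU if
     * the ring no longer has room for one event per possible vCPU. */
    if ( avail_req < max_vcpus )
        printf("vCPU 0 is marked blocked and paused\n");

    return 0;
}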
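And a sketch of the change Tamas describes for the vm_event_wake_blocked()
loop quoted above: keep waking blocked vCPUs while at least one slot is free,
consuming a slot per woken vCPU instead of comparing a woken counter against
avail_req. Variable names follow the quoted excerpt; this is an untested
illustration, not a patch:

    if ( d->vcpu )
    {
        int i, j, k;

        for ( i = ved->last_vcpu_wake_up + 1, j = 0; j < d->max_vcpus; i++, j++ )
        {
            k = i % d->max_vcpus;
            v = d->vcpu[k];
            if ( !v )
                continue;

            /* Stop only when nobody is left blocked or the ring is full. */
            if ( !ved->blocked || avail_req == 0 )
                break;

            if ( test_and_clear_bit(ved->pause_flag, &v->pause_flags) )
            {
                vcpu_unpause(v);
                avail_req--;               /* one slot accounted to this vCPU */
                ved->blocked--;
                ved->last_vcpu_wake_up = k;
            }
        }
    }

Tamas's closing remark suggests an even simpler variant: drop the avail_req
decrement and unblock everything once a single slot is free, relying on the
waitqueue if a woken vCPU traps again before the ring has drained.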