On Mon, 2010-08-09 at 01:18 +0100, Ben Hutchings wrote: > On Wed, 2010-08-04 at 15:05 +0200, Zdenek Salvet wrote: > > On Wed, Aug 04, 2010 at 03:23:52AM +0100, Ben Hutchings wrote: > > > > I found root cause of the problem; after I added following fix to lenny > > > > xen kernel, none of 56 domU froze again in one week of testing: > > > [...] > > > > > > That sounds promising. Is this patch based on a change made by the > > > upstream Xen or Linux developers? > > > > No, it is my own fix and I have not reported it anywhere else yet. > > OK. It is clearly not applicable to the pvops version of Linux-for-Xen > so it probably doesn't make sense to send upstream. > > > The deadlock it fixes is very similar to that fixed by > > bugfix/all/printk-robustify-printk.patch . > > So you reckon xtime_lock is lower in the lock hierarchy than run-queue > locks? I can't see any place where xtime_lock is obtained after a > run-queue lock, but this change nevertheless looks reasonable and safe. > > Ian, any comment on this?
It seems sane to me, particularly based on the comparison with b845b51 (AKA bugfix/all/printk-robustify-printk.patch) which is concerned with the same deadlock (albeit a different source though). Jan, the Novell kernels might want this, it still seems to be present in the 2.6.32 based tree. (the complete discussion can be seen at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591362) It's not entirely clear to me why the Xen timer interrupt needs to call clock_was_set anyway -- the native timer interrupts certainly do not (it maybe that this behaviour is a hangover from an ancient kernel when time.c was forked into time-xen.c). I didn't look too deeply since I think the proposed patch is most likely to be the safest option for a stable branch anyway. Ian. > > > --- source_amd64_xen/arch/x86/kernel/time_32-xen.c 2010-07-24 > > 07:28:32.162719094 +0200 > > +++ source_amd64_xen.new/arch/x86/kernel/time_32-xen.c 2010-07-24 > > 07:26:32.416076711 +0200 > > @@ -466,6 +466,7 @@ > > { > > s64 delta, delta_cpu, stolen, blocked; > > unsigned int i, cpu = smp_processor_id(); > > + int schedule_clock_was_set_work = 0; > > struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu); > > struct vcpu_runstate_info runstate; > > > > @@ -525,12 +526,13 @@ > > > > if (shadow_tv_version != HYPERVISOR_shared_info->wc_version) { > > update_wallclock(); > > - if (keventd_up()) > > - schedule_work(&clock_was_set_work); > > + schedule_clock_was_set_work = 1; > > } > > > > write_sequnlock(&xtime_lock); > > > > + if (schedule_clock_was_set_work && keventd_up()) > > + schedule_work(&clock_was_set_work); > > /* > > * Account stolen ticks. > > * HACK: Passing NULL to account_steal_time() > > Ben. > -- Ian Campbell Current Noise: Pearl Jam - MFC Confidence is simply that quiet, assured feeling you have before you fall flat on your face. -- Dr. L. Binder -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org