On Thu, 7 Feb 2008 01:51:10 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > Nope.
> >
> > But I tested it on mainline, and mainline exhibits the
> > never-powers-off symptom, whereas
> > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 demonstrates the
> > powers-off-after-20-seconds symptom.
> >
> > So we _may_ be dealing with two bugs here, and your patch might have
> > fixed the first, but that success is obscured by the second.  I guess
> > I need to prepare a tree which has
> > ed50d6cbc394cd0966469d3e249353c9dd1d38b9 at its tip.  (Wonders how to
> > do that).
>
> the way i do it in bisection is to do:
>
>   mkdir patches
>   git-log -1 -p ed50d6cbc394cd0966469d3 > patches/fix.patch
>   echo fix.patch > patches/series
>
> and then before testing a bisection point, i do a 'quilt push'. Before
> telling git-bisect about the quality of that bisection point (good/bad)
> i pop it off via 'quilt pop'.
>
> this way the 'required fix' can be kept during the bisection, to find
> the secondary bug.
>
> > btw, mainline (plus this patch, not that it changed anything) prints
> >
> > <stopping disk stuff>
> > Disabling non-boot CPUs
> > CPU 1 is now offline
> >
> > and that's it.  This machine has eight cpus.  Might be a hint?
>
> what should be the proper message?

Seems that it should be a stream of eight

CPU n is now offline
CPU n down

> my suspects, besides there being something wrong in the hung-tasks code
> of the softlockup watchdog, would be the cpu-hotplug commits, or some
> arch/x86 commit. (although we didnt really have anything specifically
> touching the reboot path)
>
> does a stupid patch like the one below tell you more about what the
> other CPUs are doing during this hang?
[32-bit only patch]
>
> 	Ingo
>
> ---
>  arch/i386/kernel/nmi.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> Index: linux/arch/i386/kernel/nmi.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/nmi_64.c
> +++ linux/arch/x86/kernel/nmi_64.c
> @@ -331,6 +331,14 @@ __kprobes int nmi_watchdog_tick(struct p
>  	int touched = 0;
>  	int cpu = smp_processor_id();
>  	int rc=0;
> +	static int count[NR_CPUS];
> +
> +	if (!count[cpu]) {
> +		count[cpu] = nmi_hz;
> +		printk("CPU#%d, tick\n", cpu);
> +		show_regs(regs);
> +	}
> +	count[cpu]--;
>
>  	/* check for other users first */
>  	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)

I reworked that on top of ed50d6cbc394cd0966469d3e249353c9dd1d38b9: no
change.

However, I watched the vga console this time (nothing is coming over
netconsole at this stage) and saw this:

CPU 1 is now offline
<10 second pause>
CPU 1 is down
CPU 2 is now offline
CPU 2 is down
CPU 3 is now offline
CPU 3 is down
CPU 4 is now offline
<10 second pause>

followed by a quick spew of the remaining CPUs going down and offline,
then poweroff.
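
For reference, the quilt-based bisection workflow quoted above could be
wrapped up roughly as follows. This is only a sketch under stated
assumptions: the function names setup_fix and test_with_fix and the QUILT
override variable are made up for illustration and appear nowhere in this
thread, and it uses the current "git log" spelling rather than "git-log".

```shell
#!/bin/sh
# Sketch: keep a required fix applied while bisecting for a second bug.
# setup_fix, test_with_fix and QUILT are hypothetical names for illustration.

QUILT=${QUILT:-quilt}   # overridable, so the flow can be tried without quilt

# One-time setup: export the fix commit ($1) as a one-patch quilt series.
setup_fix() {
    mkdir -p patches &&
    git log -1 -p "$1" > patches/fix.patch &&
    echo fix.patch > patches/series
}

# At a bisection point: apply the fix, run the given command (for example
# the kernel build), then pop the fix again so the tree is clean before
# git bisect is told good/bad. Returns the command's exit status.
test_with_fix() {
    # If the fix does not apply here, report 125, which "git bisect run"
    # treats as "skip this revision".
    $QUILT push || return 125
    rc=0
    "$@" || rc=$?
    # A failed pop leaves the tree dirty; 255 makes "git bisect run" abort.
    $QUILT pop || return 255
    return $rc
}
```

Something like "setup_fix ed50d6cbc394cd0966469d3" would be run once, and
then "test_with_fix make" at each bisection point, so that the kernel is
built with the fix applied but the tree is clean again afterwards. Booting
the result and checking the poweroff behaviour would still be a manual step
before running "git bisect good" or "git bisect bad".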