Re: crashme fault

2007-09-17 Thread Randy Dunlap
On Mon, 17 Sep 2007 07:53:50 -0700 (PDT) Linus Torvalds wrote: > On Mon, 17 Sep 2007, Randy Dunlap wrote: > > > > OK, I haven't done the microcode update yet. I ran crashme overnight > > with your newer patch and it crashed: > > Well, duh. > > That's because I forgot to do the "error_code & PF

Re: crashme fault

2007-09-17 Thread Linus Torvalds
On Mon, 17 Sep 2007, Randy Dunlap wrote: > > OK, I haven't done the microcode update yet. I ran crashme overnight > with your newer patch and it crashed: Well, duh. That's because I forgot to do the "error_code & PF_USER" => "user_mode_vm(regs)" thing in the most common case - the "bad_area

Re: crashme fault

2007-09-17 Thread Randy Dunlap
Linus Torvalds wrote: > On Sun, 16 Sep 2007, Randy Dunlap wrote: > > I'll test this overnight on 2.6.23-rc6-git2 since that was failing. > > I haven't been able to reproduce the fault on 2.6.21 after several hours of testing. > > I'll also test a microcode update to see if it helps. > Before you do the mic

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Randy Dunlap wrote: > > I'll test this overnight on 2.6.23-rc6-git2 since that was failing. > > I haven't been able to reproduce the fault on 2.6.21 after several > hours of testing. > > I'll also test a microcode update to see if it helps. Before you do the microcode upd

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sun, 16 Sep 2007 11:12:23 -0700 (PDT) Linus Torvalds wrote: > > > On Sun, 16 Sep 2007, Linus Torvalds wrote: > > > > I'm really starting to suspect some early EM64T bug, and I also suspect > > that it's harmless but that we should just do the trivial patch to say "if > > the register state

Re: crashme fault

2007-09-16 Thread Andi Kleen
On Sun, Sep 16, 2007 at 10:14:46AM -0700, Linus Torvalds wrote: > > > On Sun, 16 Sep 2007, Randy Dunlap wrote: > > > > I'll apply this patch today, but I haven't done so yet (for the 2 > > bug reports below). > > Actually, it's probably better that you don't change your situation > unnecessari

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Linus Torvalds wrote: > > I'm really starting to suspect some early EM64T bug, and I also suspect > that it's harmless but that we should just do the trivial patch to say "if > the register state is in user mode, we don't care if the CPU says it was a > kernel access". N
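The rule quoted above ("if the register state is in user mode, we don't care if the CPU says it was a kernel access") can be illustrated with a small userspace sketch. This is not the patch from the thread: `struct sketch_regs`, `sketch_user_mode()` and `fault_is_from_user()` are hypothetical names, the privilege test is a simplified stand-in for the kernel's `user_mode(regs)`, and falling back to the error code when the registers say kernel mode is an assumption of this sketch.

```c
#include <stdbool.h>

/* Bit 2 of the x86 page-fault error code: set when the CPU reports a user-mode access. */
#define PF_USER 0x4UL

/* Hypothetical, cut-down register snapshot; the real struct pt_regs has many more fields. */
struct sketch_regs {
    unsigned long cs;   /* saved code-segment selector at the time of the fault */
};

/* Simplified stand-in for user_mode(regs): the low two bits of CS hold the privilege level. */
static bool sketch_user_mode(const struct sketch_regs *regs)
{
    return (regs->cs & 3) != 0;
}

/* The older style of check: trust only the CPU-reported error code. */
static bool from_user_by_error_code(unsigned long error_code)
{
    return (error_code & PF_USER) != 0;
}

/*
 * The rule from the thread: if the saved registers say user mode, treat the
 * fault as a user fault even when the error code claims a kernel access.
 * Assumption of this sketch: otherwise fall back to the error code.
 */
static bool fault_is_from_user(const struct sketch_regs *regs, unsigned long error_code)
{
    if (sketch_user_mode(regs))
        return true;
    return from_user_by_error_code(error_code);
}
```

With a user-mode selector such as 0x33 and an error code that lacks PF_USER, `from_user_by_error_code()` reports a kernel access while `fault_is_from_user()` still classifies the fault as coming from user space, which is the mismatch the thread is chasing.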

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Randy Dunlap wrote: > > I'll apply this patch today, but I haven't done so yet (for the 2 > bug reports below). Actually, it's probably better that you don't change your situation unnecessarily, in case the bug goes away. Since you are triggering the problem even *without

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sat, 15 Sep 2007 17:34:54 -0700 (PDT) Linus Torvalds wrote: > > > On Sat, 15 Sep 2007, Randy Dunlap wrote: > > Command: ./crashme +2000 666 1000 1:00:00 1 > > Ok, that's close to what I was testing (one of the examples from the > crashme docs). > > > > The original gjc crashme doesn't even

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sun, 16 Sep 2007 17:53:21 +0200 Andrea Arcangeli wrote: > On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote: > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, > > Did the room tempera

Re: crashme fault

2007-09-16 Thread Andrea Arcangeli
On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote: > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, Did the room temperature change in the server room? ;) Those early EM64T P4 core based are

Re: crashme fault

2007-09-15 Thread Andi Kleen
On Sat, Sep 15, 2007 at 03:47:19PM -0700, Linus Torvalds wrote: > > > On Sat, 15 Sep 2007, Linus Torvalds wrote: > > > > So regardless of whether we want to trust "user_mode(regs)" more than > > "error_code & PF_USER", it would definitely be very interesting if you can > > give a good "this is

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Randy Dunlap wrote: > Command: ./crashme +2000 666 1000 1:00:00 1 Ok, that's close to what I was testing (one of the examples from the crashme docs). > > The original gjc crashme doesn't even do a "mprotect(PROT_EXEC)" by default > > (nor does it even compile on a modern u

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Linus Torvalds wrote: > On Sat, 15 Sep 2007, Linus Torvalds wrote: > > So regardless of whether we want to trust "user_mode(regs)" more than "error_code & PF_USER", it would definitely be very interesting if you can give a good "this is where it started happening". > Also, can you point to good cras

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Linus Torvalds wrote: > > So regardless of whether we want to trust "user_mode(regs)" more than > "error_code & PF_USER", it would definitely be very interesting if you can > give a good "this is where it started happening". Also, can you point to good crashme sources, an

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Linus Torvalds wrote: > > Here's a really *stupid* patch (and untested too, btw) to see if it gets > easier to debug when you don't oops, just print the register state > instead. Side note - while thinking about this, I'm wondering whether maybe that "stupid" patch might

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Linus Torvalds wrote: > On Sat, 15 Sep 2007, Randy Dunlap wrote: > > Had another on recent last night (probably not helpful): > At least the original "crashme" would write its random number seeds to a logfile each time (and I made it fsync it in some versions), which meant that once a crash happene

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Randy Dunlap wrote: > > Had another on recent last night (probably not helpful): At least the original "crashme" would write its random number seeds to a logfile each time (and I made it fsync it in some versions), which meant that once a crash happened, you could re-prod

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Andi Kleen wrote: > > Andi, anything comes to mind? > No, unfortunately not. There weren't any changes to entry.S recently that could corrupt the error code as far as I remember. Also cannot think of something else. > A version where it started happening would be useful. I'll begin testing older k

Re: crashme fault

2007-09-15 Thread Andi Kleen
> Andi, anything comes to mind? No, unfortunately not. There weren't any changes to entry.S recently that could corrupt the error code as far as I remember. Also cannot think of something else. A version where it started happening would be useful. -Andi

Re: crashme fault

2007-09-14 Thread Randy Dunlap
On Fri, 14 Sep 2007 22:05:17 -0700 Randy Dunlap wrote: > On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote: > > > On Wed, 12 Sep 2007, Randy Dunlap wrote: > > > > > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > > > kernel fault until today, and now I've seen

Re: crashme fault

2007-09-14 Thread Randy Dunlap
On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote: > On Wed, 12 Sep 2007, Randy Dunlap wrote: > > > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, > > x86_64. After the first fault, I

Re: crashme fault

2007-09-14 Thread Linus Torvalds
On Wed, 12 Sep 2007, Randy Dunlap wrote: > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, > x86_64. After the first fault, I ran 'crashme' about 10 more times > to get the second fault (usually for

crashme fault

2007-09-12 Thread Randy Dunlap
I run almost-daily kernel testing. I haven't seen 'crashme' cause a kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, x86_64. After the first fault, I ran 'crashme' about 10 more times to get the second fault (usually for 10 minutes, one time for 30 minutes). [This is gjc-