Re: [BUG] long freezes on thinkpad t60

2007-07-02 Thread Nick Piggin
Linus Torvalds wrote: Nick, call me a worry-wart, but I slept on this, and started worrying.. On Tue, 26 Jun 2007, Linus Torvalds wrote: So try it with just a byte counter, and test some stupid micro-benchmark on both a P4 and a Core 2 Duo, and if it's in the noise, maybe we can make it the

Re: [BUG] long freezes on thinkpad t60

2007-06-27 Thread Davide Libenzi
On Wed, 27 Jun 2007, Linus Torvalds wrote: > On Wed, 27 Jun 2007, Davide Libenzi wrote: > > > On Wed, 27 Jun 2007, Linus Torvalds wrote: > > > > > > Stores never "leak up". They only ever leak down (ie past subsequent > > > loads > > > or stores), so you don't need to worry about them. That's

Re: [BUG] long freezes on thinkpad t60

2007-06-27 Thread Linus Torvalds
On Wed, 27 Jun 2007, Davide Libenzi wrote: > On Wed, 27 Jun 2007, Linus Torvalds wrote: > > > > Stores never "leak up". They only ever leak down (ie past subsequent loads > > or stores), so you don't need to worry about them. That's actually already > > documented (although not in those terms

Re: [BUG] long freezes on thinkpad t60

2007-06-27 Thread Davide Libenzi
On Wed, 27 Jun 2007, Linus Torvalds wrote: > > IOW shouldn't an mfence always be there? Not only loads could leak up > > into the wait phase, but stores too, if they have no dependency with the > > "head" and "tail" loads. > > Stores never "leak up". They only ever leak down (ie past subsequent

Re: [BUG] long freezes on thinkpad t60

2007-06-27 Thread Linus Torvalds
On Wed, 27 Jun 2007, Davide Libenzi wrote: > > > > Now, I have good reason to believe that all Intel and AMD CPU's have a > > stricter-than-documented memory ordering, and that your spinlock may > > actually work perfectly well. But it still worries me. As far as I can > > tell, there's a the

Re: [BUG] long freezes on thinkpad t60

2007-06-27 Thread Davide Libenzi
On Wed, 27 Jun 2007, Linus Torvalds wrote: > On Tue, 26 Jun 2007, Linus Torvalds wrote: > > > > So try it with just a byte counter, and test some stupid micro-benchmark > > on both a P4 and a Core 2 Duo, and if it's in the noise, maybe we can make > > it the normal spinlock sequence just becaus

Re: [BUG] long freezes on thinkpad t60

2007-06-27 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > With the sequence counters, the situation is more complex: > > CPU #0 CPU #1 > > A (= code before the spinlock) > > lock xadd mem (serializing instruction) > > B (= code after xadd, but not
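
To make the "sequence counter" idea in the quote above concrete, here is a minimal user-space sketch of a ticket-style lock in C11. It is an illustration under stated assumptions, not the patch discussed in this thread: the names `struct ticket_lock`, `ticket_lock_acquire()` and `ticket_lock_release()` are hypothetical, and the claim that the fetch-and-add becomes `lock xadd` applies to typical x86 compilers rather than being guaranteed by the C standard.

```c
#include <stdatomic.h>

/* Hypothetical ticket lock: "next" hands out tickets, "owner" is the
 * ticket currently allowed to hold the lock. */
struct ticket_lock {
    atomic_uint next;
    atomic_uint owner;
};

/* Take a ticket with one atomic fetch-and-add (typically "lock xadd" on
 * x86), then spin with plain loads until that ticket is served.
 * Returns the ticket number so callers can observe FIFO ordering. */
static unsigned ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned me = atomic_fetch_add_explicit(&l->next, 1,
                                            memory_order_relaxed);

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ; /* read-only spin: no read-modify-write while waiting */

    return me;
}

/* Only the holder ever writes "owner", so the release is a plain load
 * followed by a release store, with no atomic read-modify-write. */
static void ticket_lock_release(struct ticket_lock *l)
{
    unsigned cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

Waiters are served in the order they took tickets, which is the fairness property debated in the thread. The memory-ordering concern raised in the quoted message is about what the hardware guarantees around the locked instruction; the C11 acquire/release annotations above sidestep that question and do not settle it.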

Re: [BUG] long freezes on thinkpad t60

2007-06-27 Thread Linus Torvalds
Nick, call me a worry-wart, but I slept on this, and started worrying.. On Tue, 26 Jun 2007, Linus Torvalds wrote: > > So try it with just a byte counter, and test some stupid micro-benchmark > on both a P4 and a Core 2 Duo, and if it's in the noise, maybe we can make > it the normal spinlock

Re: [BUG] long freezes on thinkpad t60

2007-06-26 Thread Nick Piggin
Linus Torvalds wrote: On Wed, 27 Jun 2007, Nick Piggin wrote: I don't know why my unlock sequence should be that much slower? Unlocked mov vs unlocked add? Definitely in dumb micro-benchmark testing it wasn't twice as slow (IIRC). Oh, that releasing "add" can be unlocked, and only the holde

Re: [BUG] long freezes on thinkpad t60

2007-06-26 Thread Linus Torvalds
On Wed, 27 Jun 2007, Nick Piggin wrote: > > I don't know why my unlock sequence should be that much slower? Unlocked > mov vs unlocked add? Definitely in dumb micro-benchmark testing it wasn't > twice as slow (IIRC). Oh, that releasing "add" can be unlocked, and only the holder of the lock eve

Re: [BUG] long freezes on thinkpad t60

2007-06-26 Thread Nick Piggin
Linus Torvalds wrote: On Tue, 26 Jun 2007, Nick Piggin wrote: Hmm, not that I have a strong opinion one way or the other, but I don't know that they would encourage bad code. They are not going to reduce latency under a locked section, but will improve determinism in the contended case. xad

Re: [BUG] long freezes on thinkpad t60

2007-06-26 Thread Linus Torvalds
On Tue, 26 Jun 2007, Nick Piggin wrote: > > Hmm, not that I have a strong opinion one way or the other, but I > don't know that they would encourage bad code. They are not going to > reduce latency under a locked section, but will improve determinism > in the contended case. xadd really general

Re: [BUG] long freezes on thinkpad t60

2007-06-26 Thread Jarek Poplawski
On Tue, Jun 26, 2007 at 06:42:10PM +1000, Nick Piggin wrote: ... > They should also improve performance in heavily contended case due to > the nature of how they spin, but I know that's not something you want > to hear about. And theoretically there should be no reason why xadd is > any slower than

Re: [BUG] long freezes on thinkpad t60

2007-06-26 Thread Nick Piggin
Linus Torvalds wrote: On Thu, 21 Jun 2007, Eric Dumazet wrote: This reminds me Nick's proposal of 'queued spinlocks' 3 months ago Maybe this should be re-considered ? (unlock is still a non atomic op, so we dont pay the serializing cost twice) No. The point is simple: IF YOU NEED

Re: [BUG] long freezes on thinkpad t60

2007-06-24 Thread Jarek Poplawski
On Sat, Jun 23, 2007 at 12:36:08PM +0200, Miklos Szeredi wrote: ... > And it's not a NO_HZ kernel. ... BTW, maybe I've missed this and it's unconnected, but I hope the first config has been changed - especially this CONFIG_AGP_AMD64 = y, and this bug from mm/slab.c has gone long ago... Jarek P. -

Re: [BUG] long freezes on thinkpad t60

2007-06-23 Thread Linus Torvalds
On Sat, 23 Jun 2007, Miklos Szeredi wrote: > > What I notice is that the interrupt distribution between the CPUs is > very asymmetric like this: > >CPU0 CPU1 > 0: 220496 42 IO-APIC-edge timer > 1: 3841 0 IO-APIC-edge i8042 ... > LOC

Re: [BUG] long freezes on thinkpad t60

2007-06-23 Thread Miklos Szeredi
> > the freezes that Miklos was seeing were hardirq contexts blocking in > > task_rq_lock() - that is done with interrupts disabled. (Miklos i > > think also tried !NOHZ kernels and older kernels, with a similar > > result.) > > > > plus on the ptrace side, the wait_task_inactive() code had mos

Re: [BUG] long freezes on thinkpad t60

2007-06-22 Thread Jarek Poplawski
On Thu, Jun 21, 2007 at 09:01:28AM -0700, Linus Torvalds wrote: > > > On Thu, 21 Jun 2007, Jarek Poplawski wrote: ... > So I don't see how you could possibly having two different CPU's getting > into some lock-step in that loop: changing "task_rq()" is a really quite > heavy operation (it's abo

Re: [BUG] long freezes on thinkpad t60

2007-06-22 Thread Ingo Molnar
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > the freezes that Miklos was seeing were hardirq contexts blocking in > task_rq_lock() - that is done with interrupts disabled. (Miklos i > think also tried !NOHZ kernels and older kernels, with a similar > result.) > > plus on the ptrace side, the wa

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Thu, 21 Jun 2007, Ingo Molnar wrote: > > > > damn, i first wrote up an explanation about why that ugly __delay(1) is > > there (it almost hurts my eyes when i look at it!) but then deleted it > > as superfluous :-/ > > I'm fine with a delay, bu

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Ingo Molnar wrote: > > damn, i first wrote up an explanation about why that ugly __delay(1) is > there (it almost hurts my eyes when i look at it!) but then deleted it > as superfluous :-/ I'm fine with a delay, but the __delay(1) is simply not "correct". It doesn't do a

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Eric Dumazet
Linus Torvalds a écrit : On Thu, 21 Jun 2007, Linus Torvalds wrote: We don't do nesting locking either, for exactly the same reason. Are nesting locks "easier"? Absolutely. They are also almost always a sign of a *bug*. So making spinlocks and/or mutexes nest by default is just a way to encou

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Ingo Molnar wrote: > > yeah. I think Linux is i think the only OS on the planet that is using > the movb trick for unlock, it even triggered a hardware erratum ;) I'm pretty sure others do it too. Maybe not on an OS level (but I actually doubt that - I'd be surprised if

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > > for (;;) { > > for (i = 0; i < loops; i++) { > > if (__raw_write_trylock(&lock->raw_lock)) > > return; > > __delay(1); > >

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Thu, 21 Jun 2007, Ingo Molnar wrote: > > > > I can understand why no data is saved by this change: gcc is > > aligning the next field to a natural boundary anyway and we dont > > really have arrays of spinlocks (fortunately). > > Actually, some

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > No, the cache line arbitration doesn't know anything about "locked" vs > "unlocked" instructions (it could, but there really is no point). > > The real issue is that locked instructions on x86 are serializing, > which makes them extremely slow (com

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Ingo Molnar wrote: > > * Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > If somebody can actually come up with a sequence where we have > > spinlock starvation, and it's not about an example of bad locking, and > > nobody really can come up with any other way to fix it, w

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > It's in fact entirely possible that the long freezes have always been > there, but the NOHZ option meant that we had much longer stretches of > time without things like timer interrupts to jumble up the timing! So > maybe the freezes existed before

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > (And no, on 32-bit x86, we don't allow more than 128 CPU's. I don't > think such an insane machine has ever existed). and if people _really_ want to boot a large-smp 32-bit kernel on some new, tons-of-cpus box, as a workaround they can enable the s

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Ingo Molnar wrote: > > I can understand why no data is saved by this change: gcc is aligning > the next field to a natural boundary anyway and we dont really have > arrays of spinlocks (fortunately). Actually, some data structures could well shrink. Look at "struct task_

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > If somebody can actually come up with a sequence where we have > spinlock starvation, and it's not about an example of bad locking, and > nobody really can come up with any other way to fix it, we may > eventually have to add the notion of "fair sp

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > Umm. i386 spinlocks could and should be *one*byte*. > > In fact, I don't even know why they are wasting four bytes right now: > the fact that somebody made them an "int" just wastes memory. All the > actual code uses "decb", so it's not even a ques

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Linus Torvalds wrote: > > We don't do nesting locking either, for exactly the same reason. Are > nesting locks "easier"? Absolutely. They are also almost always a sign of > a *bug*. So making spinlocks and/or mutexes nest by default is just a way > to encourage bad progra

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Eric Dumazet wrote: > > This reminds me Nick's proposal of 'queued spinlocks' 3 months ago > > Maybe this should be re-considered ? (unlock is still a non atomic op, > so we dont pay the serializing cost twice) No. The point is simple: IF YOU NEED THIS, YOU ARE D

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Eric Dumazet
On Thu, 21 Jun 2007 10:31:53 -0700 (PDT) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > On Thu, 21 Jun 2007, Chuck Ebbert wrote: > > > > A while ago I showed that spinlocks were a lot more fair when doing > > unlock with the xchg instruction on x86. Probably the arbitration is all > > screwed

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Chuck Ebbert wrote: > > A while ago I showed that spinlocks were a lot more fair when doing > unlock with the xchg instruction on x86. Probably the arbitration is all > screwed up because we use a mov instruction, which while atomic is not > locked. No, the cache line arbit

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Chuck Ebbert
On 06/21/2007 12:08 PM, Ingo Molnar wrote: > yeah - i'm not at all arguing in favor of the BTRL patch i did: i always > liked the 'nicer' inner loop of spinlocks, which could btw also easily > use MONITOR/MWAIT. The "nice" inner loop is necessary or else it would generate huge amounts of bus tra

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Ingo Molnar wrote: > > so the problem was not the trylock based spin_lock() itself (no matter > how it's structured in the assembly), the problem was actually modifying > the lock and re-modifying it again and again in a very tight > high-frequency loop, and hence not giv

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Thu, 21 Jun 2007, Ingo Molnar wrote: > > > > what worries me a bit though is that my patch that made spinlocks > > equally aggressive to that loop didn't solve the hangs! > > Your patch kept doing "spin_trylock()", didn't it? yeah - it changed sp

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Jarek Poplawski wrote: > > BTW, I've looked a bit at these NMI watchdog traces, and now I'm not > even sure it's necessarily the spinlock's problem (but I don't exclude > this possibility yet). It seems both processors use task_rq_lock(), so > there could be also a problem w

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Linus Torvalds
On Thu, 21 Jun 2007, Ingo Molnar wrote: > > what worries me a bit though is that my patch that made spinlocks > equally aggressive to that loop didn't solve the hangs! Your patch kept doing "spin_trylock()", didn't it? That's a read-modify-write thing, and keeps bouncing the cacheline back and
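
The distinction drawn above, between retrying a read-modify-write and spinning on a plain read, can be sketched in user-space C11 as a test-and-test-and-set lock. This is only an illustration and not the kernel's spinlock code: `struct tas_lock`, `tas_trylock()`, `ttas_lock()` and `tas_unlock()` are hypothetical names, and the cache-line behaviour described in the comments is the usual expectation on x86, not something the C standard specifies.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-word lock: 0 = free, 1 = held. */
struct tas_lock {
    atomic_int locked;
};

/* A trylock is a read-modify-write: every call requests the cache line
 * for writing, whether or not the lock is actually free. */
static bool tas_trylock(struct tas_lock *l)
{
    return atomic_exchange_explicit(&l->locked, 1,
                                    memory_order_acquire) == 0;
}

/* Test-and-test-and-set: after a failed attempt, wait with plain loads
 * (the line can stay shared) and retry the atomic op only once the lock
 * has been observed free. */
static void ttas_lock(struct tas_lock *l)
{
    for (;;) {
        if (tas_trylock(l))
            return;
        while (atomic_load_explicit(&l->locked,
                                    memory_order_relaxed) != 0)
            ; /* read-only inner loop */
    }
}

/* Release with a plain store; only the holder writes here. */
static void tas_unlock(struct tas_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A loop that calls `tas_trylock()` back to back corresponds to the pattern criticised in the quote, while `ttas_lock()` corresponds to the "nice" inner loop mentioned elsewhere in the thread.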

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Jarek Poplawski
On Thu, Jun 21, 2007 at 10:39:31AM +0200, Ingo Molnar wrote: > > * Jarek Poplawski <[EMAIL PROTECTED]> wrote: > > > BTW, I've looked a bit at these NMI watchdog traces, and now I'm not > > even sure it's necessarily the spinlock's problem (but I don't exclude > > this possibility yet). It seems

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Jarek Poplawski <[EMAIL PROTECTED]> wrote: > BTW, I've looked a bit at these NMI watchdog traces, and now I'm not > even sure it's necessarily the spinlock's problem (but I don't exclude > this possibility yet). It seems both processors use task_rq_lock(), so > there could be also a problem

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > In other words, spinlocks are optimized for *lack* of contention. If a > spinlock has contention, you don't try to make the spinlock "fair". > No, you try to fix the contention instead! yeah, and if there's no easy solution, change it to a mutex. F

Re: [BUG] long freezes on thinkpad t60

2007-06-21 Thread Jarek Poplawski
On Wed, Jun 20, 2007 at 10:34:15AM -0700, Linus Torvalds wrote: > > > On Wed, 20 Jun 2007, Jarek Poplawski wrote: > > > > I don't agree with this (+ I know it doesn't matter). > > > > The real bug is what Chuck Ebbert wrote: "Spinlocks aren't fair". > > And here they are simply lawlessly not fa

Re: [BUG] long freezes on thinkpad t60

2007-06-20 Thread Linus Torvalds
On Wed, 20 Jun 2007, Jarek Poplawski wrote: > > I don't agree with this (+ I know it doesn't matter). > > The real bug is what Chuck Ebbert wrote: "Spinlocks aren't fair". > And here they are simply lawlessly not fair. Well, that's certainly a valid standpoint. I wouldn't claim you're _wrong_

Re: [BUG] long freezes on thinkpad t60

2007-06-20 Thread Jarek Poplawski
On 18-06-2007 18:34, Linus Torvalds wrote: > > On Mon, 18 Jun 2007, Ingo Molnar wrote: >> To test this theory, could you try the patch below, does this fix your >> hangs too? > > I really think this is the wrong approach, although *testing* it makes > sense. > > I think we need to handle loop

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ravikiran G Thirumalai
On Mon, Jun 18, 2007 at 01:20:55AM -0700, Andrew Morton wrote: > On Mon, 18 Jun 2007 10:12:04 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > > Subject: [patch] x86: fix spin-loop starvation bug > > From: Ingo Molnar <[EMAIL PROTECTED]> >

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Linus Torvalds
On Mon, 18 Jun 2007, Ingo Molnar wrote: > > ok. Do we have a guarantee that cpu_relax() is also an smp_rmb()? The common use for cpu_relax() is basically for code that does while (*ptr != val) cpu_relax(); so yes, an architecture that doesn't notice writes by other CP
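
The `while (*ptr != val) cpu_relax();` idiom quoted above can be approximated in user-space C11 as follows. This is a sketch under assumptions: `relax_hint()` is a hypothetical stand-in for the kernel's `cpu_relax()` and is only a compiler barrier here, and `wait_for_value()` is an illustrative helper, not a kernel function.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for cpu_relax(). In this sketch it only stops
 * the compiler from reordering or caching across the call; it emits no
 * CPU pause instruction. */
static inline void relax_hint(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Spin until *ptr equals val. The atomic load forces a fresh read on
 * every iteration, so a write by another thread is eventually seen.
 * Returns the number of iterations spent waiting. */
static unsigned long wait_for_value(atomic_int *ptr, int val)
{
    unsigned long spins = 0;

    while (atomic_load_explicit(ptr, memory_order_acquire) != val) {
        relax_hint();
        spins++;
    }
    return spins;
}
```

The point the quote makes is that such a loop is only useful if each iteration really re-reads memory; here that property comes from the atomic load rather than from the relax hint itself.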

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > > Boots and runs fine. Fixes the freezes as well, which is not such a > > big surprise, since basically any change in that function seems to > > do that ;) > > Yeah, and that code really was *designed* to make all "locking in a > loop" go away. S

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > That code does: > > if (unlikely(p->array || task_running(rq, p))) { > > to decide if it needs to just unlock and repeat, but then to decide if > it need to *yield* it only uses *one* of those tests (namely > > preempted = !task_runnin

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Linus Torvalds
On Mon, 18 Jun 2007, Miklos Szeredi wrote: > > > Hmm? Untested, I know. Maybe I overlooked something. But even the > > generated assembly code looks fine (much better than it looked before!) > > Boots and runs fine. Fixes the freezes as well, which is not such a > big surprise, since basically

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> Hmm? Untested, I know. Maybe I overlooked something. But even the > generated assembly code looks fine (much better than it looked before!) Boots and runs fine. Fixes the freezes as well, which is not such a big surprise, since basically any change in that function seems to do that ;) Miklos

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Linus Torvalds
On Mon, 18 Jun 2007, Ingo Molnar wrote: > > To test this theory, could you try the patch below, does this fix your > hangs too? I really think this is the wrong approach, although *testing* it makes sense. I think we need to handle loops that take, release, and then immediately re-take dif

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> > > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > > > how about the patch below? Boot-tested on 32-bit. As a side-effect > > > > this change also removes the 255 CPUs limit from the 32-bit kernel. > > > > > > boot-tested on 64-bit too now. > > > > Strange, I can't even get past the compi

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > how about the patch below? Boot-tested on 32-bit. As a side-effect > > > this change also removes the 255 CPUs limit from the 32-bit kernel. > > > > boot-tested on 64-bit too now. > > Strange, I

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > how about the patch below? Boot-tested on 32-bit. As a side-effect > > this change also removes the 255 CPUs limit from the 32-bit kernel. > > boot-tested on 64-bit too now. Strange, I can't even get past the compile stage ;) CC kernel/sp

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > how about the patch below? Boot-tested on 32-bit. As a side-effect > this change also removes the 255 CPUs limit from the 32-bit kernel. boot-tested on 64-bit too now. Ingo

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > This change causes the memory access of the "easy" spin-loop > > > > > portion to be more aggressive: after the REP; NOP we'd not do > > > > > the 'easy-loop' with a simple CMPB, but we'd re-attempt the > > > > > atomic op. > > > > > > > > It

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > > This change causes the memory access of the "easy" spin-loop portion > > > > to be more aggressive: after the REP; NOP we'd not do the 'easy-loop' > > > > with a simple CMPB, but we'd re-attempt the atomic op. > > > > > > It looks as if this i

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> > > This change causes the memory access of the "easy" spin-loop portion > > > to be more agressive: after the REP; NOP we'd not do the 'easy-loop' > > > with a simple CMPB, but we'd re-attempt the atomic op. > > > > It looks as if this is going to overflow of the lock counter, no? > > hm, wh

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
(Ravikiran Cc:-ed too) * Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > To test this theory, could you try the patch below, does this fix > > your hangs too? > > Not tried yet, but obviously it does, since it's a superset of the > previous fix. I could try without the smb_mb(), but see below.

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> To test this theory, could you try the patch below, does this fix your > hangs too? Not tried yet, but obviously it does, since it's a superset of the previous fix. I could try without the smb_mb(), but see below. > This change causes the memory access of the "easy" spin-loop portion > to be

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Andrew Morton
On Mon, 18 Jun 2007 10:12:04 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > Subject: [patch] x86: fix spin-loop starvation bug > From: Ingo Molnar <[EMAIL PROTECTED]> > > Miklos Szeredi reported very long pauses (several seconds, sometimes

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > My previous attempt was just commenting out parts of your patch. But > maybe it's more logical to move the barrier to immediately after the > unlock. > > With this patch I can't reproduce the problem, which may not mean very > much, since it was

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> > > If this solves the problem on your box then i'll do a proper fix and > > > introduce a cpu_relax_memory_change(*addr) type of API to around > > > monitor/mwait. This patch boots fine on my T60 - but i never saw > > > your problem. > > > > Yes, the patch does make the pauses go away. In f

Re: [BUG] long freezes on thinkpad t60

2007-06-17 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > could you try the quick hack below, ontop of cfs-v17? It adds two > > things to wait_task_inactive(): > > > > - a cond_resched() [in case you are running !PREEMPT] > > > > - use MONITOR+MWAIT to monitor memory transactions to the rq->curr > > c

Re: [BUG] long freezes on thinkpad t60

2007-06-17 Thread Miklos Szeredi
Chuck, Ingo, thanks for the responses. > > The pattern that emerges is that on CPU0 we have an interrupt, which > > is trying to acquire the rq lock, but can't. > > > > On CPU1 we have strace which is doing wait_task_inactive(), which sort > > of spins acquiring and releasing the rq lock. I've

Re: [BUG] long freezes on thinkpad t60

2007-06-16 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > I've got some more info about this bug. It is gathered with > nmi_watchdog=2 and a modified nmi_watchdog_tick(), which instead of > calling die_nmi() just prints a line and calls show_registers(). great! > The pattern that emerges is that on CPU0

Re: [BUG] long freezes on thinkpad t60

2007-06-15 Thread Chuck Ebbert
On 06/14/2007 12:04 PM, Miklos Szeredi wrote: > I've got some more info about this bug. It is gathered with > nmi_watchdog=2 and a modified nmi_watchdog_tick(), which instead of > calling die_nmi() just prints a line and calls show_registers(). > > This makes the machine actually survive the NMI

Re: [BUG] long freezes on thinkpad t60

2007-06-14 Thread Miklos Szeredi
I've got some more info about this bug. It is gathered with nmi_watchdog=2 and a modified nmi_watchdog_tick(), which instead of calling die_nmi() just prints a line and calls show_registers(). This makes the machine actually survive the NMI tracing. The attached traces are gathered over about an

Re: [BUG] long freezes on thinkpad t60

2007-05-25 Thread Miklos Szeredi
> > 2.6.22-rc2, only EVENT_TRACE - boots, can't rerpoduce > > 2.6.21-vanila - can reproduce > > 2.6.21-rt7, trace options off - can reproduce > > 2.6.21-rt7, trace options on - can't reproduce > > > > Possibly something timing related, that's altered by the trace code. I > > tried the trace kerne

Re: [BUG] long freezes on thinkpad t60

2007-05-25 Thread Ingo Molnar
* Kok, Auke <[EMAIL PROTECTED]> wrote: > Henrique de Moraes Holschuh wrote: > >On Thu, 24 May 2007, Miklos Szeredi wrote: > >>Tried nmi_watchdog=1, but then the machine locks up hard shortly after > >>boot. > > > >NMIs in some thinkpads are bad trouble, they lock up the blasted IBM/Lenovo > >SMBI

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Kok, Auke
Henrique de Moraes Holschuh wrote: On Thu, 24 May 2007, Miklos Szeredi wrote: Tried nmi_watchdog=1, but then the machine locks up hard shortly after boot. NMIs in some thinkpads are bad trouble, they lock up the blasted IBM/Lenovo SMBIOS if they happen to hit it while it is servicing a SMI, an

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Henrique de Moraes Holschuh
On Thu, 24 May 2007, Miklos Szeredi wrote: > Tried nmi_watchdog=1, but then the machine locks up hard shortly after > boot. NMIs in some thinkpads are bad trouble, they lock up the blasted IBM/Lenovo SMBIOS if they happen to hit it while it is servicing a SMI, and thinkpads do SMIs like crazy. --

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > 2.6.22-rc2, only EVENT_TRACE - boots, can't reproduce > 2.6.21-vanilla - can reproduce > 2.6.21-rt7, trace options off - can reproduce > 2.6.21-rt7, trace options on - can't reproduce > > Possibly something timing related, that's altered by the trace

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Miklos Szeredi
> could you just try v2.6.21 plus the -rt patch, which has the tracer > built-in? That's a combination that should work well. You can pick it up > from: > >http://people.redhat.com/mingo/realtime-preempt/ > > same config options as above. If you dont turn on PREEMPT_RT you'll get > an almo

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > CONFIG_EVENT_TRACE=y > > CONFIG_FUNCTION_TRACE=y > > # CONFIG_WAKEUP_TIMING is not set > > # CONFIG_CRITICAL_IRQSOFF_TIMING is not set > > CONFIG_MCOUNT=y > > > > does it boot with these? > > Nope. Same segfault. If I try to continue manual

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > hm, you should only need these: > > > > CONFIG_EVENT_TRACE=y > > CONFIG_FUNCTION_TRACE=y > > # CONFIG_WAKEUP_TIMING is not set > > # CONFIG_CRITICAL_IRQSOFF_TIMING is not set > > CONFIG_MCOUNT=y > > > > does it boot with these? > > Nope. Sa

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Miklos Szeredi
> > > how reproducable are these lockups - could you possibly trace it? If > > > yes then please apply: > > > > > > http://www.tglx.de/private/tglx/ht-debug/tracer.diff > > > > With this patch boot stops at segfaulting fsck. I enabled all the new > > config options, is that not a good idea?

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > how reproducible are these lockups - could you possibly trace it? If > > yes then please apply: > > > > http://www.tglx.de/private/tglx/ht-debug/tracer.diff > > With this patch boot stops at segfaulting fsck. I enabled all the new > config op

Re: [BUG] long freezes on thinkpad t60

2007-05-24 Thread Ingo Molnar
* Miklos Szeredi <[EMAIL PROTECTED]> wrote: > On some strange workload involving strace and fuse I get occasional > long periods (10-100s) of total unresponsiveness, not even SysRq-* > working. Then the machine continues as normal. Nothing in dmesg, > absolutely no indication about what is ha