Re: Lock-up on PPC64

2009-02-23 Thread Geoff Levand
On 01/05/2009 07:46 AM, Arnd Bergmann wrote: > On Sunday 28 December 2008, malc wrote: >> Now to the Christmas cheer, i've tried v2.6.28 and couldn't help but >> notice that the problem is gone, bisecting v2.6.27 (which funnily i >> had to mark good) to v2.6.28 (which has to be marked bad) wasn't f

Re: Lock-up on PPC64

2009-02-22 Thread Benjamin Herrenschmidt
On Sun, 2009-02-22 at 11:35 +0300, malc wrote: > After writing valgrind tool that was simulating Cell XER.SO syscall > (mis)behaviour (pre ab598b6680f1e74c267d1547ee352f3e1e530f89 that is) > and banging my head against the wall for a while trying to figure out > which of the failing syscalls was res

Re: Lock-up on PPC64

2009-02-22 Thread malc
On Wed, 7 Jan 2009, malc wrote: > On Wed, 7 Jan 2009, Benjamin Herrenschmidt wrote: > > > > > > As you wish :) I've written some ad-hoc stuff in the failing path which > > > manually triggers sysrq and then sends the klogctl output via network > > > and here it is: > > > > All right, something's

Re: Lock-up on PPC64

2009-01-06 Thread malc
On Wed, 7 Jan 2009, Benjamin Herrenschmidt wrote: As you wish :) I've written some ad-hoc stuff in the failing path which manually triggers sysrq and then sends the klogctl output via network and here it is: All right, something's unclear to me. What do you mean by the system goes down then?

Re: Lock-up on PPC64

2009-01-06 Thread Benjamin Herrenschmidt
> As you wish :) I've written some ad-hoc stuff in the failing path which > manually triggers sysrq and then sends the klogctl output via network > and here it is: All right, something's unclear to me. What do you mean by the system goes down then? The kernel appears to be working at least to a

Re: Lock-up on PPC64

2009-01-06 Thread malc
On Tue, 6 Jan 2009, Benjamin Herrenschmidt wrote: On Mon, 2009-01-05 at 19:34 +0300, malc wrote: Before this change (at least) mono_handle_native_sigsegv was executed (before machine locks-up hard) after the change this code path is never touched. The fact that machine locks up hard and not eve

Re: Lock-up on PPC64

2009-01-06 Thread Benjamin Herrenschmidt
On Mon, 2009-01-05 at 23:28 +1100, Michael Ellerman wrote: > I'm confused. Which code never exercises which path, and so what > deserves a better look? Well, the thing is with the fix that went into 2.6.28, some error path will never be hit, or pretty much. Without the fix, what can happen is th

Re: Lock-up on PPC64

2009-01-06 Thread Benjamin Herrenschmidt
On Mon, 2009-01-05 at 19:34 +0300, malc wrote: > Before this change (at least) mono_handle_native_sigsegv was executed > (before machine locks-up hard) after the change this code path is > never touched. > > The fact that machine locks up hard and not even magic sysrq works > is what deserves a bet

Re: Lock-up on PPC64

2009-01-05 Thread malc
On Mon, 5 Jan 2009, Michael Ellerman wrote: On Sun, 2008-12-28 at 03:45 +0300, malc wrote: On Thu, 25 Dec 2008, Benjamin Herrenschmidt wrote: On Wed, 2008-12-24 at 03:08 +0300, m...@pulsesoft.com wrote: Ken Moffat writes: On Tue, Dec 23, 2008 at 06:04:45AM +0300, m...@pulsesoft.com wrote:

Re: Lock-up on PPC64

2009-01-05 Thread Arnd Bergmann
On Sunday 28 December 2008, malc wrote: > Now to the Christmas cheer, i've tried v2.6.28 and couldn't help but > notice that the problem is gone, bisecting v2.6.27 (which funnily i > had to mark good) to v2.6.28 (which has to be marked bad) wasn't fun > but eventually converged at ab598b6680f1e74c2

Re: Lock-up on PPC64

2009-01-05 Thread Michael Ellerman
On Sun, 2008-12-28 at 03:45 +0300, malc wrote: > On Thu, 25 Dec 2008, Benjamin Herrenschmidt wrote: > > > On Wed, 2008-12-24 at 03:08 +0300, m...@pulsesoft.com wrote: > >> Ken Moffat writes: > >> > >>> On Tue, Dec 23, 2008 at 06:04:45AM +0300, m...@pulsesoft.com wrote: > > [..snip..] > > >> > >

Re: Lock-up on PPC64

2008-12-27 Thread malc
On Thu, 25 Dec 2008, Benjamin Herrenschmidt wrote: On Wed, 2008-12-24 at 03:08 +0300, m...@pulsesoft.com wrote: Ken Moffat writes: On Tue, Dec 23, 2008 at 06:04:45AM +0300, m...@pulsesoft.com wrote: [..snip..] Thanks for the reference, but i'm sure, now more than ever, that bad memory h