<see middle post>
>> -----Original Message-----
>> From: Kumar Gala [mailto:ga...@kernel.crashing.org]
>> Sent: Thursday, May 21, 2009 9:13 AM
>> To: Morrison, Tom
>> Cc: linuxppc-dev@ozlabs.org; Young, Andrew; Brown, Jeff
>> Subject: Re: How to debug a hung multi-core system....
>>
>>
>> On May 20, 2009, at 6:17 PM, Morrison, Tom wrote:
[Morrison, Tom] <snip some verbose explanations>
>> >
>> > Core 1 seems to be Idle loop - happily doing nothing
>> > (and not servicing TCP and/or the console)...
>> >
>> > Core 0 seems to be 'stuck' at the "InstructionStorage"
>> > Exception. And it seems to be going 'nowhere' fast
>> >
>> > SRR0 seems to point to this same spot (0xc00006C0)
>> > SRR1 value is 0x00021200
>> >
>> > I am at a loss to see how the kernel (and/or our kernel BSP)
>> > cause this exception, and I am even more of a loss on figuring
>> > out an application could cause this exception...
>>
>> This is a bit odd as we shouldn't see an ISI from 0xc00006C0.
>>
>> Are you able to single step Core0? Can you dump the contents of the
>> TLBs on Core0
[Morrison, Tom] <snip some of verbose explanation>

Yes, very odd...
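
For reference, the SRR1 value quoted above can be decoded by hand, since SRR1 holds a copy of the MSR taken when the exception occurred. A minimal sketch in C, assuming the Book-E MSR bit layout used by the e500 core; the mask values should be checked against the core reference manual, and the names msr_flags and decode_srr1 are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Book-E MSR bit masks (assumed layout; verify against the e500 manual). */
struct msr_flag {
    uint32_t mask;
    const char *name;
};

static const struct msr_flag msr_flags[] = {
    { 0x00040000u, "WE" },  /* wait state enable */
    { 0x00020000u, "CE" },  /* critical interrupt enable */
    { 0x00008000u, "EE" },  /* external interrupt enable */
    { 0x00004000u, "PR" },  /* problem state, i.e. user mode */
    { 0x00001000u, "ME" },  /* machine check enable */
    { 0x00000200u, "DE" },  /* debug interrupt enable */
    { 0x00000020u, "IS" },  /* instruction address space */
    { 0x00000010u, "DS" },  /* data address space */
};

/* Write the names of all set bits into buf, separated by single spaces. */
static void decode_srr1(uint32_t srr1, char *buf, size_t len)
{
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < sizeof(msr_flags) / sizeof(msr_flags[0]); i++) {
        if (srr1 & msr_flags[i].mask) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, msr_flags[i].name, len - strlen(buf) - 1);
        }
    }
}
```

Under these assumed masks, decode_srr1(0x00021200, ...) gives "CE ME DE": PR and EE are clear, which would mean the exception was taken in supervisor mode with external interrupts masked, and IS/DS are clear, which would mean address space 0.
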
And I am able to get TLB entries from the core that is in Instruction Storage
Exception, I made
[Morrison, Tom]

>BKM>tat
Entry EPN      RPN      TID TMASK WIMGE TSIZ U0:3 X0:1 PID TS PROT SHEN UR UW UX SR SW SX TIDZ VAL
IT0   0000C000 00000000 00  000   0A    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
IT1   0000C000 00000000 00  000   0A    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
IT2   0000C000 00000000 00  000   0A    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
IT3   0000C000 00000000 00  000   0A    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
DT0   0011C000 00000000 00  000   06    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
DT1   D435C000 20000000 00  000   1E    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
DT2   0011C000 00000000 00  000   06    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
DT3   D435C000 20000000 00  000   1E    0    0    0    0   0  U    P    D  D  D  D  D  D  D    I
LT0   C0000000 00000000 00  0FF   04    9    0    0    0   0  P    P    E  E  D  E  E  D  D    V
LT1   D0000000 01000000 00  0FF   04    9    0    0    0   0  P    P    E  E  D  E  E  D  D    V
LT2   E0000000 02000000 00  0FF   04    9    0    0    0   0  P    P    E  E  D  E  E  D  D    V
LT3   39A40000 027FF700 0D  000   06    E    A    3    0   1  U    S    D  D  D  E  E  D  D    I
LT4   F924E000 7C054500 BA  000   0B    E    0    3    0   0  P    S    E  E  D  E  E  D  D    V
LT5   82A9F000 46664C00 FB  000   1A    F    4    2    0   0  U    S    E  E  D  D  E  D  D    I
LT6   80000000 1F000000 F2  0FF   1D    9    B    3    0   0  U    S    D  E  D  E  E  E  D    V
LT7   64000000 1F000000 B3  07F   02    8    B    0    0   1  U    S    D  E  D  D  E  E  D    V
LT8   E5BF1000 995EA900 96  000   0C    D    8    0    0   1  U    S    D  E  E  E  E  D  D    V
LT9   7F3BF000 C6DF7300 DF  000   15    1    2    3    0   1  U    S    E  D  D  E  E  E  D    I
LT10  917C7000 EEA67F00 7F  000   17    C    5    3    0   1  P    S    E  E  E  E  E  E  D    I
LT11  6B000000 F5700000 BC  03F   04    7    D    0    0   1  P    S    E  E  E  E  E  E  D    V
LT12  712DB000 F1B59100 2A  000   19    C    F    1    0   1  P    S    E  E  E  E  D  E  D    V
LT13  00000000 F0000000 7F  0FF   07    B    0    0    0   1  P    S    D  D  E  E  E  E  D    V
LT14  A3000000 FDD00000 C5  03F   16    7    E    3    0   1  P    S    E  E  E  D  D  E  D    V
LT15  F7F00000 B0B80000 82  00F   1F    5    F    0    0   1  P    P    E  E  D  D  D  D  D    V

To answer your 2nd question - we have about 10 processes, and about
60-70 threads total (30+ for the main processing process)...

>> > Anybody have any ideas - and/or ways to re-configure our
>> > setup to obtain more data?
>> > Or does this sound familiar to
>> > a bug somebody has already found in the kernel?
>> >
>> > We are even having trouble defining a test program that can
>> > cause (on purpose) the 'InstructionStorage' Exception (does
>> > anybody have an simple 'c' (or ppc assembly) program that
>> > causes this exception (so we can run in user application land
>> > and see if the symptoms are similar))?
>> >
>> > Thank you in advance for any / all help you can provide....
>> > because I am completely stumped on even how to proceed!
>>
>>
>> Is your application generating a lot of processes or have a lot of
>> concurrent processes on the 8572?
>>
>> - k
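
On the request for a simple 'c' program: one common way to provoke an instruction-fetch fault from user space is to branch into a mapped page that has no execute permission. A minimal sketch, assuming a POSIX/Linux userland. The helper name fetch_from_noexec_page is made up here, and whether a given kernel routes this through the same InstructionStorage path depends on how it tracks execute permission for user pages, which has not been verified on the 8572:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t caught_sig;

/* Record which signal arrived and unwind back to the caller. */
static void on_fault(int sig)
{
    caught_sig = sig;
    siglongjmp(fault_env, 1);
}

/*
 * Branch into a readable/writable page that lacks execute permission.
 * Returns the signal number that was caught (SIGSEGV is expected),
 * 0 if the call unexpectedly returned, or -1 if setup failed.
 */
static int fetch_from_noexec_page(void)
{
    struct sigaction sa, old_segv, old_ill, old_bus;
    long page = sysconf(_SC_PAGESIZE);
    void *mem;

    if (page <= 0)
        return -1;
    mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    /* Touch the page so it is actually present before the fetch. */
    memset(mem, 0, (size_t)page);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGBUS, &sa, &old_bus);

    caught_sig = 0;
    if (sigsetjmp(fault_env, 1) == 0) {
        void (*fn)(void) = (void (*)(void))mem;
        fn();   /* instruction fetch from a no-execute page */
    }

    /* Restore the previous handlers and release the page. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(mem, (size_t)page);
    return (int)caught_sig;
}
```

A main() that prints the return value is enough to run this as a stand-alone test. The expectation is that the process simply sees a signal (normally SIGSEGV); a core that wedges instead would be a data point in itself.
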
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev