3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

2013-05-22 Thread CAI Qian
Original report:
http://oss.sgi.com/archives/xfs/2013-05/msg00683.html

Also seen on Power7:
http://marc.info/?l=linux-kernel&m=136927904900692&w=2

CAI Qian

- Original Message -
> From: "Dave Chinner" 
> To: "CAI Qian" 
> Cc: "LKML" , sta...@vger.kernel.org, 
> x...@oss.sgi.com
> Sent: Thursday, May 23, 2013 11:46:11 AM
> Subject: Re: 3.9.2: xfstests triggered panic
> 
> On Wed, May 22, 2013 at 11:16:56PM -0400, CAI Qian wrote:
> > ----- Original Message -
> > > From: "Dave Chinner" 
> > > To: "CAI Qian" 
> > > Cc: "LKML" , sta...@vger.kernel.org,
> > > x...@oss.sgi.com
> > > Sent: Wednesday, May 22, 2013 5:53:00 PM
> > > Subject: Re: 3.9.2: xfstests triggered panic
> > > 
> > > On Wed, May 22, 2013 at 04:39:58AM -0400, CAI Qian wrote:
> > > > Reproduced on almost all s390x guests by running xfstests.
> > > > 
> > > > [14634.396658] XFS (dm-1): Mounting Filesystem
> > > > [14634.525522] XFS (dm-1): Ending clean mount
> > > > [14640.413007]  [<0017c6d4>] idle_balance+0x1a0/0x340
> > > > [14640.413010]  [<0063303e>] __schedule+0xa22/0xaf0
> > > > [14640.428279]  [<00630da6>] schedule_timeout+0x186/0x2c0
> > > > [14640.428289]  [<001cf864>] rcu_gp_kthread+0x1bc/0x298
> > > > [14640.428300]  [<00158c5a>] kthread+0xe6/0xec
> > > > [14640.428304]  [<00634de6>] kernel_thread_starter+0x6/0xc
> > > > [14640.428308]  [<00634de0>] kernel_thread_starter+0x0/0xc
> > > > [14640.428311] Last Breaking-Event-Address:
> > > > [14640.428314]  [<0016bd76>] walk_tg_tree_from+0x3a/0xf4
> > > > [14640.428319] list_add corruption. next->prev should be prev (0918), but was   (null). (next=  (null)).
> > > 
> > > Where's XFS in this? walk_tg_tree_from() is part of the scheduler
> > > code. This kind of implies a stack corruption.
> > > 
> > > > Sometimes, this pops up,
> > > > [16907.275002] WARNING: at kernel/rcutree.c:1960
> > > > 
> > > > or this,
> > > > [15316.154171] XFS (dm-1): Mounting Filesystem
> > > > [15316.255796] XFS (dm-1): Ending clean mount
> > > > [15320.364246] 006367a2: e310b0080004  lg  %r1,8(%r11)
> > > > [15320.364249] 006367a8: 41101010      la  %r1,16(%r1)
> > > > [15320.364251] 006367ac: e3301004      lg  %r3,0(%r1)
> > > > [15320.364252] Call Trace:
> > > > [15320.364252] Last Breaking-Event-Address:
> > > > [15320.364253]  [<>] Kernel stack overflow.
> > > > [15320.364308] CPU: 0 Tainted: GF   W 3.9.2 #1
> > > > [15320.364309] Process rhts-test-runne (pid: 625, task: 3dccc890, ksp: 0
> > > 
> > >  and there you go - a stack overflow. Your kernel stack size is
> > > too small.
> > > 
> > > I'd suggest that you need 16k stacks on s390 - IIRC every function
> > > call has a 128 byte stack frame, and there are call chains 70-80
> > > functions deep in the storage stack...
> > Hmm, I am unsure how to set a 16k stack there.
> 
> Are you building a 64 bit s390 kernel or a 32 bit kernel? 32 bit
> kernels only have an 8k stack size; 64 bit kernels have 16k (see
> arch/s390/Makefile).
> 
> $ git grep STACK_SIZE arch/s390 |head -2
> arch/s390/Makefile:STACK_SIZE   := 8192
> arch/s390/Makefile:STACK_SIZE   := 16384
> 
> As it is, the stack frame usage is worse than I thought:
> 
> $ git grep STACK_FRAME_OVERHEAD arch/s390 |head -2
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 96  /* size of minimum stack frame */
> arch/s390/include/uapi/asm/ptrace.h:#define STACK_FRAME_OVERHEAD 160  /* size of minimum stack frame */
> 
> Overhead is 96 bytes for 32 bit and 160 bytes for 64 bit. So even a
> 16k stack is going to have big trouble with a 70-80 function deep
> call chain.
> 
> As for powerpc:
> 
> arch/powerpc/include/asm/ppc_asm.h:#define STACKFRAMESIZE 256
> 
> Yeah, same issue.
> 
> But, seriously, these stack traces are meaningless to anyone not
> familiar with s390 or power7 - they indicate a problem detected
> in the idle loop, not wherever the stack overran.
> 
> Can you please work with the s390/power7 people to obtain whatever
> stack it was that overflowed, and we can go from there.
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
> 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: 3.9.2/3.9.3: stack overrun on s390x and ppc64 (WAS Re: 3.9.2: xfstests triggered panic)

2013-05-23 Thread CAI Qian
OK, here is clearer stack output from the run.
CAI Qian

+ ./check
FSTYP -- xfs (non-debug)
PLATFORM  -- Linux/s390x ibm-z10-23 3.9.3

001  29s
002  3s
003  2s
004  [not run] this test requires a valid $SCRATCH_DEV
005  2s
006  9s
007  10s
008  7s
009  [not run] this test requires a valid $SCRATCH_DEV
010  [not run] dbtest was not built for this platform
011  9s
012  10s
013  35s
014  5s
015  [not run] this test requires a valid $SCRATCH_DEV
016  [not run] this test requires a valid $SCRATCH_DEV
017  [not run] this test requires a valid $SCRATCH_DEV
018  [not run] this test requires a valid $SCRATCH_DEV
019  [not run] this test requires a valid $SCRATCH_DEV
020 


[ 1316.571227] XFS (dm-0): Mounting Filesystem
[ 1316.697803] XFS (dm-0): Ending clean mount
[ 1318.080615] XFS (dm-0): Ending clean mount
[ 1348.791125] XFS (dm-0): Mounting Filesystem
[ 1348.989166] XFS (dm-0): Ending clean mount
[ 1353.335478] XFS (dm-0): Mounting Filesystem
[ 1353.496364] XFS (dm-0): Ending clean mount
[ 1357.495427] XFS (dm-0): Mounting Filesystem
[ 1357.676971] XFS (dm-0): Ending clean mount
[ 1361.646399] XFS (dm-0): Mounting Filesystem
[ 1361.890426] XFS (dm-0): Ending clean mount
[ 1371.798944] XFS (dm-0): Mounting Filesystem
[ 1371.976922] XFS (dm-0): Ending clean mount
[ 1384.559103] XFS (dm-0): Mounting Filesystem
[ 1384.725657] XFS (dm-0): Ending clean mount
[ 1393.131347] XFS (dm-0): Mounting Filesystem
[ 1393.357927] XFS (dm-0): Ending clean mount
[ 1407.282708] XFS (dm-0): Mounting Filesystem
[ 1407.745176] XFS (dm-0): Ending clean mount
[ 1422.927074] XFS (dm-0): Mounting Filesystem
[ 1423.136266] XFS (dm-0): Ending clean mount
[ 1425.500910] XFS (dm-0): Mounting Filesystem
[ 1425.608851] XFS (dm-0): Ending clean mount
[ 1450.978110] XFS (dm-0): Mounting Filesystem
[ 1451.255368] XFS (dm-0): Ending clean mount
[ 1453.603742] XFS (dm-0): Mounting Filesystem
[ 1453.680657] XFS (dm-0): Ending clean mount
[ 1456.262266] XFS (dm-0): Mounting Filesystem
[ 1456.330515] XFS (dm-0): Ending clean mount
[ 1457.053767] XFS (dm-0): Mounting Filesystem
[ 1457.107258] XFS (dm-0): Ending clean mount
[ 1462.049374] XFS (dm-0): Mounting Filesystem
[ 1462.111389] XFS (dm-0): Ending clean mount
[ 1471.109589] ODEBUG: deactivate not available (active state 0) object type: timer_list hint: process_timeout+0x0/0x8
[ 1471.109683] [ cut here ]
[ 1471.109688] WARNING: at lib/debugobjects.c:260
[ 1471.109692] Modules linked in: lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F)
nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_nat(F) nf_nat_ipv6(F)
ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F)
iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F)
nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F)
ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F)
ip_tables(F) sg(F) qeth_l2(F) vmur(F) xfs(F) libcrc32c(F) dasd_fba_mod(F)
dasd_eckd_mod(F) lcs(F) dasd_mod(F) ctcm(F) qeth(F) qdio(F) ccwgroup(F) fsm(F)
dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[ 1471.109848] CPU: 0 Tainted: GF 3.9.3 #2
[ 1471.109858] Process swapper/0 (pid: 0, task: 00a2b4d0, ksp: 00a17d28)
[ 1471.109868] Krnl PSW : 0404c0018000 0046c84a (debug_print_object+0xca/0xd8)
[ 1471.114762]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS:  00a2b4d0 0067 0101f708
[ 1471.114769]            0046c846 84a4d448 0086936a 000001040700
[ 1471.114773]            01a0f290 0401 00874cf8 000000a395d8
[ 1471.114777]            0195f820 0001bd20 0046c846 0001bc20
[ 1471.114792] Krnl Code: 0046c83a: e3441004      lg      %r4,0(%r4,%r1)
   0046c840: c0e500139f88  brasl   %r14,6e0750
  #0046c846: a7f40001      brc     15,46c848
  >0046c84a: a7f4ffc2      brc     15,46c7ce
   0046c84e: a729          lghi    %r2,0
   0046c852: a7f4ffd7      brc     15,46c800
   0046c856: 0707          bcr     0,%r7
   0046c858: ebaff0680024  stmg    %r10,%r15,104(%r15)
[ 1471.114825] Call Trace:
[ 1471.114828] ([<0046c846>] debug_print_object+0xc6/0xd8)
[ 1471.114833]  [<0046d35c>] debug_object_deactivate+0x15c/0x160
[ 1471.114838]  [<00148244>] run_timer_softirq+0x180/0x464
[ 1471.114843]  [<0013d8d6>] __do_softirq+0x112/0x42c
[ 1471.114847]  [<0013ddf8>] irq_exit+0xc8/0xe8
[ 1471.114851]  [<0010d55e>] do_extint+0x25e/0x318
[ 1471.114859]  [<006f0d90>] ext_skip+0x40/0x44
[ 1471.114866]  [<006f05d6>] vtime_st