We appear to be underestimating the block credits reserved for quota syncing (OCFS2_QSYNC_CREDITS).
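As a rough illustration of what running short of credits means, here is a toy model in C. It is not the jbd2 or ocfs2 code: `toy_handle`, `toy_start`, `toy_dirty_block` and `toy_sync_shortfall` are hypothetical names made up for this sketch, and the numbers are arbitrary. The only assumption carried over from the report is that a handle reserves a fixed number of block credits up front and each dirtied metadata block uses one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a journal handle: it only tracks remaining credits. */
struct toy_handle {
    int credits_left;
};

/* Start a toy transaction with a fixed, up-front reservation of block credits. */
static struct toy_handle toy_start(int reserved_credits)
{
    assert(reserved_credits >= 0);
    struct toy_handle h = { reserved_credits };
    return h;
}

/*
 * Dirty one metadata block under the handle.
 * Returns false when no credit is left; this is the situation in which
 * a real journal would refuse to continue instead of returning.
 */
static bool toy_dirty_block(struct toy_handle *h)
{
    if (h->credits_left <= 0)
        return false;
    h->credits_left--;
    return true;
}

/*
 * Dirty blocks_needed blocks under a single handle and return how many
 * of them did not fit into the reservation (0 means the estimate was enough).
 */
static int toy_sync_shortfall(int reserved_credits, int blocks_needed)
{
    struct toy_handle h = toy_start(reserved_credits);
    int shortfall = 0;

    for (int i = 0; i < blocks_needed; i++) {
        if (!toy_dirty_block(&h))
            shortfall++;
    }
    return shortfall;
}
```

In this model, `toy_sync_shortfall(3, 4)` reports a shortfall of one block, while raising the reservation to 5 leaves none. That is the idea behind bumping the estimate by a few.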
Please file a bugzilla at oss.oracle.com/bugzilla so that we don't forget this.

Possible temporary workarounds include:
1. Incrementing the above #define by a few.
2. Disabling quotas until we have a fix.

Sunil

On 08/30/2011 11:18 AM, Ryan wrote:
Server B is a mirror of Server A; both servers run identical software and the same kernel but have different sparc64 CPUs. Using kernel 3.0.3 on Debian Squeeze with ocfs2-tools 1.6 from backports. Server A shows no problems, but Server B crashes regularly with:

[24361.903485] kernel BUG at fs/jbd2/transaction.c:1083!
[24361.969968] \|/ ____ \|/
[24361.969971] "@'/ .. \`@"
[24361.969974] /_| \__/ |_\
[24361.969976] \__U_/
[24362.163308] kworker/1:3(29218): Kernel bad sw trap 5 [#1]
[24362.234313] TSTATE: 0000008080001607 TPC: 0000000010292964 TNPC: 0000000010292968 Y: 00000000 Not tainted
[24362.363592] TPC:<jbd2_journal_dirty_metadata+0xcc/0x150 [jbd2]>
[24362.442576] g0: 0000000000000000 g1: 00000000008a2c00 g2: 0000000000000001 g3: 00000000008d2000
[24362.556987] g4: fffff8103df5d400 g5: fffff80001d5e000 g6: fffff81036e1c000 g7: 0000000000000e80
[24362.671456] o0: 000000000000003c o1: 000000001029c200 o2: 000000000000043b o3: 0000000000000001
[24362.785983] o4: fffff81036e1f788 o5: 0000000000000016 sp: fffff81036e1efd1 ret_pc: 000000001029295c
[24362.905004] RPC:<jbd2_journal_dirty_metadata+0xc4/0x150 [jbd2]>
[24362.984026] l0: fffff8123af5a000 l1: fffff8123e3c35e0 l2: fffff8003f8b1a00 l3: 0000000000100000
[24363.098509] l4: 001704f565f50017 l5: fffff81229af3018 l6: 00000000004b4262 l7: 0000000046b4b880
[24363.212977] i0: fffff8123e3541f8 i1: fffff81229c18478 i2: 0000000000000004 i3: 0000000000000001
[24363.327448] i4: 0000000000000000 i5: 0000000000000000 i6: fffff81036e1f081 i7: 00000000106d499c
[24363.442013] I7:<ocfs2_journal_dirty+0x48/0x74 [ocfs2]>
[24363.510626] Call Trace:
[24363.542676] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
[24363.627464] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
[24363.709888] [000000001071cc10] ocfs2_local_write_dquot+0xe0/0x194 [ocfs2]
[24363.800344] [0000000010720ca0] ocfs2_sync_dquot_helper+0x20c/0x2c4 [ocfs2]
[24363.891923] [000000000055d95c] dquot_scan_active+0x98/0xf4
[24363.965190] [000000001071f6c4] qsync_work_fn+0x18/0x34 [ocfs2]
[24364.043029] [000000000047f87c] process_one_work+0x2bc/0x434
[24364.117443] [000000000047feb8] worker_thread+0x264/0x454
[24364.188357] [000000000048369c] kthread+0x5c/0x70
[24364.250123] [000000000042ad78] kernel_thread+0x30/0x48
[24364.318740] [0000000000483784] kthreadd+0xd4/0x120
[24364.382789] Disabling lock debugging due to kernel taint
[24364.452618] Caller[00000000106d499c]: ocfs2_journal_dirty+0x48/0x74 [ocfs2]
[24364.544243] Caller[000000001071a2bc]: ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
[24364.633527] Caller[000000001071cc10]: ocfs2_local_write_dquot+0xe0/0x194 [ocfs2]
[24364.730878] Caller[0000000010720ca0]: ocfs2_sync_dquot_helper+0x20c/0x2c4 [ocfs2]
[24364.829320] Caller[000000000055d95c]: dquot_scan_active+0x98/0xf4
[24364.909499] Caller[000000001071f6c4]: qsync_work_fn+0x18/0x34 [ocfs2]
[24364.994206] Caller[000000000047f87c]: process_one_work+0x2bc/0x434
[24365.075491] Caller[000000000047feb8]: worker_thread+0x264/0x454
[24365.153262] Caller[000000000048369c]: kthread+0x5c/0x70
[24365.221895] Caller[000000000042ad78]: kernel_thread+0x30/0x48
[24365.297377] Caller[0000000000483784]: kthreadd+0xd4/0x120
[24365.368286] Instruction DUMP: 11040a70 7c065c67 90122200<91d02005> 7c0ccdba 92100019 c25c6028 80a04012 2260000c
[24365.510168] Unable to handle kernel paging request at virtual address ffffffffffffe000
[24365.614294] tsk->{mm,active_mm}->context = 0000000000000c9e
[24365.687485] tsk->{mm,active_mm}->pgd = fffff81033b5e000
[24365.756111] \|/ ____ \|/
[24365.756113] "@'/ .. \`@"
[24365.756116] /_| \__/ |_\
[24365.756118] \__U_/
[24365.949404] kworker/1:3(29218): Oops [#2]
[24366.002020] TSTATE: 0000000011e01605 TPC: 0000000000483314 TNPC: 000000000047ec78 Y: 00000000 Tainted: G D
[24366.142709] TPC:<kthread_data+0x8/0xc>
[24366.193025] g0: fffff8103b5a2b18 g1: 0000000000000000 g2: fffff80002602280 g3: 0000000000000006
[24366.307407] g4: fffff8103df5d400 g5: fffff80001d5e000 g6: fffff81036e1c000 g7: fffffffff0825c48
[24366.421781] o0: fffff8103df5d400 o1: fffff8103df5d400 o2: 0000000000000001 o3: 000000000059c82c
[24366.536159] o4: 00000000000003b1 o5: 0000000000000000 sp: fffff81036e1e951 ret_pc: 000000000047ec70
[24366.655108] RPC:<wq_worker_sleeping+0x8/0xd0>
[24366.713433] l0: 00000000008f4800 l1: 000000000059c82c l2: 0000000000000001 l3: fffff8123f6c3260
[24366.827815] l4: 00000000007d36a0 l5: fffff8003f852300 l6: 0000000000000000 l7: 0000000000007222
[24366.942188] i0: fffff8103df5d400 i1: 0000000000000001 i2: 0000000000000001 i3: 0000000000000001
[24367.056565] i4: fffff8123f6c3260 i5: 0000000000000004 i6: fffff81036e1ea01 i7: 0000000000731420
[24367.170945] I7:<schedule+0x148/0x84c>
[24367.220112] Call Trace:
[24367.252139] [0000000000731420] schedule+0x148/0x84c
[24367.317337] [00000000004694f4] do_exit+0x760/0x788
[24367.381385] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
[24367.452297] [0000000000429f20] bad_trap+0x88/0xfc
[24367.515206] [00000000004220b0] tl0_resv104+0x30/0xa0
[24367.581550] [0000000010292964] jbd2_journal_dirty_metadata+0xcc/0x150 [jbd2]
[24367.675360] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
[24367.760107] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
[24367.842556] [000000001071cc10] ocfs2_local_write_dquot+0xe0/0x194 [ocfs2]
[24367.933015] [0000000010720ca0] ocfs2_sync_dquot_helper+0x20c/0x2c4 [ocfs2]
[24368.024591] [000000000055d95c] dquot_scan_active+0x98/0xf4
[24368.097818] [000000001071f6c4] qsync_work_fn+0x18/0x34 [ocfs2]
[24368.175668] [000000000047f87c] process_one_work+0x2bc/0x434
[24368.250011] [000000000047feb8] worker_thread+0x264/0x454
[24368.320924] [000000000048369c] kthread+0x5c/0x70
[24368.382687] [000000000042ad78] kernel_thread+0x30/0x48
[24368.451312] Caller[0000000000731420]: schedule+0x148/0x84c
[24368.523367] Caller[00000000004694f4]: do_exit+0x760/0x788
[24368.594281] Caller[0000000000427c30]: die_if_kernel+0x2a4/0x2cc
[24368.672057] Caller[0000000000429f20]: bad_trap+0x88/0xfc
[24368.741824] Caller[00000000004220b0]: tl0_resv104+0x30/0xa0
[24368.815032] Caller[000000001029295c]: jbd2_journal_dirty_metadata+0xc4/0x150 [jbd2]
[24368.915706] Caller[00000000106d499c]: ocfs2_journal_dirty+0x48/0x74 [ocfs2]
[24369.007311] Caller[000000001071a2bc]: ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
[24369.096627] Caller[000000001071cc10]: ocfs2_local_write_dquot+0xe0/0x194 [ocfs2]
[24369.193954] Caller[0000000010720ca0]: ocfs2_sync_dquot_helper+0x20c/0x2c4 [ocfs2]
[24369.292393] Caller[000000000055d95c]: dquot_scan_active+0x98/0xf4
[24369.372478] Caller[000000001071f6c4]: qsync_work_fn+0x18/0x34 [ocfs2]
[24369.457191] Caller[000000000047f87c]: process_one_work+0x2bc/0x434
[24369.538398] Caller[000000000047feb8]: worker_thread+0x264/0x454
[24369.616171] Caller[000000000048369c]: kthread+0x5c/0x70
[24369.684797] Caller[000000000042ad78]: kernel_thread+0x30/0x48
[24369.760283] Caller[0000000000483784]: kthreadd+0xd4/0x120
[24369.831194] Instruction DUMP: d0407ff0 c25a2258 81c3e008<d0587ff8> 81c3e008 90102000 82022008 c0722018 c2722010
[24369.973018] Fixing recursive fault but reboot is needed!
[24395.676061] BUG: NMI Watchdog detected LOCKUP on CPU1, ip 00733828, registers:
[24395.770979] TSTATE: 0000009911e01603 TPC: 0000000000733828 TNPC: 000000000073382c Y: 00000000 Tainted: G D
[24395.911668] TPC:<_raw_spin_trylock_bh+0x5c/0xf8>
[24395.973416] g0: 00000000000568aa g1: 00000000000000ff g2: 0000000000831930 g3: 0000000000831810
[24396.087798] g4: fffff8103df5d400 g5: fffff80001d5e000 g6: fffff81036e1c000 g7: 00000000efffffff
[24396.202174] o0: fffff80002602280 o1: 000000000083ff40 o2: 000000000083ff8c o3: 00000000007bfe90
[24396.316549] o4: 0000000000000001 o5: 000000000000000e sp: fffff81036e1e2d1 ret_pc: 0000000000731398
[24396.435499] RPC:<schedule+0xc0/0x84c>
[24396.484672] l0: 0000000000000000 l1: fffff80002602280 l2: fffff8103df5d400 l3: 00000000008f2230
[24396.599057] l4: fffff8103df5d6b0 l5: 0000000000000001 l6: 00000000008f2230 l7: 000000003b9aca00
[24396.713431] i0: 0000000000000042 i1: 0000000000000000 i2: fffff8103df5d400 i3: ffffffffffffffff
[24396.827806] i4: 0000000000000000 i5: 000000000000000e i6: fffff81036e1e441 i7: 0000000000468ea8
[24396.942182] I7:<do_exit+0x114/0x788>
[24396.990209] Call Trace:
[24397.022236] [0000000000468ea8] do_exit+0x114/0x788
[24397.086288] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
[24397.157201] [0000000000734884] unhandled_fault+0x84/0x90
[24397.228113] [00000000007357c4] do_sparc64_fault+0xf34/0xffc
[24397.302458] [00000000004079e8] sparc64_realfault_common+0x10/0x20
[24397.383662] [0000000000483314] kthread_data+0x8/0xc
[24397.448858] [0000000000731420] schedule+0x148/0x84c
[24397.514051] [00000000004694f4] do_exit+0x760/0x788
[24397.578100] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
[24397.649014] [0000000000429f20] bad_trap+0x88/0xfc
[24397.698771] BUG: NMI Watchdog detected LOCKUP on CPU0, ip 007337f8, registers:
[24397.698788] TSTATE: 0000009980e01605 TPC: 00000000007337f8 TNPC: 00000000007337fc Y: 00000000 Tainted: G D
[24397.698813] TPC:<_raw_spin_trylock_bh+0x2c/0xf8>
[24397.698819] g0: 00000000000162b9 g1: 00000000000000ff g2: fffff8003f816120 g3: 0000000000832954
[24397.698826] g4: fffff8123f1cc800 g5: fffff80001b5e000 g6: fffff8123f1c4000 g7: 0000000000000400
[24397.698833] o0: fffff80002602280 o1: fffff80002602280 o2: fffffffffffffffc o3: 0000000000000002
[24397.698839] o4: 0000000000000002 o5: 000000000000000e sp: fffff8123f1c6f71 ret_pc: 000000000046264c
[24397.698857] RPC:<load_balance+0x1e8/0x654>
[24397.698863] l0: fffff80002602280 l1: 00000000000162b9 l2: fffff8003f816100 l3: fffff8003f816118
[24397.698870] l4: 000000000000763d l5: fffff80002400b40 l6: 00000000008a4280 l7: 0000000000001fff
[24397.698876] i0: 0000000000000000 i1: fffff80002402280 i2: fffff8003f862d80 i3: 0000000000000002
[24397.698883] i4: fffff8123f1c7a0c i5: 00000000008a4280 i6: fffff8123f1c70a1 i7: 000000000073162c
[24397.698891] I7:<schedule+0x354/0x84c>
[24397.698895] Call Trace:
[24397.698901] [000000000073162c] schedule+0x354/0x84c
[24397.698961] [0000000010233660] md_super_wait+0x60/0x80 [md_mod]
[24397.698974] [0000000010236d58] md_update_sb+0x2e4/0x370 [md_mod]
[24397.698987] [0000000010238018] md_check_recovery+0x2d4/0x70c [md_mod]
[24397.699001] [000000001025596c] raid10d+0xc/0x668 [raid10]
[24397.699013] [0000000010235d24] md_thread+0x114/0x138 [md_mod]
[24397.699021] [000000000048369c] kthread+0x5c/0x70
[24397.699034] [000000000042ad78] kernel_thread+0x30/0x48
[24397.699039] [0000000000483784] kthreadd+0xd4/0x120
[24397.699043] Call Trace:
[24397.699052] [00000000004209f4] tl0_irq15+0x14/0x20
[24397.699059] [00000000007337f8] _raw_spin_trylock_bh+0x2c/0xf8
[24397.699065] [000000000073162c] schedule+0x354/0x84c
[24397.699077] [0000000010233660] md_super_wait+0x60/0x80 [md_mod]
[24397.699089] [0000000010236d58] md_update_sb+0x2e4/0x370 [md_mod]
[24397.699101] [0000000010238018] md_check_recovery+0x2d4/0x70c [md_mod]
[24397.699110] [000000001025596c] raid10d+0xc/0x668 [raid10]
[24397.699121] [0000000010235d24] md_thread+0x114/0x138 [md_mod]
[24397.699127] [000000000048369c] kthread+0x5c/0x70
[24397.699133] [000000000042ad78] kernel_thread+0x30/0x48
[24397.699138] [0000000000483784] kthreadd+0xd4/0x120
[24400.543852] [00000000004220b0] tl0_resv104+0x30/0xa0
[24400.610194] [0000000010292964] jbd2_journal_dirty_metadata+0xcc/0x150 [jbd2]
[24400.704005] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
[24400.788752] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
[24400.871203] [000000001071cc10] ocfs2_local_write_dquot+0xe0/0x194 [ocfs2]
[24400.961662] [0000000010720ca0] ocfs2_sync_dquot_helper+0x20c/0x2c4 [ocfs2]
[24401.053235] Call Trace:
[24401.085261] [00000000004209f4] tl0_irq15+0x14/0x20
[24401.149311] [0000000000733828] _raw_spin_trylock_bh+0x5c/0xf8
[24401.225944] [0000000000468ea8] do_exit+0x114/0x788
[24401.289992] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
[24401.360907] [0000000000734884] unhandled_fault+0x84/0x90
[24401.431818] [00000000007357c4] do_sparc64_fault+0xf34/0xffc
[24401.506161] [00000000004079e8] sparc64_realfault_common+0x10/0x20
[24401.587369] [0000000000483314] kthread_data+0x8/0xc
[24401.652563] [0000000000731420] schedule+0x148/0x84c
[24401.717756] [00000000004694f4] do_exit+0x760/0x788
[24401.781807] [0000000000427c30] die_if_kernel+0x2a4/0x2cc
[24401.852719] [0000000000429f20] bad_trap+0x88/0xfc
[24401.915625] [00000000004220b0] tl0_resv104+0x30/0xa0
[24401.981967] [0000000010292964] jbd2_journal_dirty_metadata+0xcc/0x150 [jbd2]
[24402.075777] [00000000106d499c] ocfs2_journal_dirty+0x48/0x74 [ocfs2]
[24402.160524] [000000001071a2bc] ocfs2_modify_bh+0x1a0/0x250 [ocfs2]
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users