Package: linux-image-2.6.35.6
Version: 2.6.35.6-10.00.Custom
Severity: important
Hello. First of all, this is my first bug report to Debian, so I am sorry if I do something wrong; just tell me what needs to be fixed in it.

I have two Dell PowerEdge 2950 servers and am trying to use them as a mail cluster, running OCFS2 on top of DRBD. Both nodes reboot under heavy load, every time.

I am reporting this bug against the linux-image-2.6.35.6 package, but that is not really accurate: I have the same problem with 2.6.26 (stable) and 2.6.32 (testing). I only tried the latest kernel to be sure. I tried ocfs2-tools from both stable and testing: the nodes reboot. I tried DRBD8 from backports, then the DRBD shipped natively with 2.6.32, and then DRBD 8.3.8 compiled from source against 2.6.35.6: the nodes reboot. So I think the problem is kernel related, but I could be completely wrong; I am not sure what causes these reboots.

What I do:

1) Create the DRBD metadata on both nodes:
   drbdadm create-md drbd0
2) Sync it:
   drbdadm -- --overwrite-data-of-peer primary drbd0
   drbdsetup /dev/drbd0 syncer -r 110M
3) Make both nodes primary:
   drbdadm primary drbd0
4) Create the filesystem:
   mkfs.ocfs2 -L ocfs2_drbd -N 2 -T mail --fs-feature-level=max-features /dev/drbd0
5) Mount it on both nodes (fstab options: nodev,noauto,noatime,data=writeback):
   mount /var/spool/dovecot
6) Create folders for the test:
   mkdir /var/spool/dovecot/iozone1
   mkdir /var/spool/dovecot/iozone2
7) Start an IO test on both nodes, each in a different folder:
   iozone -RK -t 4 -s 10g -i 0 -i 1 -i 2 -b /tmp/`hostname`.xls
8) I always get a reboot after 30-180 minutes. Sometimes there is a stack trace and the node halts, but not every time.

Under normal use the OCFS2 partition seems to work fine.

P.S. If it was wrong to report this from a sid-like system, just tell me. The bug is easily reproducible on stable or testing.
-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.35.6 (SMP w/4 CPU cores)
Locale: LANG=ru_RU.UTF-8, LC_CTYPE=ru_RU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-2.6.35.6 depends on:
ii  coreutils                  8.5-1     GNU core utilities
ii  debconf [debconf-2.0]      1.5.35    Debian configuration management sy

linux-image-2.6.35.6 recommends no packages.

Versions of packages linux-image-2.6.35.6 suggests:
pn  fdutils                    <none>    (no description available)
pn  ksymoops                   <none>    (no description available)
pn  linux-doc-2.6.35.6 | linux-so <none> (no description available)
pn  linux-image-2.6.35.6-dbg   <none>    (no description available)

-- debconf information:
  linux-image-2.6.35.6/postinst/old-dir-initrd-link-2.6.35.6: true
  linux-image-2.6.35.6/prerm/removing-running-kernel-2.6.35.6: true
  linux-image-2.6.35.6/preinst/abort-overwrite-2.6.35.6:
  linux-image-2.6.35.6/postinst/old-system-map-link-2.6.35.6: true
  linux-image-2.6.35.6/preinst/already-running-this-2.6.35.6:
  linux-image-2.6.35.6/preinst/overwriting-modules-2.6.35.6: true
  linux-image-2.6.35.6/postinst/depmod-error-initrd-2.6.35.6: false
  linux-image-2.6.35.6/postinst/kimage-is-a-directory:
  linux-image-2.6.35.6/preinst/failed-to-move-modules-2.6.35.6:
  linux-image-2.6.35.6/postinst/depmod-error-2.6.35.6: false
O2CB cluster configuration (cluster.conf):

node:
	ip_port = 7777
	ip_address = 192.168.1.1
	number = 0
	name = mail01.fxclub.org
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.2
	number = 1
	name = mail02.fxclub.org
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2
DRBD resource definition:

resource drbd0 {
	on mail01.fxclub.org {
		device /dev/drbd0;
		disk /dev/sda9;
		address 192.168.1.1:7789;
		meta-disk internal;
	}
	on mail02.fxclub.org {
		device /dev/drbd0;
		disk /dev/sda9;
		address 192.168.1.2:7789;
		meta-disk internal;
	}
}
DRBD global and common configuration:

global {
	usage-count yes;
	# minor-count dialog-refresh disable-ip-verification
}

common {
	protocol C;

	handlers {
		# What should be done in case the node is primary, degraded (=no connection) and has inconsistent data.
		#pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		#pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /sbin/ifconfig eth1 down";
		# The node is currently primary, but lost the after split brain auto recovery procedure. As a consequence it should go away.
		#pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		#pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /sbin/ifconfig eth1 down";
		#local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
		#outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
		# fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
		#split-brain "/usr/lib/drbd/notify-split-brain.sh root";
		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
	}

	startup {
		wfc-timeout 60;
		degr-wfc-timeout 30;
		outdated-wfc-timeout 15;
		become-primary-on both;
		# wait-after-sb;
	}

	disk {
		fencing resource-and-stonith;
		# RAID WITH BBU ONLY!!!
		no-disk-flushes;
		no-md-flushes;
		no-disk-barrier;
		# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
		# no-disk-drain no-md-flushes max-bio-bvecs
	}

	net {
		cram-hmac-alg sha1;
		shared-secret "password";
		allow-two-primaries;
		ping-timeout 20;
		#after-sb-0pri discard-zero-changes;
		#after-sb-1pri discard-secondary;
		#after-sb-2pri disconnect;
		data-integrity-alg sha1;
		# Tuning
		max-buffers 8000;
		max-epoch-size 8000;
		sndbuf-size 0;
		# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
		# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
		# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
	}

	syncer {
		# MegaBYTE! Not bit.
		rate 40M;
		al-extents 3389;
		# rate after al-extents use-rle cpu-mask verify-alg csums-alg
	}
}
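To make the DRBD settings above easier to compare with the O2CB timeouts, the small Python sketch below converts two of them to everyday units. It assumes the unit conventions documented in drbd.conf(5) for DRBD 8.3: `ping-timeout` is given in tenths of a second, and each activity-log extent covers 4 MiB of the backing device. The helper names are hypothetical.

```python
# Hypothetical helper: express two DRBD 8.3 settings from the config above
# in everyday units. Assumed unit conventions (drbd.conf(5), DRBD 8.3):
#   - ping-timeout is given in tenths of a second;
#   - each activity-log extent (al-extents) covers 4 MiB of backing storage.

AL_EXTENT_MIB = 4  # backing-device size tracked by one activity-log extent


def ping_timeout_seconds(value: int) -> float:
    """Convert a ping-timeout value (units of 100 ms) to seconds."""
    return value / 10


def al_hot_area_mib(extents: int) -> int:
    """Size of the activity-log hot area in MiB for a given al-extents."""
    return extents * AL_EXTENT_MIB


if __name__ == "__main__":
    # Values taken from the net and syncer sections above.
    print(f"ping-timeout 20 -> {ping_timeout_seconds(20):.1f} s")
    print(f"al-extents 3389 -> {al_hot_area_mib(3389)} MiB "
          f"(~{al_hot_area_mib(3389) / 1024:.1f} GiB)")
```

Run as is, it prints 2.0 s for `ping-timeout 20` and 13556 MiB (about 13.2 GiB) for `al-extents 3389`.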
O2CB status:

Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
  Heartbeat dead threshold = 31
  Network idle timeout: 15000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active
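The O2CB values above are easier to read in seconds. The sketch below assumes the convention from the OCFS2 documentation that the disk heartbeat timeout is (dead threshold - 1) * 2 seconds, because the heartbeat is written every 2 seconds, and that the three network values are in milliseconds. The helper names are hypothetical.

```python
# Hypothetical helper: convert the O2CB status values above to seconds.
# Assumptions: the o2hb disk heartbeat fires every 2 seconds, so the
# heartbeat timeout is (dead_threshold - 1) * 2 s; the network timeouts
# are reported in milliseconds.

HEARTBEAT_INTERVAL_S = 2


def heartbeat_timeout_s(dead_threshold: int) -> int:
    """Disk heartbeat timeout in seconds for a given dead threshold."""
    return (dead_threshold - 1) * HEARTBEAT_INTERVAL_S


def ms_to_s(ms: int) -> float:
    """Convert milliseconds to seconds."""
    return ms / 1000


# Values copied from the status output above.
NETWORK = {
    "Network idle timeout": 15000,
    "Network keepalive delay": 2000,
    "Network reconnect delay": 2000,
}

if __name__ == "__main__":
    print(f"Heartbeat dead threshold 31 -> {heartbeat_timeout_s(31)} s")
    for name, ms in NETWORK.items():
        print(f"{name}: {ms} ms -> {ms_to_s(ms):g} s")
```

With the values shown it gives a 60 s disk heartbeat timeout, a 15 s network idle timeout, and 2 s for both the keepalive and reconnect delays.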
Stable:

Message from sysl...@mail02 at Sep 16 09:03:19 ...
 kernel:[92182.173794] ------------[ cut here ]------------

Message from sysl...@mail02 at Sep 16 09:03:19 ...
 kernel:[92182.173872] invalid opcode: 0000 [#1] SMP

Message from sysl...@mail02 at Sep 16 09:03:19 ...
 kernel:[92182.173899] last sysfs file: /sys/module/ocfs2/refcnt

Testing:

Message from sysl...@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.310479] ------------[ cut here ]------------

Message from sysl...@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.310648] invalid opcode: 0000 [#1] SMP

Message from sysl...@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.310801] last sysfs file: /sys/fs/o2cb/interface_revision

Message from sysl...@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.312251] Stack:

Message from sysl...@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.312251] Call Trace:

Message from sysl...@mail01 at Sep 16 15:18:37 ...
 kernel:[ 1432.312251] Code: 83 c3 08 48 83 3b 00 eb ec 48 83 fd 10 0f 86 89 00 00 00 48 89 ef e8 b9 e8 ff ff 48 89 c7 48 8b 00 84 c0 78 13 66 a9 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 94 58 fd ff 48 8b 4c 24 18 4c 8b 4f

Testing: 2.6.35 + DRBD 8.3.8

mail01:/usr/local/sbin# mount /var/spool/dovecot

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.451479] ------------[ cut here ]------------

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.451530] invalid opcode: 0000 [#1] SMP

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.451557] last sysfs file: /sys/module/drbd/parameters/cn_idx

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.452451] Stack:

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.452623] Call Trace:

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.452841] Code: c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 0f 86 80 00 00 00 48 89 df e8 a9 f0 ff ff 48 89 c6 48 8b 00 84 c0 78 16 66 a9 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c 48 89 f7 e9 7d 75 fd ff 48 8b 4c 24 18

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.461099] general protection fault: 0000 [#2] SMP

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.461269] last sysfs file: /sys/module/drbd/parameters/cn_idx
mail01:/usr/local/sbin#

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.465065] Stack:

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.465065] Call Trace:

Message from sysl...@mail01 at Sep 28 07:00:25 ...
 kernel:[55921.465065] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1

[55921.451479] ------------[ cut here ]------------
[55921.451506] kernel BUG at mm/slub.c:2834!
[55921.451530] invalid opcode: 0000 [#1] SMP
[55921.451557] last sysfs file: /sys/module/drbd/parameters/cn_idx
[55921.451584] CPU 1
[55921.451589] Modules linked in: ocfs2 jbd2 quota_tree drbd xt_multiport sha1_generic hmac lru_cache cn xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ext2 loop snd_pcm i5000_edac edac_core i5k_amb snd_timer processor snd evdev button rng_core shpchp soundcore snd_page_alloc tpm_tis pci_hotplug psmouse dcdbas tpm pcspkr tpm_bios serio_raw ext3 jbd mbcache ide_cd_mod uhci_hcd cdrom ata_generic ata_piix libata ses sd_mod enclosure crc_t10dif ehci_hcd megaraid_sas piix ide_core usbcore scsi_mod nls_base bnx2 thermal thermal_sys [last unloaded: drbd]
[55921.451964]
[55921.451984] Pid: 2995, comm: udevd Not tainted 2.6.35.6 #1 0NH278/PowerEdge 2950
[55921.452027] RIP: 0010:[<ffffffff810df05d>] [<ffffffff810df05d>] kfree+0x5b/0xc8
[55921.452076] RSP: 0018:ffff88012aa61d58 EFLAGS: 00010246
[55921.452102] RAX: 0200000000000400 RBX: ffff880100000001 RCX: 0000000000000002
[55921.452131] RDX: ffffea0000000000 RSI: ffffea0003800000 RDI: ffff880100000001
[55921.452160] RBP: ffff8800375d8f00 R08: 0000000000000000 R09: 0000000000000000
[55921.452189] R10: ffff88012bce1070 R11: ffff8800375d8f00 R12: ffffffff810f061e
[55921.452219] R13: 0000000018000040 R14: ffff88012c375cf0 R15: ffff88012bce1070
[55921.452248] FS: 00007f7646a967a0(0000) GS:ffff880001a40000(0000) knlGS:0000000000000000
[55921.452293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55921.452319] CR2: 00007f7646a9c000 CR3: 000000012d245000 CR4: 00000000000006e0
[55921.452349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55921.452378] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[55921.452407] Process udevd (pid: 2995, threadinfo ffff88012aa60000, task ffff880121f4d890)
[55921.452451] Stack:
[55921.452471]  0000000000000000 ffff8800375d8f00 ffff88012bce1070 ffffffff810f061e
[55921.452505] <0> ffff880108000080 000000002bce1070 ffff88012c3759d0 ffff880100000001
[55921.452556] <0> 0000029d0000029d ffff8800375d8fa0 ffff88012f8a4900 ffff8800375d8f00
[55921.452623] Call Trace:
[55921.452647]  [<ffffffff810f061e>] ? vfs_rename+0x3d3/0x3e4
[55921.452674]  [<ffffffff810f1c78>] ? sys_renameat+0x1aa/0x22b
[55921.452702]  [<ffffffff810d13ab>] ? free_pages_and_swap_cache+0x53/0x6e
[55921.452732]  [<ffffffff810c83fb>] ? tlb_finish_mmu+0x2a/0x33
[55921.452759]  [<ffffffff810c8470>] ? remove_vma+0x6c/0x74
[55921.452786]  [<ffffffff810c95d8>] ? do_munmap+0x307/0x329
[55921.452814]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
[55921.452841] Code: c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 0f 86 80 00 00 00 48 89 df e8 a9 f0 ff ff 48 89 c6 48 8b 00 84 c0 78 16 66 a9 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c 48 89 f7 e9 7d 75 fd ff 48 8b 4c 24 18
[55921.453030] RIP  [<ffffffff810df05d>] kfree+0x5b/0xc8
[55921.453057]  RSP <ffff88012aa61d58>
[55921.453437] ---[ end trace 3f96fca7c9cbfb03 ]---
[55921.454368] JBD: Ignoring recovery information on journal
[55921.461099] general protection fault: 0000 [#2] SMP
[55921.461269] last sysfs file: /sys/module/drbd/parameters/cn_idx
[55921.461338] CPU 1
[55921.461385] Modules linked in: ocfs2 jbd2 quota_tree drbd xt_multiport sha1_generic hmac lru_cache cn xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ext2 loop snd_pcm i5000_edac edac_core i5k_amb snd_timer processor snd evdev button rng_core shpchp soundcore snd_page_alloc tpm_tis pci_hotplug psmouse dcdbas tpm pcspkr tpm_bios serio_raw ext3 jbd mbcache ide_cd_mod uhci_hcd cdrom ata_generic ata_piix libata ses sd_mod enclosure crc_t10dif ehci_hcd megaraid_sas piix ide_core usbcore scsi_mod nls_base bnx2 thermal thermal_sys [last unloaded: drbd]
[55921.464840]
[55921.464902] Pid: 9281, comm: mount.ocfs2 Tainted: G D 2.6.35.6 #1 0NH278/PowerEdge 2950
[55921.464990] RIP: 0010:[<ffffffff810dffaa>] [<ffffffff810dffaa>] __kmalloc+0xd3/0x136
[55921.465065] RSP: 0018:ffff880103e21ba8 EFLAGS: 00010006
[55921.465065] RAX: 0000000000000000 RBX: 0800000000000000 RCX: ffffffffa0449421
[55921.465065] RDX: 0000000000000000 RSI: ffff88012cfaf000 RDI: 0000000000000004
[55921.465065] RBP: ffffffff81625520 R08: ffff880001a524d0 R09: 0000000000000000
[55921.465065] R10: ffff88012cfaf260 R11: ffff88012ca24420 R12: 000000000000000a
[55921.465065] R13: 00000000000080d0 R14: 00000000000080d0 R15: 0000000000000246
[55921.465065] FS: 00007fee60afe720(0000) GS:ffff880001a40000(0000) knlGS:0000000000000000
[55921.465065] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[55921.465065] CR2: 00007f764630ab8c CR3: 000000012eae3000 CR4: 00000000000006e0
[55921.465065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55921.465065] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[55921.465065] Process mount.ocfs2 (pid: 9281, threadinfo ffff880103e20000, task ffff88012ca24420)
[55921.465065] Stack:
[55921.465065]  0000000000000000 ffffffffa0449421 ffff88012cfaf108 ffff88012cfaf000
[55921.465065] <0> ffff88012cfaf000 ffff88012cfaf000 ffff88012aa2e000 ffff88012ca24420
[55921.465065] <0> 0000000000000200 ffffffffa0449421 0000000000000000 ffffffffa044ccec
[55921.465065] Call Trace:
[55921.465065]  [<ffffffffa0449421>] ? ocfs2_compute_replay_slots+0x31/0x10f [ocfs2]
[55921.465065]  [<ffffffffa0449421>] ? ocfs2_compute_replay_slots+0x31/0x10f [ocfs2]
[55921.465065]  [<ffffffffa044ccec>] ? ocfs2_journal_load+0x1d0/0x2b1 [ocfs2]
[55921.465065]  [<ffffffffa0473525>] ? ocfs2_fill_super+0x19a2/0x2101 [ocfs2]
[55921.465065]  [<ffffffff8118aa8f>] ? snprintf+0x36/0x3b
[55921.465065]  [<ffffffff810e9f9e>] ? get_sb_bdev+0x137/0x19a
[55921.465065]  [<ffffffffa0471b83>] ? ocfs2_fill_super+0x0/0x2101 [ocfs2]
[55921.465065]  [<ffffffff810e9675>] ? vfs_kern_mount+0xa6/0x196
[55921.465065]  [<ffffffff810e97c4>] ? do_kern_mount+0x49/0xe7
[55921.465065]  [<ffffffff810fdabb>] ? do_mount+0x75c/0x7d6
[55921.465065]  [<ffffffff810d829a>] ? alloc_pages_current+0x9f/0xc2
[55921.465065]  [<ffffffff810fdbbd>] ? sys_mount+0x88/0xc3
[55921.465065]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
[55921.465065] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1
[55921.465065] RIP  [<ffffffff810dffaa>] __kmalloc+0xd3/0x136
[55921.465065]  RSP <ffff880103e21ba8>
[55921.465065] ---[ end trace 3f96fca7c9cbfb04 ]---
[55941.839304] o2net: accepted connection from node mail02.fxclub.org (num 1) at 192.168.1.2:7777
[55946.003594] o2dlm: Node 1 joins domain E4B99C68B65449068DC403326917DC29
[55946.003673] o2dlm: Nodes in domain E4B99C68B65449068DC403326917DC29: 0 1

Message from sysl...@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.645448] general protection fault: 0000 [#3] SMP

Message from sysl...@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.645615] last sysfs file: /sys/module/drbd/parameters/cn_idx

Message from sysl...@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.649409] Stack:

Message from sysl...@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.649409] Call Trace:

Message from sysl...@mail01 at Sep 28 07:27:03 ...
 kernel:[57519.649409] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1
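The traces above contain two different kinds of fault. As a reading aid, the Python sketch below extracts the marked bytes from an oops `Code:` line and checks whether an address is canonical on x86-64. It relies on two x86-64 facts that the report itself does not state: the kernel's BUG()/BUG_ON() macros compile to the `ud2` instruction (bytes `0f 0b`), which the CPU reports as an invalid opcode; and with 48-bit virtual addresses, bits 63 to 47 of a pointer must all be equal, otherwise a memory access through it raises a general protection fault. The helper names are hypothetical.

```python
# Hypothetical reading aid for the oopses above (x86-64, 48-bit virtual addresses).

UD2 = [0x0F, 0x0B]  # BUG()/BUG_ON() on x86-64 compile to this instruction


def faulting_bytes(code_line: str, count: int = 2) -> list[int]:
    """Return `count` bytes starting at the <xx>-marked byte of an oops Code: line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            window = [tok[1:-1]] + tokens[i + 1 : i + count]
            return [int(b, 16) for b in window]
    raise ValueError("no <xx> marker found in Code: line")


def is_canonical(addr: int) -> bool:
    """True if addr is canonical, i.e. bits 63..47 are all 0 or all 1."""
    top = addr >> 47  # the 17 bits 63..47
    return top == 0 or top == 0x1FFFF


# Code: lines copied from the 2.6.35.6 traces above.
KFREE_CODE = ("Code: c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 0f 86 80 00 00 00 "
              "48 89 df e8 a9 f0 ff ff 48 89 c6 48 8b 00 84 c0 78 16 66 a9 00 c0 "
              "75 04 <0f> 0b eb fe 5b 5d 41 5c 48 89 f7 e9 7d 75 fd ff 48 8b 4c 24 18")
KMALLOC_CODE = ("Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 "
                "b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 "
                "18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1")

if __name__ == "__main__":
    # kfree oops: the marked instruction is ud2, i.e. a BUG_ON() that fired.
    print("kfree fault is ud2:", faulting_bytes(KFREE_CODE) == UD2)
    # __kmalloc oops: 48 8b 04 03 is mov (%rbx,%rax,1),%rax; RAX was 0 in the
    # register dump, so the load went through the value in RBX.
    print("__kmalloc bytes:", " ".join(f"{b:02x}" for b in faulting_bytes(KMALLOC_CODE, 4)))
    print("RBX 0800000000000000 canonical:", is_canonical(0x0800000000000000))
```

On these inputs it reports that the kfree oops is a BUG_ON() hit (hence "invalid opcode") and that the __kmalloc general protection fault loaded through the non-canonical value in RBX. This only describes what the CPU saw; on its own it does not identify which subsystem corrupted the memory.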