Hi, I have an issue with ocfs2 and I am not quite sure where the problem is. I would be grateful for any feedback. The issue looks like a multipath problem, but I have redundant links, so I am not sure why ocfs2 would barf and bring the server down.
I have a set of production servers that have started showing the same error. I am not aware of any changes within the infrastructure. The setup is: 4x EqualLogic PS6100X arrays and a number of Dell R610 servers, all with multiple iSCSI interfaces. This has happened on 3 different servers in the last week, causing the servers to hang. I have checked all switches and logs and can see no flapping interfaces. I can see the iSCSI initiator make logout and login requests during this time period. In the logs I see:

Apr 22 15:53:09 servername multipathd: eql-0-8a0906-2d6a4c605-13244eee0b250b79_a: Entering recovery mode: max_retries=5
Apr 22 15:53:09 servername multipathd: 8:176: mark as failed
Apr 22 15:53:09 servername multipathd: 8:16: mark as failed
Apr 22 15:53:09 servername multipathd: 8:48: mark as failed
Apr 22 15:53:09 servername multipathd: 8:64: mark as failed
Apr 22 15:53:09 servername multipathd: 8:128: mark as failed
Apr 22 15:53:09 servername multipathd: 8:160: mark as failed
Apr 22 15:53:09 servername multipathd: eql-0-8a0906-2d6a4c605-13244eee0b250b79_a: Entering recovery mode: max_retries=5
Apr 22 15:53:09 servername multipathd: 8:176: mark as failed
Apr 22 15:53:09 servername multipathd: 8:16: mark as failed
Apr 22 15:53:09 servername multipathd: 8:48: mark as failed
Apr 22 15:53:09 servername multipathd: 8:64: mark as failed
Apr 22 15:53:09 servername multipathd: 8:128: mark as failed
Apr 22 15:53:09 servername multipathd: 8:160: mark as failed
Apr 22 15:53:11 servername kernel: (kmpathd/6,2888,6):o2hb_bio_end_io:241 ERROR: IO Error -5
Apr 22 15:53:11 servername kernel: Buffer I/O error on device dm-7, logical block 480
Apr 22 15:53:11 servername kernel: lost page write due to I/O error on dm-7
Apr 22 15:53:11 servername kernel: scsi 114:0:0:0: rejecting I/O to dead device
Apr 22 15:53:11 servername kernel: device-mapper: multipath: Failing path 8:176.
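For what it is worth, here is how I tallied the path-failure events; a minimal sketch run against an excerpt of the messages above (on the real box I would feed it /var/log/messages instead of the here-doc, the log path being an assumption about where syslog writes):

```shell
# Count "mark as failed" events per minor device in a multipathd log excerpt.
# The here-doc stands in for /var/log/messages (path is an assumption).
grep 'mark as failed' <<'EOF' | awk -F'multipathd: ' '{print $2}' | sort | uniq -c
Apr 22 15:53:09 servername multipathd: 8:176: mark as failed
Apr 22 15:53:09 servername multipathd: 8:16: mark as failed
Apr 22 15:53:09 servername multipathd: 8:176: mark as failed
EOF
```

This prints one line per device with the number of times it was failed, which makes it easy to see whether all paths dropped at once (as they did here) or only one side of the fabric.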
Apr 22 15:53:11 servername kernel: (o2hb-1B3B9BEE63,4754,7):o2hb_do_disk_heartbeat:772 ERROR: status = -5
Apr 22 15:53:11 servername multipathd: dm-4: add map (uevent)
Apr 22 15:53:11 servername kernel: scsi 115:0:0:0: rejecting I/O to dead device
Apr 22 15:53:11 servername kernel: device-mapper: multipath: Failing path 8:16.
Apr 22 15:53:11 servername multipathd: dm-4: devmap already registered
Apr 22 15:53:11 servername multipathd: dm-4: add map (uevent)
Apr 22 15:53:11 servername multipathd: dm-4: devmap already registered
Apr 22 15:53:11 servername multipathd: dm-3: add map (uevent)
Apr 22 15:53:11 servername kernel: scsi 110:0:0:0: rejecting I/O to dead device
Apr 22 15:53:11 servername kernel: device-mapper: multipath: Failing path 8:48.
Apr 22 15:53:17 servername multipathd: asvolume: load table [0 629145600 multipath 0 0 1 1 round-robin 0 6 1 8:32 10 8:80 10 8:96 10 8:112 10 8:144 10 8:16 10]
Apr 22 15:53:17 servername multipathd: dm-2: add map (uevent)
Apr 22 15:53:17 servername multipathd: dm-2: devmap already registered
Apr 22 15:53:17 servername multipathd: dm-8: add map (uevent)
Apr 22 15:53:17 servername iscsid: Connection117:0 to [target: iqn.2001-05.com.equallogic:0-8a0906-2d6a4c605-13244eee0b250b79-as14volumeocfs2, portal: 192.168.5.100,3260] through [iface: eql.eth2_2] is operational now
Apr 22 15:53:22 servername multipathd: dm-3: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-3: devmap already registered
Apr 22 15:53:22 servername multipathd: dm-4: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-4: devmap already registered
Apr 22 15:53:22 servername multipathd: dm-5: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-5: devmap already registered
Apr 22 15:53:22 servername multipathd: dm-9: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-9: devmap already registered
Apr 22 15:53:22 servername kernel: get_page_tbl ctx=0xffff810623d041c0 (253:6): bits=2, mask=0x3, num=20480, max=20480

Then ocfs2 has an issue:

Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commit_cache:191 ERROR: status = -5
Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commit_thread:1799 ERROR: status = -5
Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commit_cache:191 ERROR: status = -5

followed by the same ocfs2_commit_cache:191 ERROR: status = -5 message, increasingly interleaved and garbled on the console, repeated thousands of times and bringing the server to a halt.

cat /etc/multipath.conf

blacklist {
        devnode "^sd[a]$"
}
## Use user friendly names, instead of using WWIDs as names.
defaults {
        user_friendly_names yes
}
multipaths {
        multipath {
                wwid 36090a058604c6a2d790b250bee4exxxx
                alias asvolume
                path_grouping_policy multibus
                #path_checker readsector0
                path_selector "round-robin 0"
                failback immediate
                rr_weight priorities
                rr_min_io 10
                no_path_retry 5
        }
}
devices {
        device {
                vendor "EQLOGIC"
                product "100E-00"
                path_grouping_policy multibus
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                #features "1 queue_if_no_path"
                path_checker readsector0
                path_selector "round-robin 0"
                failback immediate
                rr_min_io 10
                rr_weight priorities
        }
}

cat /etc/ocfs2/cluster.conf

node:
        ip_port = 8888
        ip_address = x.x.x.x
        number = 9
        name = servername
        cluster = ocfs
node:
        ip_port = 8888
        ip_address = x.x.x.x
        number = 109
        name = servername1
        cluster = ocfs
more nodes in here
cluster:
        node_count = 22
        name = ocfs

Cluster consists of 14 nodes.
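One thing I have been wondering about in that config is no_path_retry 5: as I understand it, once every path is down multipathd queues I/O for only 5 checker intervals and then fails it with -EIO, which is exactly the status = -5 that the o2hb and ocfs2cmt threads report above. A variant I am considering (a sketch only, not tested in production; the value 30 is my guess, not a vendor recommendation) would queue longer so a short SAN blip never surfaces as EIO to the heartbeat:

```
multipaths {
        multipath {
                wwid 36090a058604c6a2d790b250bee4exxxx
                alias asvolume
                path_grouping_policy multibus
                path_selector "round-robin 0"
                failback immediate
                rr_weight priorities
                rr_min_io 10
                # Queue for 30 checker intervals instead of 5 before failing I/O.
                # This must stay shorter than the O2CB disk heartbeat timeout,
                # otherwise the other nodes will fence this one anyway.
                no_path_retry 30
        }
}
```

The trade-off, as I understand it, is that queueing too long just moves the failure from an I/O error to a fencing event, so the retry budget has to fit inside the heartbeat dead threshold.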
/etc/init.d/o2cb status

Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs: Online
  Heartbeat dead threshold = 61
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

Server and package information:

cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.10 (Tikanga)

rpm -qa | grep multipath
device-mapper-multipath-0.4.7-59.el5

rpm -qa | grep ocfs2
ocfs2-2.6.18-371.3.1.el5-1.4.10-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2console-1.4.4-1.el5

rpm -qa | grep kernel
kernel-2.6.18-371.3.1.el5

modinfo ocfs2
filename:       /lib/modules/2.6.18-371.3.1.el5/kernel/fs/ocfs2/ocfs2.ko
license:        GPL
author:         Oracle
version:        1.4.10
description:    OCFS2 1.4.10 Thu Dec 5 16:38:36 PST 2013 (build b703e5e0906b370c876b657dabe8d4c8)
srcversion:     41115DB9EFDAA5735C18810
depends:        ocfs2_dlm,jbd,ocfs2_nodemanager
vermagic:       2.6.18-371.3.1.el5 SMP mod_unload gcc-4.1
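If my arithmetic is right, the two timeout budgets line up badly, which may explain why ocfs2 sees the error at all (values marked as assumptions are defaults I have not verified on the affected servers):

```python
# Rough comparison of the multipath queueing budget vs the O2CB heartbeat
# budget on this setup.

polling_interval = 5   # seconds; multipathd default checker interval (assumption)
no_path_retry = 5      # from our multipath.conf
queue_budget = polling_interval * no_path_retry
print(queue_budget)    # seconds of queueing before I/O fails with -EIO

heartbeat_dead_threshold = 61  # from the o2cb status output above
# O2CB declares a node dead after (threshold - 1) two-second heartbeat iterations
o2cb_budget = (heartbeat_dead_threshold - 1) * 2
print(o2cb_budget)     # seconds before other nodes would consider this node dead
```

So multipath gives up and returns EIO after roughly 25 seconds, while the cluster would have tolerated up to 120 seconds of missed disk heartbeats. If that reading is correct, a longer no_path_retry (kept under the 120-second budget) should let the node ride out a path flap without ocfs2 ever seeing status = -5, but I would appreciate confirmation from someone who knows the heartbeat internals better.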
_______________________________________________ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-users