Hi Bernd,

The error messages show that the connection between node 1 and node 2 could not be established. Make sure the network is up before the mount is attempted and that the nodes can reach each other on port 7777.
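One way to check reachability before mounting is a small wait loop. The following is only a sketch: `wait_for_port` is a hypothetical helper, not part of the ocfs2 tools, and the timeout values are arbitrary. It tests the outbound direction only; whether the local firewall already accepts incoming connections on 7777 has to be verified separately.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it is not part of ocfs2-tools.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed,
            # filtered, or the host is unreachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

On host 10 this could be called as `wait_for_port("192.168.100.20", 7777)` before the mount is started, using the peer address and port from the cluster.conf quoted below.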
Thanks,
Joseph

On 2016/10/25 3:36, Lentes, Bernd wrote:
> Hi,
>
> I have two nodes (SLES 11 SP4, 64-bit) which are connected via FC to a SAN.
> On the SAN I created an OCFS2 volume.
> One host (let's call it 20) mounts the OCFS2 volume automatically while
> booting. The other (let's call it 10) doesn't.
> Here is my /etc/ocfs2/cluster.conf:
>
> cluster:
>     node_count = 2
>     name = idg
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.100.10
>     number = 1
>     name = sunhb65277
>     cluster = idg
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.100.20
>     number = 2
>     name = sunhb58820
>     cluster = idg
>
>
> 192.168.100.10 is host 10, 192.168.100.20 is host 20. The file is identical
> on both nodes.
>
> /etc/fstab:
> /dev/disk/by-id/dm-uuid-mpath-3600c0ff00012824b04af7a5201000000 /images ocfs2 _netdev,defaults 0 0
>
>
> This is the error message on 10:
>
> Oct 24 19:16:27 sunhb65277 kernel: [ 46.302046] OCFS2 1.5.0
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296137] (kworker/u:0,5,3):o2net_connect_expired:1724 ERROR: no connection established with node 2 after 60.0 seconds, giving up and returning errors.
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296182] (mount.ocfs2,6555,0):dlm_request_join:1472 ERROR: Error -107 when sending message 510 (key 0x666c6172) to node 2
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296188] (mount.ocfs2,6555,0):dlm_try_to_join_domain:1648 ERROR: status = -107
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296193] (mount.ocfs2,6555,0):dlm_join_domain:1948 ERROR: status = -107
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296311] (mount.ocfs2,6555,0):dlm_register_domain:2214 ERROR: status = -107
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296330] (mount.ocfs2,6555,0):o2cb_cluster_connect:313 ERROR: status = -107
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296334] (mount.ocfs2,6555,0):ocfs2_dlm_init:2995 ERROR: status = -107
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296350] (mount.ocfs2,6555,0):ocfs2_mount_volume:1881 ERROR: status = -107
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296387] ocfs2: Unmounting device (252,5) on (node 0)
> Oct 24 19:17:23 sunhb65277 kernel: [ 102.296395] (mount.ocfs2,6555,0):ocfs2_fill_super:1236 ERROR: status = -107
>
> The error is logical. In SLES, the firewall init script is the last one
> executed. I don't know why, but it seems to be normal for SuSE.
> So port 7777 is not yet open when host 20 tries to connect to host 10, and by
> the time the port is opened, host 20 has already stopped trying.
> The host is stuck in the ocfs2 init script until "Network idle timeout:
> 60000" has run out.
> The other way round (host 20 booting, regardless of whether host 10 is
> online), the ocfs2 init script starts the mount, waits some seconds,
> and the host continues to boot (and the ocfs2 volume is mounted).
>
> What I have found out already is that the node with the higher number
> (number 2, host 20) tries to connect to the node with the lower number
> (number 1, host 10)
> (https://oss.oracle.com/pipermail/ocfs2-users/2009-June/003626.html).
> Although I would expect that the booting host is always the one that tries
> to connect to the other one(s).
> Host 20 also mounts automatically when host 10 is offline.
>
> Questions:
>
> Why does host 20 mount automatically while host 10 does not?
> How does host 20 know that host 10 is trying to mount the ocfs2 volume,
> given that at exactly that moment host 20 tries to connect to host 10 on
> port 7777? No packet from host 10 is visible before that!?!
>
> Of course I could fiddle with the init scripts or change their order, but
> I would prefer a solution that works out of the box.
> And I would like to understand it.
>
>
> Bernd
>
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
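The connection rule Bernd cites from the 2009 list post (the node with the higher number connects to the node with the lower number) can be applied to the quoted cluster.conf as a rough sketch. `parse_cluster_conf` and `connection_direction` are hypothetical helpers written for illustration; the parser is deliberately simplified and is not the one the o2cb tools use. If the rule holds, node 1 (host 10) is always on the accepting side, so its port 7777 has to be open by the time the mount starts.

```python
def parse_cluster_conf(text):
    """Parse cluster.conf-style text into a list of (section, attrs) pairs.

    Simplified for illustration: a line ending in ':' opens a stanza,
    'key = value' lines are attached to the most recent stanza.
    """
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, value = line.split("=", 1)
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def connection_direction(node_a, node_b):
    """Return (initiator, acceptor), assuming the higher node number connects."""
    low, high = sorted((node_a, node_b), key=lambda n: int(n["number"]))
    return high, low


# Configuration as quoted in the original message.
CLUSTER_CONF = """\
cluster:
    node_count = 2
    name = idg

node:
    ip_port = 7777
    ip_address = 192.168.100.10
    number = 1
    name = sunhb65277
    cluster = idg

node:
    ip_port = 7777
    ip_address = 192.168.100.20
    number = 2
    name = sunhb58820
    cluster = idg
"""

nodes = [attrs for kind, attrs in parse_cluster_conf(CLUSTER_CONF) if kind == "node"]
src, dst = connection_direction(nodes[0], nodes[1])
print(f"{src['name']} ({src['ip_address']}) connects to "
      f"{dst['name']} ({dst['ip_address']}:{dst['ip_port']})")
# prints: sunhb58820 (192.168.100.20) connects to sunhb65277 (192.168.100.10:7777)
```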