Hi Prabu,

[193918.928968] (kworker/u128:1,51132,30):dlm_do_assert_master:1717 ERROR: 
Error -112 when sending message 502 (key 0xc3460ae7) to node 1
[193918.929004] (kworker/u128:3,63088,31):dlm_send_remote_convert_request:392 
ERROR: Error -112 when sending message 504 (key 0xc3460ae7) to node 1

The above error messages show that the link between this node and node 1
is down (error -112 is EHOSTDOWN, "Host is down"), so this node cannot send
dlm messages to node 1.
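
A quick way to confirm this from the affected node is a check like the one
sketched below (just a sketch; the IP 10.1.1.50 and port 7777 for node 1 are
taken from the cluster.conf quoted further down, so adjust them if they do
not match your current numbering):

    # basic reachability of node 1 (integ-hm5, 10.1.1.50 in your cluster.conf)
    ping -c 3 10.1.1.50
    # is the o2net TCP port reachable? (ip_port = 7777 in your cluster.conf)
    nc -zv 10.1.1.50 7777

If either of these fails while the -112 errors are being logged, the problem
is in the network between the two nodes rather than in ocfs2 itself.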


On 2015/9/29 19:52, gjprabu wrote:
> 
> Hi Joseph,
> 
>        For our own testing we rebooted Node1 and Node7, and this is the log it 
> shows. I have cross-checked the configuration in /etc/ocfs2/cluster.conf and it 
> is fine. Can anybody help with this issue? I believe this issue is related to 
> OCFS2 rather than to RBD.
> 
>   /sys/kernel/config/cluster/ocfs2/node/
> [root@integ-cm2 node]# ls
> integ-ci-1  integ-cm1  integ-cm2  integ-hm2  integ-hm5  integ-hm8  integ-hm9
> 
> Also please find the logs that were previously missed.
> 
> [  475.407086] o2dlm: Joining domain A895BC216BE641A8A7E20AA89D57E051 ( 1 3 4 
> 7 ) 4 nodes
> [  880.734421] o2dlm: Node 2 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [  892.746728] o2dlm: Node 2 leaves domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 3 4 7 ) 4 nodes
> [  905.264066] o2dlm: Node 2 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [12313.418294] o2cb: o2dlm has evicted node 1 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [12315.042208] o2cb: o2dlm has evicted node 1 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [12315.402103] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [12315.402111] o2dlm: Node 4 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [12315.402114] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [12320.402074] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [12320.402080] o2dlm: Node 4 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [12320.402083] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [12698.830376] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [181348.383986] o2cb: o2dlm has evicted node 7 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [181349.048120] o2cb: o2dlm has evicted node 7 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [181351.972048] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 7
> [181351.972056] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [181351.972059] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [181356.972035] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 7
> [181356.972040] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [181356.972042] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [181361.972046] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 7
> [181361.972054] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [181361.972057] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [181366.972049] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 7
> [181366.972056] o2dlm: Node 1 (he) is the Recovery Master for the dead node 7 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [181366.972059] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [181599.543509] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [183251.706097] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 
> ( 1 2 3 4 ) 4 nodes
> [183462.532465] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [183506.924225] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 
> ( 1 2 3 4 ) 4 nodes
> [183709.344072] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [183905.441289] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 
> ( 1 2 3 4 ) 4 nodes
> [184103.391770] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [184175.702196] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 
> ( 1 2 3 4 ) 4 nodes
> [184363.166986] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [193918.928968] (kworker/u128:1,51132,30):dlm_do_assert_master:1717 ERROR: 
> Error -112 when sending message 502 (key 0xc3460ae7) to node 1
> [193918.929004] 
> (kworker/u128:3,63088,31):dlm_send_remote_convert_request:392 ERROR: Error 
> -112 when sending message 504 (key 0xc3460ae7) to node 1
> [193918.929035] o2dlm: Waiting on the death of node 1 in domain 
> A895BC216BE641A8A7E20AA89D57E051
> [193918.929083] o2cb: o2dlm has evicted node 1 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [193920.386365] o2cb: o2dlm has evicted node 1 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [193921.972105] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193921.972114] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193921.972116] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193926.972056] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193926.972063] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193926.972066] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193931.972054] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193931.972062] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193931.972065] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193936.972101] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193936.972108] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193936.972110] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193941.972066] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193941.972072] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193941.972075] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193946.972077] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193946.972084] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193946.972086] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193951.972107] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193951.972114] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193951.972116] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193956.972073] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193956.972081] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193956.972084] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193961.972075] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193961.972082] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193961.972084] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193966.972051] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193966.972059] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193966.972062] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193971.972115] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193971.972122] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193971.972124] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [193976.972103] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [193976.972111] o2dlm: Node 2 (he) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [193976.972114] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [194143.962241] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [199847.473092] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 
> ( 1 2 3 4 ) 4 nodes
> [208215.106305] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [258418.054204] o2cb: o2dlm has evicted node 7 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [258418.957738] o2cb: o2dlm has evicted node 7 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [264056.408719] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [264464.605542] o2dlm: Node 7 leaves domain A895BC216BE641A8A7E20AA89D57E051 
> ( 1 2 3 4 ) 4 nodes
> [275619.497198] o2dlm: Node 7 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> [426628.076148] o2cb: o2dlm has evicted node 1 from domain 
> A895BC216BE641A8A7E20AA89D57E051
> [426628.885084] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
> [426628.891170] o2dlm: Node 3 (me) is the Recovery Master for the dead node 1 
> in domain A895BC216BE641A8A7E20AA89D57E051
> [426634.182384] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
> [427001.383315] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 
> 1 2 3 4 7 ) 5 nodes
> 
> 
> 
> 
> Regards
> Prabu
> 
> 
> 
> 
> 
>     ---- On Tue, 29 Sep 2015 15:01:40 +0530 Joseph Qi 
> <joseph...@huawei.com> wrote ----
> 
> 
> 
>         On 2015/9/29 15:18, gjprabu wrote:
>         > Hi Joseph,
>         >
>         > We have 7 nodes in total, and this problem occurs on multiple nodes 
> simultaneously, not on one particular node. We checked the network and it is fine.
>         > When we remount the ocfs2 partition the problem is fixed temporarily, 
> but the same problem reoccurs after some time.
>         >
>         > We also have a problem while unmounting: the umount process goes into 
> "D" state, and then I have to restart the server itself. Is there any solution 
> for this issue?
>         >
>         > I have tried running fsck.ocfs2 on the problematic machine, but it 
> throws the following error.
>         >
>         > fsck.ocfs2 1.8.0
>         > fsck.ocfs2: I/O error on channel while opening 
> "/zoho/build/downloads"
>         >
>         IMO, this can happen if the mountpoint is offline.
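>         A quick way to check whether the underlying device still answers I/O 
> (again only a sketch; /dev/rbd0 and the mount point are taken from the 
> partition details you posted earlier):
> 
>             # does a small direct read from the RBD device still succeed?
>             dd if=/dev/rbd0 of=/dev/null bs=4096 count=1 iflag=direct
>             # is the filesystem still listed as mounted?
>             grep /zoho/build/downloads /proc/mounts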
> 
>         >
>         > Please refer the latest logs from one node.
>         >
>         > [258418.054204] o2cb: o2dlm has evicted node 7 from domain 
> A895BC216BE641A8A7E20AA89D57E051
>         > [258418.957738] o2cb: o2dlm has evicted node 7 from domain 
> A895BC216BE641A8A7E20AA89D57E051
>         > [264056.408719] o2dlm: Node 7 joins domain 
> A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
>         > [264464.605542] o2dlm: Node 7 leaves domain 
> A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 ) 4 nodes
>         > [275619.497198] o2dlm: Node 7 joins domain 
> A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
>         > [426628.076148] o2cb: o2dlm has evicted node 1 from domain 
> A895BC216BE641A8A7E20AA89D57E051
>         > [426628.885084] o2dlm: Begin recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051 for node 1
>         > [426628.891170] o2dlm: Node 3 (me) is the Recovery Master for the 
> dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
>         > [426634.182384] o2dlm: End recovery on domain 
> A895BC216BE641A8A7E20AA89D57E051
>         > [427001.383315] o2dlm: Node 1 joins domain 
> A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 7 ) 5 nodes
>         >
>         The messages above show that nodes in your cluster are frequently 
> joining and leaving the domain.
>         I suggest you check the cluster config on each node
>         (/etc/ocfs2/cluster.conf as well as 
> /sys/kernel/config/cluster/<cluster_name>/node/).
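>         A quick way to compare the two on every node is something like the 
> sketch below (only a sketch; the cluster name "ocfs2" and the configfs path 
> match the listing you posted, and the attribute names assume the stock 
> o2nm/configfs layout):
> 
>             # static config: node stanzas in the on-disk file
>             grep -A5 '^node:' /etc/ocfs2/cluster.conf
>             # runtime config: what the o2nm kernel module has actually loaded
>             for n in /sys/kernel/config/cluster/ocfs2/node/*; do
>                 echo "$n: num=$(cat $n/num) $(cat $n/ipv4_address):$(cat $n/ipv4_port)"
>             done
> 
>         If the two disagree on any node (a stale node number, a wrong IP, or a 
> missing entry), that would explain nodes repeatedly dropping in and out of 
> the domain.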
>         I haven't used ocfs2 together with Ceph RBD, so I am not sure whether 
> this has anything to do with RBD.
> 
>         >
>         >
>         >
>         > Regards
>         > G.J
>         >
>         >
>         >
>         >
>         > ---- On Fri, 25 Sep 2015 06:26:57 +0530 Joseph Qi 
> <joseph...@huawei.com> wrote ----
>         >
>         > On 2015/9/24 18:30, gjprabu wrote:
>         > > Hi All,
>         > >
>         > > Can someone tell me what kind of is this.
>         > >
>         > > Regards
>         > > Prabu GJ
>         > >
>         > >
>         > > ---- On Wed, 23 Sep 2015 18:26:13 +0530 gjprabu 
> <gjpr...@zohocorp.com> wrote ----
>         > >
>         > > Hi All,
>         > >
>         > > We have also seen this issue on a local machine, but it does not 
> happen on all of the clients; only two ocfs2 clients are affected.
>         > >
>         > > Regards
>         > > Prabu GJ
>         > >
>         > >
>         > >
>         > > ---- On Wed, 23 Sep 2015 17:49:51 +0530 gjprabu 
> <gjpr...@zohocorp.com> wrote ----
>         > >
>         > >
>         > >
>         > > Hi All,
>         > >
>         > > We are using ocfs2 on an RBD mount and everything works fine, but 
> after data is written or moved by our scripts it shows the error below. Can 
> anybody help with this issue?
>         > >
>         > >
>         > >
>         > > # ls -althr
>         > > ls: cannot access MICKEYLITE_3_0_M4_1_TEST: Input/output error
>         > > ls: cannot access MICKEYLITE_3_0_M4_1_OLD: Input/output error
>         > > total 0
>         > > d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_TEST
>         > > d????????? ? ? ? ? ? MICKEYLITE_3_0_M4_1_OLD
>         > >
>         > > Partition details:
>         > >
>         > > /dev/rbd0 ocfs2 9.6T 140G 9.5T 2% /zoho/build/downloads
>         > >
>         > > /etc/ocfs2/cluster.conf
>         > > cluster:
>         > > node_count=7
>         > > heartbeat_mode = local
>         > > name=ocfs2
>         > >
>         > > node:
>         > > ip_port = 7777
>         > > ip_address = 10.1.1.50
>         > > number = 1
>         > > name = integ-hm5
>         > > cluster = ocfs2
>         > >
>         > > node:
>         > > ip_port = 7777
>         > > ip_address = 10.1.1.51
>         > > number = 2
>         > > name = integ-hm9
>         > > cluster = ocfs2
>         > >
>         > > node:
>         > > ip_port = 7777
>         > > ip_address = 10.1.1.52
>         > > number = 3
>         > > name = integ-hm2
>         > > cluster = ocfs2
>         > >
>         > > node:
>         > > ip_port = 7777
>         > > ip_address = 10.1.1.53
>         > > number = 4
>         > > name = integ-ci-1
>         > > cluster = ocfs2
>         > > node:
>         > > ip_port = 7777
>         > > ip_address = 10.1.1.54
>         > > number = 5
>         > > name = integ-cm2
>         > > cluster = ocfs2
>         > > node:
>         > > ip_port = 7777
>         > > ip_address = 10.1.1.55
>         > > number = 6
>         > > name = integ-cm1
>         > > cluster = ocfs2
>         > > node:
>         > > ip_port = 7777
>         > > ip_address = 10.1.1.56
>         > > number = 7
>         > > name = integ-hm8
>         > > cluster = ocfs2
>         > >
>         > >
>         > > Errors in dmesg:
>         > >
>         > >
>         > > [516421.342393] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: 
> status = -112
>         > > [517119.689992] (httpd,64399,31):dlm_do_master_request:1383 
> ERROR: link to 1 went down!
>         > > [517119.690003] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 
> ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, 
> error -112 send AST to node 1
>         > > [517119.690028] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: 
> status = -112
>         > > [517119.690034] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 
> ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, 
> error -107 send AST to node 1
>         > > [517119.690036] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: 
> status = -107
>         > > [517119.700425] (httpd,64399,31):dlm_get_lock_resource:968 ERROR: 
> status = -112
>         > > [517517.894949] (dlm_thread,51005,25):dlm_send_proxy_ast_msg:482 
> ERROR: A895BC216BE641A8A7E20AA89D57E051: res S000000000000000000000200000000, 
> error -112 send AST to node 1
>         > > [517517.899640] (dlm_thread,51005,25):dlm_flush_asts:599 ERROR: 
> status = -112
>         > >
>         > These error messages mean that the connection between this node and 
> node 1 has a problem. You need to check the network.
>         >
>         > >
>         > > Regards
>         > > Prabu GJ
>         > >
>         > >
>         > >
>         > >
>         >
>         >
>         >
> 
> 
> 



_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
