Hi Henrik,
Thanks for your reply. We are still facing the same issue. We found the dmesg logs below, but these are expected messages: we brought node 1 down and back up ourselves, and that is what the log shows. Other than that, we did not find any error messages. We also have a problem while unmounting: the umount process goes into the "D" state, and fsck fails with "fsck.ocfs2: I/O error". If we need to run any other command, please let me know.

ocfs2 version: debugfs.ocfs2 1.8.0

# cat /etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# On Debian based systems the preferred method is running
# 'dpkg-reconfigure ocfs2-tools'.
#

# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000

# fsck.ocfs2 -fy /home/build/downloads/
fsck.ocfs2 1.8.0
fsck.ocfs2: I/O error on channel while opening "/zoho/build/downloads/"

dmesg logs:

[ 4229.886284] o2dlm: Joining domain A895BC216BE641A8A7E20AA89D57E051 ( 5 ) 1 nodes
[ 4251.437451] o2dlm: Node 3 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 3 5 ) 2 nodes
[ 4267.836392] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 3 5 ) 3 nodes
[ 4292.755589] o2dlm: Node 2 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 5 ) 4 nodes
[ 4306.262165] o2dlm: Node 4 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes
[316476.505401] (kworker/u192:0,95923,0):dlm_do_assert_master:1717 ERROR: Error -112 when sending message 502 (key 0xc3460ae7) to node 1
[316476.505470] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
[316480.437231] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316480.442389] o2cb: o2dlm has evicted node 1 from domain A895BC216BE641A8A7E20AA89D57E051
[316480.442412] (kworker/u192:0,95923,20):dlm_begin_reco_handler:2765 A895BC216BE641A8A7E20AA89D57E051: dead_node previously set to 1, node 3 changing it to 1
[316480.541237] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316480.541241] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316485.542733] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316485.542740] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316485.542742] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316490.544535] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316490.544538] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316490.544539] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316495.546356] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316495.546362] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316495.546364] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316500.548135] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316500.548139] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316500.548140] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316505.549947] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316505.549951] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316505.549952] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316510.551734] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316510.551739] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316510.551740] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316515.553543] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316515.553547] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316515.553548] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316520.555337] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316520.555341] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316520.555343] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316525.557131] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316525.557136] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316525.557153] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316530.558952] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316530.558955] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316530.558957] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[316535.560781] o2dlm: Begin recovery on domain A895BC216BE641A8A7E20AA89D57E051 for node 1
[316535.560789] o2dlm: Node 3 (he) is the Recovery Master for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051
[316535.560792] o2dlm: End recovery on domain A895BC216BE641A8A7E20AA89D57E051
[319419.525609] o2dlm: Node 1 joins domain A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes

ps -auxxxxx | grep umount
root     32083 21.8  0.0 125620  2828 pts/14   D+   19:37   0:18 umount /home/build/repository
root     32196  0.0  0.0 112652  2264 pts/8    S+   19:38   0:00 grep --color=auto umount

cat /proc/32083/stack
[<ffffffff8132ad7d>] o2net_send_message_vec+0x71d/0xb00
[<ffffffff81352148>] dlm_send_remote_unlock_request.isra.2+0x128/0x410
[<ffffffff813527db>] dlmunlock_common+0x3ab/0x9e0
[<ffffffff81353088>] dlmunlock+0x278/0x800
[<ffffffff8131f765>] o2cb_dlm_unlock+0x35/0x50
[<ffffffff8131ecfe>] ocfs2_dlm_unlock+0x1e/0x30
[<ffffffff812a8776>] ocfs2_drop_lock.isra.29.part.30+0x1f6/0x700
[<ffffffff812ae40d>] ocfs2_simple_drop_lockres+0x2d/0x40
[<ffffffff8129b43c>] ocfs2_dentry_lock_put+0x5c/0x80
[<ffffffff8129b4a2>] ocfs2_dentry_iput+0x42/0x1d0
[<ffffffff81204dc2>] __dentry_kill+0x102/0x1f0
[<ffffffff81205294>] shrink_dentry_list+0xe4/0x2a0
[<ffffffff81205aa8>] shrink_dcache_parent+0x38/0x90
[<ffffffff81205b16>] do_one_tree+0x16/0x50
[<ffffffff81206e9f>] shrink_dcache_for_umount+0x2f/0x90
[<ffffffff811efb15>] generic_shutdown_super+0x25/0x100
[<ffffffff811eff57>] kill_block_super+0x27/0x70
[<ffffffff811f02a9>] deactivate_locked_super+0x49/0x60
[<ffffffff811f089e>] deactivate_super+0x4e/0x70
[<ffffffff8120da83>] cleanup_mnt+0x43/0x90
[<ffffffff8120db22>] __cleanup_mnt+0x12/0x20
[<ffffffff81093ba4>] task_work_run+0xc4/0xe0
[<ffffffff81013c67>] do_notify_resume+0x97/0xb0
[<ffffffff817d2ee7>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff

Regards
G.J

---- On Fri, 23 Oct 2015 13:41:19 +0530 Henrik Korkuc <li...@kirneh.eu> wrote ----

Can you paste dmesg and system logs? I am using a 3-node OCFS2 on RBD and have had no problems.

On 15-10-23 08:40, gjprabu wrote:

Hi Frederic,

Can you give us a solution? We are spending a lot of time trying to solve this issue.

Regards
Prabu

---- On Thu, 15 Oct 2015 17:14:13 +0530 Tyler Bishop <tyler.bis...@beyondhosting.net> wrote ----

I don't know enough about OCFS to help. It sounds like you have non-concurrent writes, though.

Sent from TypeMail

On Oct 15, 2015, at 1:53 AM, gjprabu <gjpr...@zohocorp.com> wrote:

Hi Tyler,

Can you please send me the next action to be taken on this issue?

Regards
Prabu

---- On Wed, 14 Oct 2015 13:43:29 +0530 gjprabu <gjpr...@zohocorp.com> wrote ----

Hi Tyler,

Thanks for your reply. We have disabled rbd_cache but the issue still persists. Please find our configuration file below.

# cat /etc/ceph/ceph.conf
[global]
fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
mon_initial_members = integ-hm5, integ-hm6, integ-hm7
mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

[mon]
mon_clock_drift_allowed = .500

[client]
rbd_cache = false

--------------------------------------------------------------------------------------

    cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
     health HEALTH_OK
     monmap e2: 3 mons at {integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}
            election epoch 480, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7
     osdmap e49780: 2 osds: 2 up, 2 in
      pgmap v2256565: 190 pgs, 2 pools, 1364 GB data, 410 kobjects
            2559 GB used, 21106 GB / 24921 GB avail
                 190 active+clean
  client io 373 kB/s rd, 13910 B/s wr, 103 op/s

Regards
Prabu

---- On Tue, 13 Oct 2015 19:59:38 +0530 Tyler Bishop <tyler.bis...@beyondhosting.net> wrote ----

You need to disable RBD caching.

Tyler Bishop
Chief Technical Officer
513-299-7108 x10
tyler.bis...@beyondhosting.net

If you are not the intended recipient of this transmission you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.

From: "gjprabu" <gjpr...@zohocorp.com>
To: "Frédéric Nass" <frederic.n...@univ-lorraine.fr>
Cc: "<ceph-users@lists.ceph.com>" <ceph-users@lists.ceph.com>, "Siva Sokkumuthu" <sivaku...@zohocorp.com>, "Kamal Kannan Subramani(kamalakannan)" <ka...@manageengine.com>
Sent: Tuesday, October 13, 2015 9:11:30 AM
Subject: Re: [ceph-users] ceph same rbd on multiple client

Hi,

We have Ceph RBD with OCFS2 mounted on our servers. We are facing I/O errors when we move a folder on the shared disk from one node: on the other nodes the replicated data shows the error below (copying does not have this problem). As a workaround, remounting the partition resolves the issue, but after some time the problem reoccurs. Please help with this issue.
Note: We have 5 nodes in total. Two of them are working fine; the other nodes show input/output errors like the following on the moved data:

ls -althr
ls: cannot access LITE_3_0_M4_1_TEST: Input/output error
ls: cannot access LITE_3_0_M4_1_OLD: Input/output error
total 0
d????????? ? ? ? ? ?            LITE_3_0_M4_1_TEST
d????????? ? ? ? ? ?            LITE_3_0_M4_1_OLD

Regards
Prabu

---- On Fri, 22 May 2015 17:33:04 +0530 Frédéric Nass <frederic.n...@univ-lorraine.fr> wrote ----

Hi,

While waiting for CephFS, you can use a clustered filesystem such as OCFS2 or GFS2 on top of RBD mappings, so that each host can access the same device through the same clustered filesystem.

Regards,
Frédéric.

On 21/05/2015 16:10, gjprabu wrote:

--
Frédéric Nass
Sous direction des Infrastructures, Direction du Numérique, Université de Lorraine.
Tél : 03.83.68.53.83

Hi All,

We are using RBD and have mapped the same RBD image to an RBD device on two different clients, but I can't see the data written by one client on the other until I umount and mount -a the partition again. Kindly share a solution for this issue.

Example:
create an rbd image named foo
map foo to /dev/rbd0 on server A, mount /dev/rbd0 to /mnt
map foo to /dev/rbd0 on server B, mount /dev/rbd0 to /mnt

Regards
Prabu
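[Editor's note: the example above corresponds roughly to the following commands. This is only a sketch; the image size, the default pool, and the mount points are assumptions, not taken from this thread.]

# create a 10 GB image named foo in the default pool (size and pool are assumed)
rbd create foo --size 10240

# on server A
rbd map foo              # the kernel assigns a device, e.g. /dev/rbd0
mount /dev/rbd0 /mnt

# on server B
rbd map foo
mount /dev/rbd0 /mnt

As Frédéric notes above, the two mounts only stay consistent if the image carries a cluster-aware filesystem such as OCFS2 or GFS2. With an ordinary local filesystem (ext4, XFS), writes made on one host are not visible on the other until it remounts, and mounting it read-write from two hosts at once risks corrupting it.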
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com