Hi Taylor,
Details are below.

ceph -s
    cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
     health HEALTH_OK
     monmap e2: 3 mons at {integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}
            election epoch 526, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7
     osdmap e50127: 3 osds: 3 up, 3 in
      pgmap v2923439: 190 pgs, 2 pools, 3401 GB data, 920 kobjects
            6711 GB used, 31424 GB / 40160 GB avail
                 190 active+clean
  client io 35153 kB/s rd, 1912 kB/s wr, 672 op/s

In our case the clients were unmounted automatically.

Is it possible to change pg_num on a pool in a production setup?
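If that is safe to do on a production cluster, this is roughly what we have in mind. We are using the rbd pool (currently pg_num 64) purely as an illustration, and 128 is just a guessed target, so please correct us if the steps are wrong:

    # check the current value first
    ceph osd pool get rbd pg_num

    # raise pg_num, then raise pgp_num to match so the new PGs are actually used for placement
    ceph osd pool set rbd pg_num 128
    ceph osd pool set rbd pgp_num 128

Our understanding is that pg_num can only be increased, never decreased, and that raising it triggers another round of data movement, so we would do it in small steps during a quiet period. We have also seen the rule of thumb of roughly (100 x number of OSDs) / replica count placement groups, rounded to a power of two, but please confirm whether that applies to a cluster as small as ours.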
The journals are stored on the same SATA 7.2k RPM 6 Gbps disks, and the nodes have 1 Gb network interfaces. We have not configured separate public and cluster networks, so client and replication traffic share the same LAN. Do we need that setup for better performance?
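If separating the networks is what you would recommend, is something like the following the right direction? This is only a sketch: we are assuming our existing LAN is 192.168.112.0/24, and the 10.10.10.0/24 cluster subnet is hypothetical, since we have not allocated a second network yet.

    [global]
        # existing LAN used by clients and monitors (assumed subnet)
        public network = 192.168.112.0/24
        # hypothetical dedicated subnet for replication/recovery between OSDs
        cluster network = 10.10.10.0/24

As far as we understand, each OSD node would also need a second NIC on that subnet, and the daemons would have to be restarted for the change to take effect.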
Also, what is the best setting for I/O operations as far as the CRUSH map is concerned?
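We are not sure this is really a CRUSH map question. Our reading of your earlier reply is that the client-visible I/O impact during rebalancing is mostly governed by the backfill/recovery throttles rather than by the CRUSH map itself. Would something like the sketch below be a reasonable starting point? The values are only assumptions, not settings we have tested.

    # show which CRUSH tunables profile the cluster is currently using
    ceph osd crush show-tunables

    # slow down backfill/recovery so client I/O suffers less while data is moving
    # (example values only; the same options could be persisted under [osd] in ceph.conf)
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'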
We are still getting the errors below in the ceph OSD logs. What needs to be done to fix them?

2015-11-03 13:04:18.809488 7f387019c700 0 bad crc in data 3742210963 != exp 924878202
2015-11-03 13:04:18.812911 7f387019c700 0 -- 192.168.112.231:6800/49908 >> 192.168.112.192:0/1457324982 pipe(0x170d2000 sd=44 :6800 s=0 pgs=0 cs=0 l=0 c=0x1b18bf40).accept peer addr is really 192.168.112.192:0/1457324982 (socket is 192.168.112.192:47128/0)
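From your earlier reply we understand that "bad crc" normally means the message was damaged somewhere on the wire, so before anything else we plan to re-check the MTU and the NIC error counters between this OSD node and the peer shown in the log. The interface name eth0 below is only an assumption for illustration:

    # MTU actually in effect on the NIC (should match the 1500 configured on the switch)
    ip link show eth0 | grep mtu

    # 1472-byte payload + 28 bytes of IP/ICMP headers = 1500; -M do forbids fragmentation,
    # so this fails if any hop towards the peer has a smaller MTU
    ping -M do -s 1472 -c 5 192.168.112.192

    # CRC/frame/drop counters reported by the NIC itself
    ethtool -S eth0 | grep -iE 'crc|err|drop'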
Regards,
Prabu


---- On Tue, 03 Nov 2015 12:50:40 +0530 Chris Taylor <ctay...@eyonic.com> wrote ----

On 2015-11-02 10:19 pm, gjprabu wrote:
> Hi Taylor,
>
> I have checked DNS name and all host resolve to the correct IP. MTU
> size is 1500 in switch level configuration done. There is no firewall/
> selinux is running currently.
>
> Also we would like to know below query's which already in the thread.
>
> Regards
> Prabu
>
> ---- On Tue, 03 Nov 2015 11:20:07 +0530 CHRIS TAYLOR <ctay...@eyonic.com> wrote ----
>
> I would double check the network configuration on the new node.
> Including hosts files and DNS names. Do all the host names resolve to
> the correct IP addresses from all hosts?
>
> "... 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 ..."
>
> Looks like the communication between subnets is a problem. Is
> xxx.xxx.113.xxx a typo? If that's correct, check MTU sizes. Are they
> configured correctly on the switch and all NICs?
>
> Is there any iptables/firewall rules that could be blocking traffic
> between hosts?
>
> Hope that helps,
>
> Chris
>
> On 2015-11-02 9:18 pm, gjprabu wrote:
>
> Hi,
>
> Anybody please help me on this issue.
>
> Regards
> Prabu
>
> ---- On Mon, 02 Nov 2015 17:54:27 +0530 GJPRABU <gjpr...@zohocorp.com> wrote ----
>
> Hi Team,
>
> We have ceph setup with 2 OSD and replica 2 and it is mounted with
> ocfs2 clients and its working. When we added new osd all the clients
> rbd mapped device disconnected and got hanged by running rbd ls or rbd
> map command. We waited for long hours to scale the new osd size but
> peering not completed event data sync finished, but client side issue
> was persist and thought to try old osd service stop/start, after some
> time rbd mapped automatically using existing map script.
>
> After service stop/start in old osd again 3rd OSD rebuild and back
> filling started and after some time clients rbd mapped device
> disconnected and got hanged by running rbd ls or rbd map command. We
> thought to wait till to finished data sync in 3'rd OSD and its
> completed, even though client side rbd not mapped. After we restarted
> all mon and osd service and client side issue got fixed and mounted
> rbd. We suspected some issue in our setup, also attached logs for your
> reference.

What does 'ceph -s' look like? Is the cluster HEALTH_OK?

> Something we are missing in our setup i don't know, highly appreciated
> if anybody help us to solve this issue.
>
> Before new osd.2 addition:
>
> osd.0 - size : 13T and used 2.7T
> osd.1 - size : 13T and used 2.7T
>
> After new osd addition:
>
> osd.0 - size : 13T and used 1.8T
> osd.1 - size : 13T and used 2.1T
> osd.2 - size : 15T and used 2.5T
>
> rbd ls
> repo / integrepository (pg_num: 126)
> rbd / integdownloads (pg_num: 64)
>
> Also we would like to know few clarifications.
>
> If any new osd will be added whether all client will be unmounted
> automatically.

Clients do not need to unmount images when OSDs are added.

> While add new osd can we access (read / write) from client machines?

Clients still have read/write access to RBD images in the cluster while
adding OSDs and during recovery.

> How much data will be added in new osd - without change any replica /
> pg_num?

The data will re-balance between OSDs automatically. I found having more
PGs helps distribute the load more evenly.

> How long to take finish this process?

Depends greatly on the hardware and configuration. Whether journals are
on SSD or spinning disks, network connectivity, max_backfills, etc.

> If we missed any common configuration - please share the same.

I don't see any configuration for public and cluster networks. If you
are sharing the same network for clients and object replication/recovery,
the cluster re-balancing data between OSDs could cause problems with the
client traffic. Take a look at:
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/

> ceph.conf
> [global]
> fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
> mon_initial_members = integ-hm5, integ-hm6, integ-hm7
> mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
>
> [mon]
> mon_clock_drift_allowed = .500
>
> [client]
> rbd_cache = false
>
> Current Logs from new osd, also attached old logs.
>
> 2015-11-02 12:47:48.481641 7f386f691700 0 bad crc in data 3889133030 != exp 2857248268
> 2015-11-02 12:47:48.482230 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc510580).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42530/0)
> 2015-11-02 12:47:48.483951 7f386f691700 0 bad crc in data 3192803598 != exp 1083014631
> 2015-11-02 12:47:48.484512 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc516f60).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42531/0)
> 2015-11-02 12:47:48.486284 7f386f691700 0 bad crc in data 133120597 != exp 393328400
> 2015-11-02 12:47:48.486777 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc514620).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42532/0)
> 2015-11-02 12:47:48.488624 7f386f691700 0 bad crc in data 3299720069 != exp 211350069
> 2015-11-02 12:47:48.489100 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc513860).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42533/0)
> 2015-11-02 12:47:48.490911 7f386f691700 0 bad crc in data 2381447347 != exp 1177846878
> 2015-11-02 12:47:48.491390 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc513700).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42534/0)
> 2015-11-02 12:47:48.493167 7f386f691700 0 bad crc in data 2093712440 != exp 2175112954
> 2015-11-02 12:47:48.493682 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc514200).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42535/0)
> 2015-11-02 12:47:48.495150 7f386f691700 0 bad crc in data 3047197039 != exp 38098198
> 2015-11-02 12:47:48.495679 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0xc510b00).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42536/0)
> 2015-11-02 12:47:48.497259 7f386f691700 0 bad crc in data 1400444622 != exp 2648291990
> 2015-11-02 12:47:48.497756 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170ea000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7b700).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:42537/0)
> 2015-11-02 13:02:00.439025 7f386f691700 0 bad crc in data 4159064831 != exp 903679865
> 2015-11-02 13:02:00.441337 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x16a18000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7e5c0).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:43128/0)
> 2015-11-02 13:02:00.442756 7f386f691700 0 bad crc in data 1134831440 != exp 892008036
> 2015-11-02 13:02:00.443369 7f386f691700 0 -- 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 pipe(0x170d2000 sd=28 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7ee00).accept peer addr is really 192.168.113.42:0/599324131 (socket is 192.168.113.42:43129/0)
> 2015-11-02 13:08:43.272527 7f387049f700 0 -- 192.168.112.231:6800/49908 >> 192.168.112.115:0/4256128918 pipe(0x170ea000 sd=33 :6800 s=0 pgs=0 cs=0 l=0 c=0x17f7e1a0).accept peer addr is really 192.168.112.115:0/4256128918 (socket is 192.168.112.115:51660/0)
>
> Regards
> Prabu
>
> Regards
> G.J

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com