Re: [ceph-users] Journal symlink broken / Ceph 0.94.5 / CentOS 6.7
Hi Loic, searched around for possible udev bugs, and then tried to run "yum update". Udev did have a fresh update with the following version diffs; udev-147-2.63.el6_7.1.x86_64 --> udev-147-2.63.el6_7.1.x86_64 from what i can see this update fixes stuff related to symbolic links / external devices. /dev/sdc sits on external eSata. So... https://rhn.redhat.com/errata/RHBA-2015-1382.html will reboot tonight and get back :-) /jesper ***' I guess that's the problem you need to solve : why /dev/sdc does not generate udev events (different driver than /dev/sda maybe ?). Once it does, Ceph should work. A workaround could be to add somethink like: ceph-disk-udev 3 sdc3 sdc ceph-disk-udev 4 sdc4 sdc in /etc/rc.local. On 17/12/2015 12:01, Jesper Thorhauge wrote: > Nope, the previous post contained all that was in the boot.log :-( > > /Jesper > > ** > > - Den 17. dec 2015, kl. 11:53, Loic Dachary skrev: > > On 17/12/2015 11:33, Jesper Thorhauge wrote: >> Hi Loic, >> >> Sounds like something does go wrong when /dev/sdc3 shows up. Is there anyway >> i can debug this further? Log-files? Modify the .rules file...? > > Do you see traces of what happens when /dev/sdc3 shows up in boot.log ? > >> >> /Jesper >> >> >> >> The non-symlink files in /dev/disk/by-partuuid come to existence because of: >> >> * system boots >> * udev rule calls ceph-disk-udev via 95-ceph-osd.rules on /dev/sda1 >> * ceph-disk-udev creates the symlink >> /dev/disk/by-partuuid/c83b5aa5-fe77-42f6-9415-25ca0266fb7f -> ../../sdb1 >> * ceph-disk activate /dev/sda1 is mounted and finds a symlink to the journal >> journal -> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 which >> does not yet exists because /dev/sdc udev rules have not been run yet >> * ceph-osd opens the journal in write mode and that creates the file >> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 as a regular file >> * the file is empty and the osd fails to activate with the error you see >> (EINVAL because the file is empty) >> >> This is ok, supported and expected since there is no way to know which disk >> will show up first. >> >> When /dev/sdc shows up, the same logic will be triggered: >> >> * udev rule calls ceph-disk-udev via 95-ceph-osd.rules on /dev/sda1 >> * ceph-disk-udev creates the symlink >> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 -> ../../sdc3 >> (overriding the file because ln -sf) >> * ceph-disk activate-journal /dev/sdc3 finds that >> c83b5aa5-fe77-42f6-9415-25ca0266fb7f is the data partition for that journal >> and mounts /dev/disk/by-partuuid/c83b5aa5-fe77-42f6-9415-25ca0266fb7f >> * ceph-osd opens the journal and all is well >> >> Except something goes wrong in your case, presumably because ceph-disk-udev >> is not called when /dev/sdc3 shows up ? >> >> On 17/12/2015 08:29, Jesper Thorhauge wrote: >>> Hi Loic, >>> >>> osd's are on /dev/sda and /dev/sdb, journal's is on /dev/sdc (sdc3 / sdc4). 
>>> >>> sgdisk for sda shows; >>> >>> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown) >>> Partition unique GUID: E85F4D92-C8F1-4591-BD2A-AA43B80F58F6 >>> First sector: 2048 (at 1024.0 KiB) >>> Last sector: 1953525134 (at 931.5 GiB) >>> Partition size: 1953523087 sectors (931.5 GiB) >>> Attribute flags: >>> Partition name: 'ceph data' >>> >>> for sdb >>> >>> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown) >>> Partition unique GUID: C83B5AA5-FE77-42F6-9415-25CA0266FB7F >>> First sector: 2048 (at 1024.0 KiB) >>> Last sector: 1953525134 (at 931.5 GiB) >>> Partition size: 1953523087 sectors (931.5 GiB) >>> Attribute flags: >>> Partition name: 'ceph data' >>> >>> for /dev/sdc3 >>> >>> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown) >>> Partition unique GUID: C34D4694-B486-450D-B57F-DA24255F0072 >>> First sector: 935813120 (at 446.2 GiB) >>> Last sector: 956293119 (at 456.0 GiB) >>> Partition size: 20480000 sectors (9.8 GiB) >>> Attribute flags: >>> Partition name: 'ceph journal' >>> >>> for /dev/sdc4 >>> >>> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown) >>> Partition unique GUID: 1E9D527F-0866-4284-B77C-C1CB04C5A168 >>> First sector: 956293120 (at 456.0 GiB) >>> Last sector: 976773119 (at 465.8 GiB) >>> Partition size: 20480000 sectors (9.8 GiB) >>> Attribute flags: >>> Partition name: 'ceph journal' >>> >>> 60-ceph-partuuid-workaround.rules is located in /lib/udev/rules.d, so it >>> seems correct to me. >>> >>> after a reboot, /dev/disk/by-partuuid is; >>> >>> -rw-r--r-- 1 root root 0 Dec 16 07:35 1e9d527f-0866-4284-b77c-c1cb04c5a168 >>> -rw-r--r-- 1 root root 0 Dec 16 07:35 c34d4694-b486-450d-b57f-da24255f0072 >>> lrwxrwxrwx 1 root root 10 Dec 16 07:35 c83b5aa5-fe77-42f6-9415-25ca0266fb7f >>> -> ../../sdb1 >>
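For anyone hitting the same symptom, a minimal sketch of the /etc/rc.local workaround suggested above. The partition numbers and names (sdc3/sdc4) are taken from this thread; the helper path is an assumption for a CentOS 6 / Hammer install, so adjust both to your layout:

  # re-run the ceph udev helper for the journal partitions that received no
  # udev event at boot: ceph-disk-udev <partnum> <partition-name> <disk-name>
  /usr/sbin/ceph-disk-udev 3 sdc3 sdc
  /usr/sbin/ceph-disk-udev 4 sdc4 sdc
  # sanity check: the journal symlinks the OSDs expect should now exist
  ls -l /dev/disk/by-partuuid/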
[ceph-users] Problem adding a new node
Hi all, I have a 3-node Ceph cluster running on Ubuntu 14.04, Dell R720xd, ceph version 0.80.10. Each node has 64 GB RAM and 2 x E5-2695 v2 @ 2.40GHz (so cat /proc/cpuinfo shows 48 processors per node); each CPU core runs at 1200 MHz with a 30720 kB cache. There are 3 mons (one on each node), 2 mds (active/backup) and 11 OSDs per node (no RAID, 3 TB 7200rpm drives) plus 2 Intel 200 GB SSDs, with the journals on the SSDs. The public/cluster network is 10Gb LACP. Here is my problem: yesterday I wanted to add a brand new node (R730xd) to my cluster: ceph-deploy install newnode ...ok, then ceph-deploy osd create --zap-disk newnode:/dev/sdb:/dev/sdn etc. with my whole set of new disks... no problem here, then ceph-deploy admin newnode. My cluster became unstable, with flapping OSDs (going up and down), high load average, many blocked requests, etc. Here is a snapshot of the ceph -s output: https://releases.cloud-omc.fr/releases/index.php/s/5sEugMTo6KJWpIX/download I managed to get the cluster back to health OK by removing the new OSDs one by one from the crush map and finally removing the newly added host. Did I miss a step when adding a new node? Why did my cluster become so unusable? I can provide any log needed. Thank you ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
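Not a root-cause answer, but when adding a whole node of OSDs it is common practice to throttle recovery and bring the new OSDs in gradually so client I/O is not starved. A rough sketch, assuming Firefly option names; the values and the OSD id are only examples:

  # before deploying: stop new OSDs from being marked "in" automatically
  ceph osd set noin
  # throttle backfill/recovery on the existing OSDs
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
  # ... run the ceph-deploy osd create steps as above ...
  # then mark the new OSDs in one at a time and let the cluster settle
  ceph osd in 33          # hypothetical OSD id on the new node
  ceph -w                 # wait for backfill to finish before the next one
  # once everything is active+clean again
  ceph osd unset noin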
[ceph-users] Kernel 4.1.x RBD very slow on writes
I hope this can help anyone who is running into the same issue as us - kernels 4.1.x appear to have terrible RBD sequential write performance. Kernels before and after are great. I tested with 4.1.6 and 4.1.15 on Ubuntu 14.04.3, ceph hammer 0.94.5 - a simple dd test yields this result: dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 46.3618 s, 22.6 MB/s On 3.19 and 4.2.8, quite another story: dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 2.18914 s, 479 MB/s -- Alex Gorbachev Storcium ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] pg states
Hi guys, In which PG states is the cluster still usable for reads/writes? Thanks Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg states
As long as all the PGs are active the cluster stays usable; clean is not strictly necessary, but every PG should be active. On 2015-12-18 18:01, Dan Nica wrote: Hi guys, In which PG states is the cluster still usable for reads/writes? Thanks Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
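A quick way to check this on a live cluster, using standard commands only:

  # summary of PG states -- I/O keeps flowing as long as every PG is active
  ceph pg stat
  # list any PGs that are not active (these block reads/writes mapped to them)
  ceph pg dump_stuck inactive
  # per-PG detail when the cluster is not HEALTH_OK
  ceph health detail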
Re: [ceph-users] Kernel 4.1.x RBD very slow on writes
On Fri, Dec 18, 2015 at 10:55 AM, Alex Gorbachev wrote: > I hope this can help anyone who is running into the same issue as us - > kernels 4.1.x appear to have terrible RBD sequential write performance. > Kernels before and after are great. > > I tested with 4.1.6 and 4.1.15 on Ubuntu 14.04.3, ceph hammer 0.94.5 - a > simple dd test yields this result: > > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000 1000+0 records in 1000+0 > records out 1048576000 bytes (1.0 GB) copied, 46.3618 s, 22.6 MB/s > > On 3.19 and 4.2.8, quite another story: > > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000 1000+0 records in 1000+0 > records out 1048576000 bytes (1.0 GB) copied, 2.18914 s, 479 MB/s This is due to an old regression in blk-mq. rbd was switched to blk-mq infrastructure in 4.0, the regression in blk-mq core was fixed in 4.2 by commit e6c4438ba7cb "blk-mq: fix plugging in blk_sq_make_request". It's outside of rbd and wasn't backported, so we are kind of stuck with it. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
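One way to see the effect Ilya describes from the client side is to check whether writes are being merged before they reach the rbd device. A sketch, assuming sysstat's iostat is installed and the image is mapped as rbd0:

  # run the dd in one terminal and watch the device in another
  iostat -x rbd0 1
  # on an affected 4.1.x kernel the merge counter (wrqm/s) stays near zero and
  # the average request size stays small for a sequential dd; on 3.19 or >= 4.2
  # the same workload shows large, merged requests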
[ceph-users] pg stuck in peering state
Hi all, I reboot all my osd node after, I got some pg stuck in peering state. root@ceph-osd-3:/var/log/ceph# ceph -s cluster 186717a6-bf80-4203-91ed-50d54fe8dec4 health HEALTH_WARN clock skew detected on mon.ceph-osd-2 33 pgs peering 33 pgs stuck inactive 33 pgs stuck unclean Monitor clock skew detected monmap e1: 3 mons at {ceph-osd-1= 10.200.1.11:6789/0,ceph-osd-2=10.200.1.12:6789/0,ceph-osd-3=10.200.1.13:6789/0 } election epoch 14, quorum 0,1,2 ceph-osd-1,ceph-osd-2,ceph-osd-3 osdmap e66: 8 osds: 8 up, 8 in pgmap v1346: 264 pgs, 3 pools, 272 MB data, 653 objects 808 MB used, 31863 MB / 32672 MB avail 231 active+clean 33 peering root@ceph-osd-3:/var/log/ceph# root@ceph-osd-3:/var/log/ceph# ceph pg dump_stuck ok pg_stat state up up_primary acting acting_primary 4.2d peering [2,0] 2 [2,0] 2 1.57 peering [3,0] 3 [3,0] 3 1.24 peering [3,0] 3 [3,0] 3 1.52 peering [0,2] 0 [0,2] 0 1.50 peering [2,0] 2 [2,0] 2 1.23 peering [3,0] 3 [3,0] 3 4.54 peering [2,0] 2 [2,0] 2 4.19 peering [3,0] 3 [3,0] 3 1.4b peering [0,3] 0 [0,3] 0 1.49 peering [0,3] 0 [0,3] 0 0.17 peering [0,3] 0 [0,3] 0 4.17 peering [0,3] 0 [0,3] 0 4.16 peering [0,3] 0 [0,3] 0 0.10 peering [0,3] 0 [0,3] 0 1.11 peering [0,2] 0 [0,2] 0 4.b peering [0,2] 0 [0,2] 0 1.3c peering [0,3] 0 [0,3] 0 0.c peering [0,3] 0 [0,3] 0 1.3a peering [3,0] 3 [3,0] 3 0.38 peering [2,0] 2 [2,0] 2 1.39 peering [0,2] 0 [0,2] 0 4.33 peering [2,0] 2 [2,0] 2 4.62 peering [2,0] 2 [2,0] 2 4.3 peering [0,2] 0 [0,2] 0 0.6 peering [0,2] 0 [0,2] 0 0.4 peering [2,0] 2 [2,0] 2 0.3 peering [2,0] 2 [2,0] 2 1.60 peering [0,3] 0 [0,3] 0 0.2 peering [3,0] 3 [3,0] 3 4.6 peering [3,0] 3 [3,0] 3 1.30 peering [0,3] 0 [0,3] 0 1.2f peering [0,2] 0 [0,2] 0 1.2a peering [3,0] 3 [3,0] 3 root@ceph-osd-3:/var/log/ceph# root@ceph-osd-3:/var/log/ceph# ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -9 4.0 root default -8 4.0 region eu-west-1 -6 2.0 datacenter eu-west-1a -2 2.0 host ceph-osd-1 0 1.0 osd.0 up 1.0 1.0 1 1.0 osd.1 up 1.0 1.0 -4 2.0 host ceph-osd-3 4 1.0 osd.4 up 1.0 1.0 5 1.0 osd.5 up 1.0 1.0 -7 2.0 datacenter eu-west-1b -3 2.0 host ceph-osd-2 2 1.0 osd.2 up 1.0 1.0 3 1.0 osd.3 up 1.0 1.0 -5 2.0 host ceph-osd-4 6 1.0 osd.6 up 1.0 1.0 7 1.0 osd.7 up 1.0 1.0 root@ceph-osd-3:/var/log/ceph# Do you have guys any idea ? Why they stay in this state ? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd du
That feature was added for the Infernalis release of Ceph -- the man pages for Hammer are located here [1]. Prior to Infernalis, this site describes a procedure to accomplish roughly the same task [2]. [1] http://docs.ceph.com/docs/hammer/man/8/rbd/ [2] http://www.sebastien-han.fr/blog/2013/12/19/real-size-of-a-ceph-rbd-image/ -- Jason Dillaman - Original Message - > From: "Allen Liao" > To: ceph-users@lists.ceph.com > Sent: Friday, August 21, 2015 3:24:54 PM > Subject: [ceph-users] rbd du > Hi all, > The online manual ( http://ceph.com/docs/master/man/8/rbd/ ) for rbd has > documentation for the 'du' command. I'm running ceph 0.94.2 and that command > isn't recognized, nor is it in the man page. > Is there another command that will "calculate the provisioned and actual disk > usage of all images and associated snapshots within the specified pool?" > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
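For reference, the pre-Infernalis procedure in [2] boils down to summing the extents reported by rbd diff. A minimal sketch; the pool and image names are placeholders:

  # approximate used size of one image, in MB
  rbd diff rbd/myimage | awk '{ used += $2 } END { print used/1024/1024 " MB" }'
  # provisioned size comes from the image metadata
  rbd info rbd/myimage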
Re: [ceph-users] pg stuck in peering state
Hi Reno, "Peering", as far as I understand it, is the osds trying to talk to each other. You have approximately 1 OSD worth of pgs stuck (i.e. 264 / 8), and osd.0 appears in each of the stuck pgs, alongside either osd.2 or osd.3. I'd start by checking the comms between osd.0 and osds 2 and 3 (including the MTU). Cheers, Chris On Fri, Dec 18, 2015 at 02:50:18PM +0100, Reno Rainz wrote: > Hi all, > > I reboot all my osd node after, I got some pg stuck in peering state. > > root@ceph-osd-3:/var/log/ceph# ceph -s > cluster 186717a6-bf80-4203-91ed-50d54fe8dec4 > health HEALTH_WARN > clock skew detected on mon.ceph-osd-2 > 33 pgs peering > 33 pgs stuck inactive > 33 pgs stuck unclean > Monitor clock skew detected > monmap e1: 3 mons at {ceph-osd-1= > 10.200.1.11:6789/0,ceph-osd-2=10.200.1.12:6789/0,ceph-osd-3=10.200.1.13:6789/0 > } > election epoch 14, quorum 0,1,2 ceph-osd-1,ceph-osd-2,ceph-osd-3 > osdmap e66: 8 osds: 8 up, 8 in > pgmap v1346: 264 pgs, 3 pools, 272 MB data, 653 objects > 808 MB used, 31863 MB / 32672 MB avail > 231 active+clean > 33 peering > root@ceph-osd-3:/var/log/ceph# > > > root@ceph-osd-3:/var/log/ceph# ceph pg dump_stuck > ok > pg_stat state up up_primary acting acting_primary > 4.2d peering [2,0] 2 [2,0] 2 > 1.57 peering [3,0] 3 [3,0] 3 > 1.24 peering [3,0] 3 [3,0] 3 > 1.52 peering [0,2] 0 [0,2] 0 > 1.50 peering [2,0] 2 [2,0] 2 > 1.23 peering [3,0] 3 [3,0] 3 > 4.54 peering [2,0] 2 [2,0] 2 > 4.19 peering [3,0] 3 [3,0] 3 > 1.4b peering [0,3] 0 [0,3] 0 > 1.49 peering [0,3] 0 [0,3] 0 > 0.17 peering [0,3] 0 [0,3] 0 > 4.17 peering [0,3] 0 [0,3] 0 > 4.16 peering [0,3] 0 [0,3] 0 > 0.10 peering [0,3] 0 [0,3] 0 > 1.11 peering [0,2] 0 [0,2] 0 > 4.b peering [0,2] 0 [0,2] 0 > 1.3c peering [0,3] 0 [0,3] 0 > 0.c peering [0,3] 0 [0,3] 0 > 1.3a peering [3,0] 3 [3,0] 3 > 0.38 peering [2,0] 2 [2,0] 2 > 1.39 peering [0,2] 0 [0,2] 0 > 4.33 peering [2,0] 2 [2,0] 2 > 4.62 peering [2,0] 2 [2,0] 2 > 4.3 peering [0,2] 0 [0,2] 0 > 0.6 peering [0,2] 0 [0,2] 0 > 0.4 peering [2,0] 2 [2,0] 2 > 0.3 peering [2,0] 2 [2,0] 2 > 1.60 peering [0,3] 0 [0,3] 0 > 0.2 peering [3,0] 3 [3,0] 3 > 4.6 peering [3,0] 3 [3,0] 3 > 1.30 peering [0,3] 0 [0,3] 0 > 1.2f peering [0,2] 0 [0,2] 0 > 1.2a peering [3,0] 3 [3,0] 3 > root@ceph-osd-3:/var/log/ceph# > > > root@ceph-osd-3:/var/log/ceph# ceph osd tree > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > -9 4.0 root default > -8 4.0 region eu-west-1 > -6 2.0 datacenter eu-west-1a > -2 2.0 host ceph-osd-1 > 0 1.0 osd.0 up 1.0 1.0 > 1 1.0 osd.1 up 1.0 1.0 > -4 2.0 host ceph-osd-3 > 4 1.0 osd.4 up 1.0 1.0 > 5 1.0 osd.5 up 1.0 1.0 > -7 2.0 datacenter eu-west-1b > -3 2.0 host ceph-osd-2 > 2 1.0 osd.2 up 1.0 1.0 > 3 1.0 osd.3 up 1.0 1.0 > -5 2.0 host ceph-osd-4 > 6 1.0 osd.6 up 1.0 1.0 > 7 1.0 osd.7 up 1.0 1.0 > root@ceph-osd-3:/var/log/ceph# > > Do you have guys any idea ? Why they stay in this state ? > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
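A sketch of the checks Chris suggests; the host addresses are placeholders, and 8972 bytes is the largest ICMP payload that fits a 9000-byte MTU (9000 minus 20 bytes of IP and 8 bytes of ICMP header):

  # from the host running osd.0, test the path to the hosts running osd.2/osd.3
  # with the don't-fragment bit set, to catch MTU mismatches
  ping -M do -s 8972 -c 3 <osd2-host-ip>
  ping -M do -s 8972 -c 3 <osd3-host-ip>
  # ask one of the stuck PGs why it is not progressing
  ceph pg 4.2d query     # look at "recovery_state" and "peering_blocked_by"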
Re: [ceph-users] Cephfs: large files hang
Gregory Farnum writes: > > What's the full output of "ceph -s"? > > The only time the MDS issues these "stat" ops on objects is during MDS > replay, but the bit where it's blocked on "reached_pg" in the OSD > makes it look like your OSD is just very slow. (Which could > potentially make the MDS back up far enough to get zapped by the > monitors, but in that case it's probably some kind of misconfiguration > issue if they're all hitting it.) > -Greg > Thanks for the suggestions. Here's the current messy output of "ceph -s": cluster ab8969a6-8b3e-497a-97da-ff06a5476e12 health HEALTH_WARN 8 pgs down 15 pgs incomplete 15 pgs stuck inactive 15 pgs stuck unclean 238 requests are blocked > 32 sec monmap e1: 3 mons at {0=192.168.1.31:6789/0,1=192.168.1.32:6789/0,2=192.168.1.33:6789/0} election epoch 42334, quorum 0,1,2 0,1,2 mdsmap e78771: 1/1/1 up {0=1=up:active}, 2 up:standby, 1 up:oneshot-replay(laggy or crashed) osdmap e194472: 58 osds: 58 up, 58 in pgmap v12811210: 1464 pgs, 3 pools, 25856 GB data, 8873 kobjects 52265 GB used, 55591 GB / 105 TB avail 1447 active+clean 8 down+incomplete 7 incomplete 2 active+clean+scrubbing The spurious "oneshot-replay" mds entry was caused by a typo in the mds name when I tried earlier to do a "ceph-mds --journal-check". I'm currently trying to copy a large file off of the ceph filesystem, and it's hung after 12582912 kB. The osd log is telling me things like: 2015-12-18 09:25:22.698124 7f5c0540a700 0 log_channel(cluster) log [WRN] : slow request 3840.705492 seconds old, received at 2015-12-18 08:21:21.992542: osd_op(mds.0.14959:1257 100010a7ba7. [create 0~0,setxattr parent (293)] 0.beb25de8 ondisk+write+known_if_redirected e194470) currently reached_pg dmesg, etc., show no errors for the osd disk or anything else, and the load on the osd server is nonexistent: 09:53:01 up 17:54, 1 user, load average: 0.05, 0.43, 0.42 When logged into the osd server, I can browse around on the osd's filesystem with no sluggishness: ls /var/lib/ceph/osd/ceph-406/current 0.10c_head 0.4d_head 1.164_head 1.a0_head 2.190_head commit_op_seq 0.10_head 0.57_head 1.18a_head 1.a3_head 2.46_head meta 0.151_head 0.9a_head 1.18c_head 1.e7_head 2.4b_head nosnap 0.165_head 0.9f_head 1.191_head 1.f_head2.55_head omap 0.18b_head 0.a1_head 1.47_head 2.10a_head 2.9d_head 0.18d_head 0.a4_head 1.4c_head 2.14f_head 2.9f_head 0.192_head 0.e8_head 1.56_head 2.163_head 2.a2_head 0.1b2_head 1.10b_head 1.99_head 2.189_head 2.e6_head 0.48_head 1.150_head 1.9e_head 2.18b_head 2.e_head ifconfig shows no errors on the osd server (public or cluster network): eth0 Link encap:Ethernet HWaddr 00:25:90:67:2A:2C inet addr:192.168.1.23 Bcast:192.168.3.255 Mask:255.255.252.0 inet6 addr: fe80::225:90ff:fe67:2a2c/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:13016012 errors:1 dropped:6 overruns:0 frame:1 TX packets:12839326 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1515148248 (1.4 GiB) TX bytes:1533480424 (1.4 GiB) Interrupt:16 Memory:fa9e-faa0 eth1 Link encap:Ethernet HWaddr 00:25:90:67:2A:2D inet addr:192.168.12.23 Bcast:192.168.15.255 Mask:255.255.252.0 inet6 addr: fe80::225:90ff:fe67:2a2d/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:59263760 errors:0 dropped:18476 overruns:0 frame:0 TX packets:129010105 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:60511361818 (56.3 GiB) TX bytes:173505625103 (161.5 GiB) Interrupt:17 Memory:faae-fab0 Snooping with wireshark, I see traffic between osds on 
the cluster network and traffic between clients, and osds on the public network. The "incomplete" pgs are associated with a dead osd that's been removed from the cluster for a long time (since before the current problem). I thought this problem might be due to something wrong in the 4.* kernel, but I've reverted the ceph cluster back to the kernel that it was using the last time I'm sure things were working (3.19.3-1.el6.elrepo.x86_64) and the behavior is the same. I'm still looking for something that might tell me what's causing the osd requests to hang. Bryan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs: large files hang
On Fri, Dec 18, 2015 at 7:03 AM, Bryan Wright wrote: > Gregory Farnum writes: >> >> What's the full output of "ceph -s"? >> >> The only time the MDS issues these "stat" ops on objects is during MDS >> replay, but the bit where it's blocked on "reached_pg" in the OSD >> makes it look like your OSD is just very slow. (Which could >> potentially make the MDS back up far enough to get zapped by the >> monitors, but in that case it's probably some kind of misconfiguration >> issue if they're all hitting it.) >> -Greg >> > > Thanks for the suggestions. Here's the current messy output of "ceph -s": > > cluster ab8969a6-8b3e-497a-97da-ff06a5476e12 > health HEALTH_WARN > 8 pgs down > 15 pgs incomplete > 15 pgs stuck inactive > 15 pgs stuck unclean > 238 requests are blocked > 32 sec > monmap e1: 3 mons at > {0=192.168.1.31:6789/0,1=192.168.1.32:6789/0,2=192.168.1.33:6789/0} > election epoch 42334, quorum 0,1,2 0,1,2 > mdsmap e78771: 1/1/1 up {0=1=up:active}, 2 up:standby, 1 > up:oneshot-replay(laggy or crashed) > osdmap e194472: 58 osds: 58 up, 58 in > pgmap v12811210: 1464 pgs, 3 pools, 25856 GB data, 8873 kobjects > 52265 GB used, 55591 GB / 105 TB avail > 1447 active+clean >8 down+incomplete >7 incomplete >2 active+clean+scrubbing > > > The spurious "oneshot-replay" mds entry was caused by a typo in the mds name > when I tried earlier to do a "ceph-mds --journal-check". > > I'm currently trying to copy a large file off of the ceph filesystem, and > it's hung after 12582912 kB. The osd log is telling me things like: > > 2015-12-18 09:25:22.698124 7f5c0540a700 0 log_channel(cluster) log [WRN] : > slow request 3840.705492 seconds old, received at 2015-12-18 > 08:21:21.992542: osd_op(mds.0.14959:1257 100010a7ba7. [create > 0~0,setxattr parent (293)] 0.beb25de8 ondisk+write+known_if_redirected > e194470) currently reached_pg > > dmesg, etc., show no errors for the osd disk or anything else, and the load > on the osd server is nonexistent: > >09:53:01 up 17:54, 1 user, load average: 0.05, 0.43, 0.42 > > When logged into the osd server, I can browse around on the osd's filesystem > with no sluggishness: > > ls /var/lib/ceph/osd/ceph-406/current > 0.10c_head 0.4d_head 1.164_head 1.a0_head 2.190_head commit_op_seq > 0.10_head 0.57_head 1.18a_head 1.a3_head 2.46_head meta > 0.151_head 0.9a_head 1.18c_head 1.e7_head 2.4b_head nosnap > 0.165_head 0.9f_head 1.191_head 1.f_head2.55_head omap > 0.18b_head 0.a1_head 1.47_head 2.10a_head 2.9d_head > 0.18d_head 0.a4_head 1.4c_head 2.14f_head 2.9f_head > 0.192_head 0.e8_head 1.56_head 2.163_head 2.a2_head > 0.1b2_head 1.10b_head 1.99_head 2.189_head 2.e6_head > 0.48_head 1.150_head 1.9e_head 2.18b_head 2.e_head > > ifconfig shows no errors on the osd server (public or cluster network): > > eth0 Link encap:Ethernet HWaddr 00:25:90:67:2A:2C > inet addr:192.168.1.23 Bcast:192.168.3.255 Mask:255.255.252.0 > inet6 addr: fe80::225:90ff:fe67:2a2c/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:13016012 errors:1 dropped:6 overruns:0 frame:1 > TX packets:12839326 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:1515148248 (1.4 GiB) TX bytes:1533480424 (1.4 GiB) > Interrupt:16 Memory:fa9e-faa0 > > eth1 Link encap:Ethernet HWaddr 00:25:90:67:2A:2D > inet addr:192.168.12.23 Bcast:192.168.15.255 Mask:255.255.252.0 > inet6 addr: fe80::225:90ff:fe67:2a2d/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:59263760 errors:0 dropped:18476 overruns:0 frame:0 > TX packets:129010105 errors:0 
dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:60511361818 (56.3 GiB) TX bytes:173505625103 (161.5 GiB) > Interrupt:17 Memory:faae-fab0 > > Snooping with wireshark, I see traffic between osds on the cluster network > and traffic between clients, and osds on the public network. > > The "incomplete" pgs are associated with a dead osd that's been removed from > the cluster for a long time (since before the current problem). Nonetheless, it's probably your down or incomplete PGs causing the issue. You can check that by seeing if seed 0.5d427a9a (out of that blocked request you mentioned) belongs to one of the dead ones. -Greg > > I thought this problem might be due to something wrong in the 4.* kernel, > but I've > reverted the ceph cluster back to the kernel that it was using the last time > I'm sure things were working (3.19.3-1.el6.elrepo.x86_64) and the behavior > is the same. > > I'm still looking for something that might tell me what's
Re: [ceph-users] Cephfs: large files hang
Gregory Farnum writes: > > Nonetheless, it's probably your down or incomplete PGs causing the > issue. You can check that by seeing if seed 0.5d427a9a (out of that > blocked request you mentioned) belongs to one of the dead ones. > -Greg Hi Greg, How would I find out which pg this seed belongs to? Also, here's part of the "ceph pg nnn query" output for one of incomplete pgs: "probing_osds": [ "107", "201", "302", "406", "504" ], "down_osds_we_would_probe": [ 102 ], "peering_blocked_by": [] osd 102 is the dead osd. Bryan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
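One way to answer that question (a sketch, not the authoritative method): map the object named in the blocked request back to its PG and compare it with the stuck PGs. The pool name and object name below are placeholders; the object in the slow-request line earlier in the thread is shown truncated, so look it up in the full OSD log first:

  # map an object to its PG and the acting OSD set
  ceph osd map <pool-name> <object-name>
  # then see whether that PG is one of the down/incomplete ones
  ceph pg dump_stuck inactive
  ceph health detail | grep -E 'down|incomplete'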
Re: [ceph-users] cephfs, low performances
Hi Christian, On 18/12/2015 04:16, Christian Balzer wrote: >> It seems to me very bad. > Indeed. > Firstly let me state that I don't use CephFS and have no clues how this > influences things and can/should be tuned. Ok, no problem. Anyway, thanks for your answer. ;) > That being said, the fio above running in VM (RBD) gives me 440 IOPS > against a single OSD storage server (replica 1) with 4 crappy HDDs and > on-disk journals on my test cluster (1Gb/s links). > So yeah, given your configuration that's bad. I have tried a quick test with a rados block device (size = 4GB with filesystem EXT4) mounted on the same client node (the client node where I'm testing cephfs) and the same "fio" command give me iops read/write equal to ~1400. So my problem could be "cephfs" specific, no? That being said, I don't know if it's can be a symptom but during the bench the iops are real-time displayed and the value seems to me no very constant. I can see sometimes peacks at 1800 iops and suddenly the value is 800 iops and re-turns up at ~1400 etc. > In comparison I get 3000 IOPS against a production cluster (so not idle) > with 4 storage nodes. Each with 4 100GB DC S3700 for journals and OS and 8 > SATA HDDs, Infiniband (IPoIB) connectivity for everything. > > All of this is with .80.x (Firefly) on Debian Jessie. Ok, interesting. My cluster is idle and but I have approximatively twice as less disks than your cluster and my SATA disk are directly connected on the motherboard. So, it seems to me logical that I have ~1400 and you ~3000, no? > You want to use atop on all your nodes and look for everything from disks > to network utilization. > There might be nothing obvious going on, but it needs to be ruled out. It's a detail but I have noticed that atop (on Ubuntu Trusty) don't display the % of bandwidth of my 10GbE interface. Anyway, I have tried to inspect the node cluster during the cephfs bench, but I have seen no bottleneck concerning CPU, network and disks. >> I use Ubuntu 14.04 on each server with the 3.13 kernel (it's the same >> for the client ceph where I run my bench) and I use Ceph 9.2.0 >> (Infernalis). > > I seem to recall that this particular kernel has issues, you might want to > scour the archives here. But, in my case, I use cephfs-fuse in the client node so the kernel version is not relevant I think. And I thought that the kernel version was not very important in the cluster nodes side. Am I wrong? >> On the client, cephfs is mounted via cephfs-fuse with this >> in /etc/fstab: >> >> id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/ >> /mnt/cephfs >> fuse.cephnoatime,defaults,_netdev0 0 >> >> I have 5 cluster node servers "Supermicro Motherboard X10SLM+-LN4 S1150" >> with one 1GbE port for the ceph public network and one 10GbE port for >> the ceph private network: >> > For the sake of latency (which becomes the biggest issues when you're not > exhausting CPU/DISK), you'd be better off with everything on 10GbE, unless > you need the 1GbE to connect to clients that have no 10Gb/s ports. Yes, exactly. My client is 1Gb/s only. >> - 1 x Intel Xeon E3-1265Lv3 >> - 1 SSD DC3710 Series 200GB (with partitions for the OS, the 3 >> OSD-journals and, just for ceph01, ceph02 and ceph03, the SSD contains >> too a partition for the workdir of a monitor > The 200GB DC S3700 would have been faster, but that's a moot point and not > your bottleneck for sure. > >> - 3 HD 4TB Western Digital (WD) SATA 7200rpm >> - RAM 32GB >> - NO RAID controlleur > > Which controller are you using? 
No controller, the 3 SATA disks of my client are directly connected on the SATA ports of the motherboard. > I recently came across an Adaptec SATA3 HBA that delivered only 176 MB/s > writes with 200GB DC S3700s as opposed to 280MB/s when used with Intel > onboard SATA-3 ports or a LSI 9211-4i HBA. Thanks for your help Christian. -- François Lafont ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
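One way to narrow down whether the slowness is specific to the CephFS layer rather than the OSDs and journals is to benchmark the data pool directly with rados bench and compare it with the fio numbers above. A sketch; the pool name, runtime, and concurrency are only examples:

  # 4 KB writes straight against the cephfs data pool
  rados bench -p cephfs_data 30 write -b 4096 -t 64 --no-cleanup
  # matching random-read pass, then remove the benchmark objects
  rados bench -p cephfs_data 30 rand -t 64
  rados -p cephfs_data cleanup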
[ceph-users] Inconsistent PG / Impossible deep-scrub
Good day everyone, I currently manage a Ceph cluster running Firefly 0.80.10. We had some maintenance which involved stopping the OSDs and starting them back up again. This caused one of the hard drives to notice it had a bad sector, and Ceph then marked the affected PG as inconsistent. After repairing the physical issue, I tried ceph pg repair: no action. Then I tried ceph pg deep-scrub: still no action. I checked the log of each OSD holding the PG and confirmed that nothing was logged, no repair, no deep-scrub. After trying to deep-scrub other PGs manually, I confirmed that my requests were being completely ignored. The only flag set is noout, since this cluster is too small, but automatic deep-scrubs are working and are logged both in ceph.log and the OSD logs. I tried restarting the lead monitor to force a new election, and restarting each OSD holding the inconsistent PG, with no success. I also tried to fix the defective object myself in case it was hanging something; the object now has the same checksum on each OSD. Is there a way to ask an OSD directly to deep-scrub without going through the monitor? Is there a known issue with commands getting ignored? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
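A few things that might be worth trying, offered as suggestions rather than a known fix (the commands below exist in Firefly; the admin-socket commands must be run on the host where that OSD lives):

  # ask the primary OSD of the inconsistent PG to deep-scrub everything it holds
  ceph osd deep-scrub <osd-id>
  # check whether scrubs are being deferred by load or scheduling limits
  ceph daemon osd.<id> config show | grep -E 'osd_scrub|osd_deep_scrub'
  # watch whether the scrub/repair request ever reaches the OSD
  ceph daemon osd.<id> dump_historic_ops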
Re: [ceph-users] pg stuck in peering state
Hi Chris, Thank for your answer. All the nodes are on AWS and I didn't change security group configuration. 2015-12-18 15:41 GMT+01:00 Chris Dunlop : > Hi Reno, > > "Peering", as far as I understand it, is the osds trying to talk to each > other. > > You have approximately 1 OSD worth of pgs stuck (i.e. 264 / 8), and osd.0 > appears in each of the stuck pgs, alongside either osd.2 or osd.3. > > I'd start by checking the comms between osd.0 and osds 2 and 3 (including > the MTU). > > Cheers, > > Chris > > > On Fri, Dec 18, 2015 at 02:50:18PM +0100, Reno Rainz wrote: > > Hi all, > > > > I reboot all my osd node after, I got some pg stuck in peering state. > > > > root@ceph-osd-3:/var/log/ceph# ceph -s > > cluster 186717a6-bf80-4203-91ed-50d54fe8dec4 > > health HEALTH_WARN > > clock skew detected on mon.ceph-osd-2 > > 33 pgs peering > > 33 pgs stuck inactive > > 33 pgs stuck unclean > > Monitor clock skew detected > > monmap e1: 3 mons at {ceph-osd-1= > > > 10.200.1.11:6789/0,ceph-osd-2=10.200.1.12:6789/0,ceph-osd-3=10.200.1.13:6789/0 > > } > > election epoch 14, quorum 0,1,2 > ceph-osd-1,ceph-osd-2,ceph-osd-3 > > osdmap e66: 8 osds: 8 up, 8 in > > pgmap v1346: 264 pgs, 3 pools, 272 MB data, 653 objects > > 808 MB used, 31863 MB / 32672 MB avail > > 231 active+clean > > 33 peering > > root@ceph-osd-3:/var/log/ceph# > > > > > > root@ceph-osd-3:/var/log/ceph# ceph pg dump_stuck > > ok > > pg_stat state up up_primary acting acting_primary > > 4.2d peering [2,0] 2 [2,0] 2 > > 1.57 peering [3,0] 3 [3,0] 3 > > 1.24 peering [3,0] 3 [3,0] 3 > > 1.52 peering [0,2] 0 [0,2] 0 > > 1.50 peering [2,0] 2 [2,0] 2 > > 1.23 peering [3,0] 3 [3,0] 3 > > 4.54 peering [2,0] 2 [2,0] 2 > > 4.19 peering [3,0] 3 [3,0] 3 > > 1.4b peering [0,3] 0 [0,3] 0 > > 1.49 peering [0,3] 0 [0,3] 0 > > 0.17 peering [0,3] 0 [0,3] 0 > > 4.17 peering [0,3] 0 [0,3] 0 > > 4.16 peering [0,3] 0 [0,3] 0 > > 0.10 peering [0,3] 0 [0,3] 0 > > 1.11 peering [0,2] 0 [0,2] 0 > > 4.b peering [0,2] 0 [0,2] 0 > > 1.3c peering [0,3] 0 [0,3] 0 > > 0.c peering [0,3] 0 [0,3] 0 > > 1.3a peering [3,0] 3 [3,0] 3 > > 0.38 peering [2,0] 2 [2,0] 2 > > 1.39 peering [0,2] 0 [0,2] 0 > > 4.33 peering [2,0] 2 [2,0] 2 > > 4.62 peering [2,0] 2 [2,0] 2 > > 4.3 peering [0,2] 0 [0,2] 0 > > 0.6 peering [0,2] 0 [0,2] 0 > > 0.4 peering [2,0] 2 [2,0] 2 > > 0.3 peering [2,0] 2 [2,0] 2 > > 1.60 peering [0,3] 0 [0,3] 0 > > 0.2 peering [3,0] 3 [3,0] 3 > > 4.6 peering [3,0] 3 [3,0] 3 > > 1.30 peering [0,3] 0 [0,3] 0 > > 1.2f peering [0,2] 0 [0,2] 0 > > 1.2a peering [3,0] 3 [3,0] 3 > > root@ceph-osd-3:/var/log/ceph# > > > > > > root@ceph-osd-3:/var/log/ceph# ceph osd tree > > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT > PRIMARY-AFFINITY > > -9 4.0 root default > > -8 4.0 region eu-west-1 > > -6 2.0 datacenter eu-west-1a > > -2 2.0 host ceph-osd-1 > > 0 1.0 osd.0 up 1.0 > 1.0 > > 1 1.0 osd.1 up 1.0 > 1.0 > > -4 2.0 host ceph-osd-3 > > 4 1.0 osd.4 up 1.0 > 1.0 > > 5 1.0 osd.5 up 1.0 > 1.0 > > -7 2.0 datacenter eu-west-1b > > -3 2.0 host ceph-osd-2 > > 2 1.0 osd.2 up 1.0 > 1.0 > > 3 1.0 osd.3 up 1.0 > 1.0 > > -5 2.0 host ceph-osd-4 > > 6 1.0 osd.6 up 1.0 > 1.0 > > 7 1.0 osd.7 up 1.0 > 1.0 > > root@ceph-osd-3:/var/log/ceph# > > > > Do you have guys any idea ? Why they stay in this state ? > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg stuck in peering state
I think I was in a hurry, everything is fine now. root@ceph-osd-1:/var/log/ceph# ceph -s cluster 186717a6-bf80-4203-91ed-50d54fe8dec4 health HEALTH_OK monmap e1: 3 mons at {ceph-osd-1= 10.200.1.11:6789/0,ceph-osd-2=10.200.1.12:6789/0,ceph-osd-3=10.200.1.13:6789/0 } election epoch 14, quorum 0,1,2 ceph-osd-1,ceph-osd-2,ceph-osd-3 osdmap e66: 8 osds: 8 up, 8 in pgmap v1439: 264 pgs, 3 pools, 272 MB data, 653 objects 809 MB used, 31862 MB / 32672 MB avail 264 active+clean root@ceph-osd-1:/var/log/ceph# How I can see what's going on in the cluster, what kind of action is running ? 2015-12-18 14:50 GMT+01:00 Reno Rainz : > Hi all, > > I reboot all my osd node after, I got some pg stuck in peering state. > > root@ceph-osd-3:/var/log/ceph# ceph -s > cluster 186717a6-bf80-4203-91ed-50d54fe8dec4 > health HEALTH_WARN > clock skew detected on mon.ceph-osd-2 > 33 pgs peering > 33 pgs stuck inactive > 33 pgs stuck unclean > Monitor clock skew detected > monmap e1: 3 mons at {ceph-osd-1= > 10.200.1.11:6789/0,ceph-osd-2=10.200.1.12:6789/0,ceph-osd-3=10.200.1.13:6789/0 > } > election epoch 14, quorum 0,1,2 > ceph-osd-1,ceph-osd-2,ceph-osd-3 > osdmap e66: 8 osds: 8 up, 8 in > pgmap v1346: 264 pgs, 3 pools, 272 MB data, 653 objects > 808 MB used, 31863 MB / 32672 MB avail > 231 active+clean > 33 peering > root@ceph-osd-3:/var/log/ceph# > > > root@ceph-osd-3:/var/log/ceph# ceph pg dump_stuck > ok > pg_stat state up up_primary acting acting_primary > 4.2d peering [2,0] 2 [2,0] 2 > 1.57 peering [3,0] 3 [3,0] 3 > 1.24 peering [3,0] 3 [3,0] 3 > 1.52 peering [0,2] 0 [0,2] 0 > 1.50 peering [2,0] 2 [2,0] 2 > 1.23 peering [3,0] 3 [3,0] 3 > 4.54 peering [2,0] 2 [2,0] 2 > 4.19 peering [3,0] 3 [3,0] 3 > 1.4b peering [0,3] 0 [0,3] 0 > 1.49 peering [0,3] 0 [0,3] 0 > 0.17 peering [0,3] 0 [0,3] 0 > 4.17 peering [0,3] 0 [0,3] 0 > 4.16 peering [0,3] 0 [0,3] 0 > 0.10 peering [0,3] 0 [0,3] 0 > 1.11 peering [0,2] 0 [0,2] 0 > 4.b peering [0,2] 0 [0,2] 0 > 1.3c peering [0,3] 0 [0,3] 0 > 0.c peering [0,3] 0 [0,3] 0 > 1.3a peering [3,0] 3 [3,0] 3 > 0.38 peering [2,0] 2 [2,0] 2 > 1.39 peering [0,2] 0 [0,2] 0 > 4.33 peering [2,0] 2 [2,0] 2 > 4.62 peering [2,0] 2 [2,0] 2 > 4.3 peering [0,2] 0 [0,2] 0 > 0.6 peering [0,2] 0 [0,2] 0 > 0.4 peering [2,0] 2 [2,0] 2 > 0.3 peering [2,0] 2 [2,0] 2 > 1.60 peering [0,3] 0 [0,3] 0 > 0.2 peering [3,0] 3 [3,0] 3 > 4.6 peering [3,0] 3 [3,0] 3 > 1.30 peering [0,3] 0 [0,3] 0 > 1.2f peering [0,2] 0 [0,2] 0 > 1.2a peering [3,0] 3 [3,0] 3 > root@ceph-osd-3:/var/log/ceph# > > > root@ceph-osd-3:/var/log/ceph# ceph osd tree > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > -9 4.0 root default > -8 4.0 region eu-west-1 > -6 2.0 datacenter eu-west-1a > -2 2.0 host ceph-osd-1 > 0 1.0 osd.0 up 1.0 1.0 > 1 1.0 osd.1 up 1.0 1.0 > -4 2.0 host ceph-osd-3 > 4 1.0 osd.4 up 1.0 1.0 > 5 1.0 osd.5 up 1.0 1.0 > -7 2.0 datacenter eu-west-1b > -3 2.0 host ceph-osd-2 > 2 1.0 osd.2 up 1.0 1.0 > 3 1.0 osd.3 up 1.0 1.0 > -5 2.0 host ceph-osd-4 > 6 1.0 osd.6 up 1.0 1.0 > 7 1.0 osd.7 up 1.0 1.0 > root@ceph-osd-3:/var/log/ceph# > > Do you have guys any idea ? Why they stay in this state ? > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
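To watch what the cluster is doing while it settles, the usual places to look are (standard commands, nothing specific to this setup; the OSD id is an example):

  # live stream of cluster events: peering, recovery, health changes
  ceph -w
  # point-in-time detail: which PGs are in which state, and why
  ceph health detail
  ceph pg stat
  # on an OSD host, the work currently in flight on a given daemon
  ceph daemon osd.0 dump_ops_in_flight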
[ceph-users] Ceph armhf package updates
Hello all, I seem to have a problem with the Ceph version available at ports.ubuntu.com in the armhf branch. The latest available version is now Infernalis 9.2; however, whenever I try to update my system, I still get the Hammer version (0.94.5). I've been checking every day, and it seems the automatic script that creates the Packages file in each directory still registers the old version. I'm not sure who should fix this; please let me know if I'm in the wrong place. Thanks ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
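For what it's worth, a quick way to see which version apt is actually selecting and from which repository (plain apt tooling, nothing Ceph-specific):

  # the candidate version apt would install, and the repository it comes from
  apt-cache policy ceph ceph-common
  # every version the configured indexes advertise for this architecture
  apt-get update
  apt-cache madison ceph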
Re: [ceph-users] Journal symlink broken / Ceph 0.94.5 / CentOS 6.7
Hi Loic, Damn, the updated udev didn't fix the problem :-( The rc.local workaround is also complaining; INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid --osd-journal /dev/sdc3 libust[2648/2648]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305) HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device DEBUG:ceph-disk:Journal /dev/sdc3 has OSD UUID ---- INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/disk/by-partuuid/---- error: /dev/disk/by-partuuid/----: No such file or directory ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/----: Command '/sbin/blkid' returned non-zero exit status 2 INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid --osd-journal /dev/sdc4 libust[2687/2687]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305) HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device DEBUG:ceph-disk:Journal /dev/sdc4 has OSD UUID ---- INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/disk/by-partuuid/---- error: /dev/disk/by-partuuid/----: No such file or directory ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/----: Command '/sbin/blkid' returned non-zero exit status 2 /dev/sdc1 and /dev/sdc2 contains the boot loader and OS, so driverwise i guess things are working :-) But "HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device" seems to be the underlying issue. Any thoughts? /Jesper * Hi Loic, searched around for possible udev bugs, and then tried to run "yum update". Udev did have a fresh update with the following version diffs; udev-147-2.63.el6_7.1.x86_64 --> udev-147-2.63.el6_7.1.x86_64 from what i can see this update fixes stuff related to symbolic links / external devices. /dev/sdc sits on external eSata. So... https://rhn.redhat.com/errata/RHBA-2015-1382.html will reboot tonight and get back :-) /jesper ***' I guess that's the problem you need to solve : why /dev/sdc does not generate udev events (different driver than /dev/sda maybe ?). Once it does, Ceph should work. A workaround could be to add somethink like: ceph-disk-udev 3 sdc3 sdc ceph-disk-udev 4 sdc4 sdc in /etc/rc.local. On 17/12/2015 12:01, Jesper Thorhauge wrote: > Nope, the previous post contained all that was in the boot.log :-( > > /Jesper > > ** > > - Den 17. dec 2015, kl. 11:53, Loic Dachary skrev: > > On 17/12/2015 11:33, Jesper Thorhauge wrote: >> Hi Loic, >> >> Sounds like something does go wrong when /dev/sdc3 shows up. Is there anyway >> i can debug this further? Log-files? Modify the .rules file...? > > Do you see traces of what happens when /dev/sdc3 shows up in boot.log ? 
> >> >> /Jesper >> >> >> >> The non-symlink files in /dev/disk/by-partuuid come to existence because of: >> >> * system boots >> * udev rule calls ceph-disk-udev via 95-ceph-osd.rules on /dev/sda1 >> * ceph-disk-udev creates the symlink >> /dev/disk/by-partuuid/c83b5aa5-fe77-42f6-9415-25ca0266fb7f -> ../../sdb1 >> * ceph-disk activate /dev/sda1 is mounted and finds a symlink to the journal >> journal -> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 which >> does not yet exists because /dev/sdc udev rules have not been run yet >> * ceph-osd opens the journal in write mode and that creates the file >> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 as a regular file >> * the file is empty and the osd fails to activate with the error you see >> (EINVAL because the file is empty) >> >> This is ok, supported and expected since there is no way to know which disk >> will show up first. >> >> When /dev/sdc shows up, the same logic will be triggered: >> >> * udev rule calls ceph-disk-udev via 95-ceph-osd.rules on /dev/sda1 >> * ceph-disk-udev creates the symlink >> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 -> ../../sdc3 >> (overriding the file because ln -sf) >> * ceph-disk activate-journal /dev/sdc3 finds that >> c83b5aa5-fe77-42f6-9415-25ca0266fb7f is the data partition for that journal >> and mounts /dev/disk/by-partuuid/c83b5aa5-fe77-42f6-9415-25ca0266fb7f >> * ceph-osd opens the journal and all is well >> >> Except something goes wrong in your case, presumably because ceph-disk-udev >> is not called when /dev/sdc3 shows up ? >> >> On 17/12/2015 08:29, Jesper Thorhauge wrote: >>> Hi Loic, >>> >>> osd's are on /dev/sda and /dev/sdb, journal's is on /dev/s
[ceph-users] 2016 Ceph Tech Talks
Hey cephers, Before we all head off to various holiday shenanigans and befuddle our senses with rest, relaxation, and glorious meals of legend, I wanted to give you something to look forward to for 2016 in the form of Ceph Tech Talks! http://ceph.com/ceph-tech-talks/ First on the docket in January is our rescheduled talk from earlier this year discussing a PostgreSQL setup on Ceph under Mesos/Aurora with Docker. That should be a great talk that hits a lot of the questions I am frequently asked about database workloads, ceph, and containers all in one. While I haven’t solidified the specific speaker/date/time, our plans for February are to dig in to the immanent release of CephFS (hooray!) in Jewel. We’ll take a look at what awesomeness is being delivered, and where CephFS is headed next. March is wide open, so if you or someone you know would like to give a Ceph Tech Talk, I’d love to find a community volunteer to talk about a technical topic that is Ceph-related for about an hour over videoconference. Please drop me a line if this is interesting to you. In April we will once again be visiting the OpenStack Developer Summit (this time in TX), as well as working to deliver a Ceph track like we did in Tokyo. My hope is to broadcast some of this content for consumption by remote participants. Keep an eye out! If you have any questions about upcoming events or community endeavors please feel free to drop me a line. Thanks! -- Best Regards, Patrick McGarry Director Ceph Community || Red Hat http://ceph.com || http://community.redhat.com @scuttlemonkey || @ceph ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Journal symlink broken / Ceph 0.94.5 / CentOS 6.7
Hi Jesper, The goal of the rc.local is twofold but mainly to ensure the /dev/disk/by-partuuid symlinks exists for the journals. Is it the case ? Cheers On 18/12/2015 19:50, Jesper Thorhauge wrote: > Hi Loic, > > Damn, the updated udev didn't fix the problem :-( > > The rc.local workaround is also complaining; > > INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid > --osd-journal /dev/sdc3 > libust[2648/2648]: Warning: HOME environment variable not set. Disabling > LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305) > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > DEBUG:ceph-disk:Journal /dev/sdc3 has OSD UUID > ---- > INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- > /dev/disk/by-partuuid/---- > error: /dev/disk/by-partuuid/----: No such > file or directory > ceph-disk: Cannot discover filesystem type: device > /dev/disk/by-partuuid/----: Command > '/sbin/blkid' returned non-zero exit status 2 > INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid > --osd-journal /dev/sdc4 > libust[2687/2687]: Warning: HOME environment variable not set. Disabling > LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305) > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > DEBUG:ceph-disk:Journal /dev/sdc4 has OSD UUID > ---- > INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- > /dev/disk/by-partuuid/---- > error: /dev/disk/by-partuuid/----: No such > file or directory > ceph-disk: Cannot discover filesystem type: device > /dev/disk/by-partuuid/----: Command > '/sbin/blkid' returned non-zero exit status 2 > > /dev/sdc1 and /dev/sdc2 contains the boot loader and OS, so driverwise i > guess things are working :-) > > But "HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device" seems > to be the underlying issue. > > Any thoughts? > > /Jesper > > * > > Hi Loic, > > searched around for possible udev bugs, and then tried to run "yum update". > Udev did have a fresh update with the following version diffs; > > udev-147-2.63.el6_7.1.x86_64 --> udev-147-2.63.el6_7.1.x86_64 > > from what i can see this update fixes stuff related to symbolic links / > external devices. /dev/sdc sits on external eSata. So... > > https://rhn.redhat.com/errata/RHBA-2015-1382.html > > will reboot tonight and get back :-) > > /jesper > > ***' > > I guess that's the problem you need to solve : why /dev/sdc does not generate > udev events (different driver than /dev/sda maybe ?). Once it does, Ceph > should work. > > A workaround could be to add somethink like: > > ceph-disk-udev 3 sdc3 sdc > ceph-disk-udev 4 sdc4 sdc > > in /etc/rc.local. > > On 17/12/2015 12:01, Jesper Thorhauge wrote: >> Nope, the previous post contained all that was in the boot.log :-( >> >> /Jesper >> >> ** >> >> - Den 17. dec 2015, kl. 11:53, Loic Dachary skrev: >> >> On 17/12/2015 11:33, Jesper Thorhauge wrote: >>> Hi Loic, >>> >>> Sounds like something does go wrong when /dev/sdc3 shows up. Is there >>> anyway i can debug this further? Log-files? Modify the .rules file...? >> >> Do you see traces of what happens when /dev/sdc3 shows up in boot.log ? 
>> >>> >>> /Jesper >>> >>> >>> >>> The non-symlink files in /dev/disk/by-partuuid come to existence because of: >>> >>> * system boots >>> * udev rule calls ceph-disk-udev via 95-ceph-osd.rules on /dev/sda1 >>> * ceph-disk-udev creates the symlink >>> /dev/disk/by-partuuid/c83b5aa5-fe77-42f6-9415-25ca0266fb7f -> ../../sdb1 >>> * ceph-disk activate /dev/sda1 is mounted and finds a symlink to the >>> journal journal -> >>> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 which does not >>> yet exists because /dev/sdc udev rules have not been run yet >>> * ceph-osd opens the journal in write mode and that creates the file >>> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 as a regular file >>> * the file is empty and the osd fails to activate with the error you see >>> (EINVAL because the file is empty) >>> >>> This is ok, supported and expected since there is no way to know which disk >>> will show up first. >>> >>> When /dev/sdc shows up, the same logic will be triggered: >>> >>> * udev rule calls ceph-disk-udev via 95-ceph-osd.rules on /dev/sda1 >>> * ceph-disk-udev creates the symlink >>> /dev/disk/by-partuuid/1e9d527f-0866-4284-b77c-c1cb04c5a168 -> ../../sdc3 >>> (overriding the file because ln -sf) >>> * ceph-disk activate-journal /dev/sdc3 finds that >>> c83b5aa5-fe77-42f6-9415-25ca0266fb7f is the data partition for that journal >>> and mounts /dev/disk/by-partuuid
[ceph-users] cephfs 'lag' / hang
I have 3 systems w/ a cephfs mounted on them. And i am seeing material 'lag'. By 'lag' i mean it hangs for little bits of time (1s, sometimes 5s). But very non repeatable. If i run time find . -type f -print0 | xargs -0 stat > /dev/null it might take ~130ms. But, it might take 10s. Once i've done it, it tends to stay @ the ~130ms, suggesting whatever data is now in cache. On the cases it hangs, if i remove the stat, its hanging on the find of one file. It might hiccup 1 or 2 times in the find across 10k files. This lag might affect e.g. 'cwd', writing a file, basically all operations. Does anyone have any suggestions? Its very irritating problem. I do no see errors in dmesg. The 3 systems w/ the filesystem mounted are running Ubuntu 15.10 w/ 4.3.0-040300-generic kernel. They are running cephfs from the kernel driver, mounted in /etc/fstab as: 10.100.10.60,10.100.10.61,10.100.10.62:/ /cephfs ceph _netdev,noauto,noatime,x-systemd.requires=network-online.target,x-systemd.automount,x-systemd.device-timeout=10,name=admin,secret=== 0 2 I have 3 mds, 1 active, 2 standby. The 3 machines are also the mons {nubo-1/-2/-3} are the ones that have the cephfs mounted. They have a 9K mtu between the systems, and i have checked with ping -s ### -M do that there are no blackholes in size... up to 8954 works, and and 8955 gives 'would fragment'. All the storage devices are 1TB Samsung SSD, and all are on sata. There is no material load on the system while this is occurring (a bit of background fs usage i guess, but its otherwise idle, just me). $ ceph status cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded health HEALTH_OK monmap e1: 3 mons at {nubo-1= 10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0} election epoch 1070, quorum 0,1,2 nubo-1,nubo-2,nubo-3 mdsmap e587: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby osdmap e2346: 6 osds: 6 up, 6 in pgmap v113350: 840 pgs, 6 pools, 143 GB data, 104 kobjects 288 GB used, 5334 GB / 5622 GB avail 840 active+clean I've checked and the network between them is perfect: no loss, ~no latency ( << 1ms, they are adjacent on an L2 segment), as are all the osd [there are 6 osd]. ceph osd tree ID WEIGHT TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY -1 5.48996 root default -2 0.8 host nubo-1 0 0.8 osd.0 up 1.0 1.0 -3 0.8 host nubo-2 1 0.8 osd.1 up 1.0 1.0 -4 0.8 host nubo-3 2 0.8 osd.2 up 1.0 1.0 -5 0.92999 host nubo-19 3 0.92999 osd.3 up 1.0 1.0 -6 0.92999 host nubo-20 4 0.92999 osd.4 up 1.0 1.0 -7 0.92999 host nubo-21 5 0.92999 osd.5 up 1.0 1.0 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
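When one of these stalls happens it can help to see whether the request is stuck on the MDS or on an OSD. A sketch of the usual places to look; the daemon names are taken from the status output above, and the admin-socket commands must be run on the host where that daemon lives:

  # on the active MDS (nubo-2 here): requests currently in flight and client sessions
  ceph daemon mds.nubo-2 dump_ops_in_flight
  ceph daemon mds.nubo-2 session ls
  # cluster-wide: anything blocked long enough to be reported?
  ceph health detail
  # on an OSD host, the same in-flight view for one OSD
  ceph daemon osd.1 dump_ops_in_flight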
Re: [ceph-users] cephfs, low performances
On 17 December 2015 at 21:36, Francois Lafont wrote: > Hi, > > I have ceph cluster currently unused and I have (to my mind) very low > performances. > I'm not an expert in benchs, here an example of quick bench: > > --- > # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 > --name=readwrite --filename=rw.data --bs=4k --iodepth=64 --size=300MB > --readwrite=randrw --rwmixread=50 > readwrite: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, > iodepth=64 > fio-2.1.3 > > ... I am seeing the same sort of issue. If i run your 'fio' command sequence on my cephfs, i see ~120 iops. If i run it on one of the underlying osd (e.g. in /var... on the mount point of the xfs), i get ~20k iops. On the single SSD mount point it completes in ~1s. On the cephfs, it takes ~17min. I'm on Ubuntu 15.10 4.3.0-040300-generic kernel. my 'ceph -w' while this fio is running shows ~550kB/s read/write. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Kernel 4.1.x RBD very slow on writes
On Fri, Dec 18, 2015 at 9:24 PM, Alex Gorbachev wrote: > Hi Ilya, > > On Fri, Dec 18, 2015 at 11:46 AM, Ilya Dryomov wrote: >> >> On Fri, Dec 18, 2015 at 5:40 PM, Alex Gorbachev >> wrote: >> > Hi Ilya >> > >> > On Fri, Dec 18, 2015 at 6:50 AM, Ilya Dryomov >> > wrote: >> >> >> >> On Fri, Dec 18, 2015 at 10:55 AM, Alex Gorbachev >> >> >> >> wrote: >> >> > I hope this can help anyone who is running into the same issue as us >> >> > - >> >> > kernels 4.1.x appear to have terrible RBD sequential write >> >> > performance. >> >> > Kernels before and after are great. >> >> > >> >> > I tested with 4.1.6 and 4.1.15 on Ubuntu 14.04.3, ceph hammer 0.94.5 >> >> > - a >> >> > simple dd test yields this result: >> >> > >> >> > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000 1000+0 records in >> >> > 1000+0 >> >> > records out 1048576000 bytes (1.0 GB) copied, 46.3618 s, 22.6 MB/s >> >> > >> >> > On 3.19 and 4.2.8, quite another story: >> >> > >> >> > dd if=/dev/zero of=/dev/rbd0 bs=1M count=1000 1000+0 records in >> >> > 1000+0 >> >> > records out 1048576000 bytes (1.0 GB) copied, 2.18914 s, 479 MB/s >> >> >> >> This is due to an old regression in blk-mq. rbd was switched to blk-mq >> >> infrastructure in 4.0, the regression in blk-mq core was fixed in 4.2 >> >> by commit e6c4438ba7cb "blk-mq: fix plugging in blk_sq_make_request". >> >> It's outside of rbd and wasn't backported, so we are kind of stuck with >> >> it. >> > >> > >> > Thank you for answering that question, this was a huge puzzle for us. >> > So >> > the fix is 4.2, is the earliest stable 3.18? >> >> The problem was in blk-mq code. rbd started interfacing with it in >> 4.0, so anything before 4.0 wouldn't have this particular issue. > > > Thanks again - one last question - this would not affect the OSD nodes at > all, correct? It affects all devices which use blk-mq infrastructure, but only have a single hardware (or virtual) queue. The bug was basically that the queue in this case wasn't plugged, leaving little chance to merge any requests. With locally attached storage that's not the end of the world, but with rbd, which has to go over the network, you see this kind of performance drop. IIRC you still have to opt-in for scsi_mq, so if you are using the usual scsi drivers on your OSD nodes you shouldn't be affected. Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
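A quick way to check whether an OSD node is even using the multiqueue SCSI path Ilya mentions; these sysfs paths are assumptions for mainline kernels of that era and may not exist on kernels built without scsi-mq support:

  # 'Y'/'1' means scsi-mq is enabled, 'N'/'0' means the legacy request path
  cat /sys/module/scsi_mod/parameters/use_blk_mq
  # a per-device "mq" directory only exists when the device is driven via blk-mq
  ls -d /sys/block/sda/mq 2>/dev/null && echo "sda uses blk-mq" || echo "sda uses the legacy path"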
Re: [ceph-users] Journal symlink broken / Ceph 0.94.5 / CentOS 6.7
Hi Loic,

Getting closer!

lrwxrwxrwx 1 root root 10 Dec 18 19:43 1e9d527f-0866-4284-b77c-c1cb04c5a168 -> ../../sdc4
lrwxrwxrwx 1 root root 10 Dec 18 19:43 c34d4694-b486-450d-b57f-da24255f0072 -> ../../sdc3
lrwxrwxrwx 1 root root 10 Dec 18 19:42 c83b5aa5-fe77-42f6-9415-25ca0266fb7f -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 18 19:42 e85f4d92-c8f1-4591-bd2a-aa43b80f58f6 -> ../../sda1

So symlinks are now working! Activating an OSD is a different story :-(

"ceph-disk -vv activate /dev/sda1" gives me;

INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sda1
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
DEBUG:ceph-disk:Mounting /dev/sda1 on /var/lib/ceph/tmp/mnt.A99cDp with options noatime,inode64
INFO:ceph-disk:Running command: /bin/mount -t xfs -o noatime,inode64 -- /dev/sda1 /var/lib/ceph/tmp/mnt.A99cDp
DEBUG:ceph-disk:Cluster uuid is 07b5c90b-6cae-40c0-93b2-31e0ebad7315
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
DEBUG:ceph-disk:Cluster name is ceph
DEBUG:ceph-disk:OSD uuid is e85f4d92-c8f1-4591-bd2a-aa43b80f58f6
DEBUG:ceph-disk:OSD id is 6
DEBUG:ceph-disk:Initializing OSD...
INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/tmp/mnt.A99cDp/activate.monmap
got monmap epoch 6
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster ceph --mkfs --mkkey -i 6 --monmap /var/lib/ceph/tmp/mnt.A99cDp/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.A99cDp --osd-journal /var/lib/ceph/tmp/mnt.A99cDp/journal --osd-uuid e85f4d92-c8f1-4591-bd2a-aa43b80f58f6 --keyring /var/lib/ceph/tmp/mnt.A99cDp/keyring
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
2015-12-18 21:58:12.489357 7f266d7b0800 -1 journal check: ondisk fsid ---- doesn't match expected e85f4d92-c8f1-4591-bd2a-aa43b80f58f6, invalid (someone else's?) journal
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
2015-12-18 21:58:12.680566 7f266d7b0800 -1 filestore(/var/lib/ceph/tmp/mnt.A99cDp) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2015-12-18 21:58:12.865810 7f266d7b0800 -1 created object store /var/lib/ceph/tmp/mnt.A99cDp journal /var/lib/ceph/tmp/mnt.A99cDp/journal for osd.6 fsid 07b5c90b-6cae-40c0-93b2-31e0ebad7315
2015-12-18 21:58:12.865844 7f266d7b0800 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.A99cDp/keyring: can't open /var/lib/ceph/tmp/mnt.A99cDp/keyring: (2) No such file or directory
2015-12-18 21:58:12.865910 7f266d7b0800 -1 created new key in keyring /var/lib/ceph/tmp/mnt.A99cDp/keyring
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
DEBUG:ceph-disk:Marking with init system sysvinit
DEBUG:ceph-disk:Authorizing OSD key...
INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.6 -i /var/lib/ceph/tmp/mnt.A99cDp/keyring osd allow * mon allow profile osd
Error EINVAL: entity osd.6 exists but key does not match
ERROR:ceph-disk:Failed to activate
DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.A99cDp
INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.A99cDp
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 2994, in
    main()
  File "/usr/sbin/ceph-disk", line 2972, in main
    args.func(args)
  File "/usr/sbin/ceph-disk", line 2178, in main_activate
    init=args.mark_init,
  File "/usr/sbin/ceph-disk", line 1954, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2153, in activate
    keyring=keyring,
  File "/usr/sbin/ceph-disk", line 1756, in auth_key
    'mon', 'allow profile osd',
  File "/usr/sbin/ceph-disk", line 323, in command_check_call
    return subprocess.check_call(arguments)
  File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.6', '-i', '/var/lib/ceph/tmp/mnt.A99cDp/keyring', 'osd', 'allow *', 'mon', 'allow profile osd']' returned non-zero exit status 22

Thanks!

/Jesper

***

Hi Jesper,

The goal of the rc.local is twofold but mainly to ensure the /dev/disk/by-partuuid symlinks exist for the journals. Is that the case?

Cheers

On 18/12/2015 19:50, Jesper Thorhauge wrote:
> Hi Loic,
>
> Damn, th
Re: [ceph-users] Journal symlink broken / Ceph 0.94.5 / CentOS 6.7
On 18/12/2015 22:09, Jesper Thorhauge wrote: > Hi Loic, > > Getting closer! > > lrwxrwxrwx 1 root root 10 Dec 18 19:43 1e9d527f-0866-4284-b77c-c1cb04c5a168 > -> ../../sdc4 > lrwxrwxrwx 1 root root 10 Dec 18 19:43 c34d4694-b486-450d-b57f-da24255f0072 > -> ../../sdc3 > lrwxrwxrwx 1 root root 10 Dec 18 19:42 c83b5aa5-fe77-42f6-9415-25ca0266fb7f > -> ../../sdb1 > lrwxrwxrwx 1 root root 10 Dec 18 19:42 e85f4d92-c8f1-4591-bd2a-aa43b80f58f6 > -> ../../sda1 > > So symlinks are now working! Activating an OSD is a different story :-( > > "ceph-disk -vv activate /dev/sda1" gives me; > > INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sda1 > INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. > --lookup osd_mount_options_xfs > INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. > --lookup osd_fs_mount_options_xfs > DEBUG:ceph-disk:Mounting /dev/sda1 on /var/lib/ceph/tmp/mnt.A99cDp with > options noatime,inode64 > INFO:ceph-disk:Running command: /bin/mount -t xfs -o noatime,inode64 -- > /dev/sda1 /var/lib/ceph/tmp/mnt.A99cDp > DEBUG:ceph-disk:Cluster uuid is 07b5c90b-6cae-40c0-93b2-31e0ebad7315 > INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph > --show-config-value=fsid > DEBUG:ceph-disk:Cluster name is ceph > DEBUG:ceph-disk:OSD uuid is e85f4d92-c8f1-4591-bd2a-aa43b80f58f6 > DEBUG:ceph-disk:OSD id is 6 > DEBUG:ceph-disk:Initializing OSD... > INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name > client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon > getmap -o /var/lib/ceph/tmp/mnt.A99cDp/activate.monmap > got monmap epoch 6 > INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster ceph --mkfs > --mkkey -i 6 --monmap /var/lib/ceph/tmp/mnt.A99cDp/activate.monmap --osd-data > /var/lib/ceph/tmp/mnt.A99cDp --osd-journal > /var/lib/ceph/tmp/mnt.A99cDp/journal --osd-uuid > e85f4d92-c8f1-4591-bd2a-aa43b80f58f6 --keyring > /var/lib/ceph/tmp/mnt.A99cDp/keyring > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > 2015-12-18 21:58:12.489357 7f266d7b0800 -1 journal check: ondisk fsid > ---- doesn't match expected > e85f4d92-c8f1-4591-bd2a-aa43b80f58f6, invalid (someone else's?) journal > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > 2015-12-18 21:58:12.680566 7f266d7b0800 -1 > filestore(/var/lib/ceph/tmp/mnt.A99cDp) could not find > 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory > 2015-12-18 21:58:12.865810 7f266d7b0800 -1 created object store > /var/lib/ceph/tmp/mnt.A99cDp journal /var/lib/ceph/tmp/mnt.A99cDp/journal for > osd.6 fsid 07b5c90b-6cae-40c0-93b2-31e0ebad7315 > 2015-12-18 21:58:12.865844 7f266d7b0800 -1 auth: error reading file: > /var/lib/ceph/tmp/mnt.A99cDp/keyring: can't open > /var/lib/ceph/tmp/mnt.A99cDp/keyring: (2) No such file or directory > 2015-12-18 21:58:12.865910 7f266d7b0800 -1 created new key in keyring > /var/lib/ceph/tmp/mnt.A99cDp/keyring > INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. > --lookup init > DEBUG:ceph-disk:Marking with init system sysvinit > DEBUG:ceph-disk:Authorizing OSD key... 
> INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name > client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth > add osd.6 -i /var/lib/ceph/tmp/mnt.A99cDp/keyring osd allow * mon allow > profile osd > Error EINVAL: entity osd.6 exists but key does not match > ERROR:ceph-disk:Failed to activate > DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.A99cDp > INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.A99cDp > Traceback (most recent call last): > File "/usr/sbin/ceph-disk", line 2994, in > main() > File "/usr/sbin/ceph-disk", line 2972, in main > args.func(args) > File "/usr/sbin/ceph-disk", line 2178, in main_activate > init=args.mark_init, > File "/usr/sbin/ceph-disk", line 1954, in mount_activate > (osd_id, cluster) = activate(path, activate_key_template, init) > File "/usr/sbin/ceph-disk", line 2153, in activate > keyring=keyring, > File "/usr/sbin/ceph-disk", line 1756, in auth_key > 'mon', 'allow profile osd', > File "/usr/sbin/ceph-disk", line 323, in command_check_call > return subprocess.check_call(arguments) > File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call > raise CalledProcessError(retcode, cmd) > subprocess.CalledProcessError: Command '['/usr/bin/ceph', '--cluster', > 'ceph', '--name', 'client.bootstrap-osd', '--keyring', > '/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.6', '-i', > '/var/lib/ceph/tmp/mnt.A99cDp/keyring', 'osd', 'allow *', 'mon', 'allow > profile osd']' returned non-zero exit status 22 This is a different problem, osd.6
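For readers who land on the same "Error EINVAL: entity osd.6 exists but key does not match": the monitors already hold an auth entry for osd.6 from the OSD's previous incarnation, so the key that ceph-osd --mkfs --mkkey just generated is rejected. The thread does not spell out the remedy, but one common approach, assuming the old osd.6 really is meant to be rebuilt, is to drop the stale entry before re-running activation, roughly:

# only if osd.6 is deliberately being re-created - this discards its old key
ceph auth del osd.6
ceph-disk -vv activate /dev/sda1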
Re: [ceph-users] cephfs, low performances
On 18 December 2015 at 15:48, Don Waterloo wrote:
> On 17 December 2015 at 21:36, Francois Lafont wrote:
>> Hi,
>>
>> I have a ceph cluster currently unused and I have (to my mind) very low
>> performances. I'm not an expert in benchs; here is an example of a quick bench:
>>
>> ---
>> # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
>> --name=readwrite --filename=rw.data --bs=4k --iodepth=64 --size=300MB
>> --readwrite=randrw --rwmixread=50
>> readwrite: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
>> fio-2.1.3
>>
>> ...
>
> I am seeing the same sort of issue.
> If I run your 'fio' command sequence on my cephfs, I see ~120 iops.
> If I run it on one of the underlying OSDs (e.g. in /var... on the mount
> point of the xfs), I get ~20k iops.

If I run:

rbd -p mypool create speed-test-image --size 1000
rbd -p mypool bench-write speed-test-image

I get:

bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern seq
  SEC       OPS   OPS/SEC     BYTES/SEC
    1     79053  79070.82  323874082.50
    2    144340  72178.81  295644410.60
    3    221975  73997.57  303094057.34
elapsed: 10  ops: 262144  ops/sec: 26129.32  bytes/sec: 107025708.32

which is *much* faster than the cephfs.
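To take the filesystem out of the comparison entirely, fio can also drive an RBD image directly through librbd with its rbd ioengine; a sketch along those lines, assuming the pool and image names used above and an fio build that includes the rbd engine:

# 4k random read/write straight against the image via librbd
fio --name=rbd-readwrite --ioengine=rbd --clientname=admin --pool=mypool \
    --rbdname=speed-test-image --bs=4k --iodepth=64 --size=300M \
    --readwrite=randrw --rwmixread=50 --randrepeat=1 --direct=1

Comparing this against the same fio job run on a file inside cephfs separates librbd/RADOS performance from anything the filesystem layer adds.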
Re: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0
I ran into a similar problem while in the middle of upgrading from Hammer (0.94.5) to Infernalis (9.2.0). I decided to try rebuilding one of the OSDs by using 'ceph-disk prepare /dev/sdb' and it never comes up:

root@b3:~# ceph daemon osd.10 status
{
    "cluster_fsid": "----",
    "osd_fsid": "----",
    "whoami": 10,
    "state": "booting",
    "oldest_map": 25804,
    "newest_map": 25904,
    "num_pgs": 0
}

Here's what is written to /var/log/ceph/osd/ceph-osd.10.log:

2015-12-18 16:09:48.928462 7fd5e2bec940 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 6866
2015-12-18 16:09:48.931387 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs in /var/lib/ceph/tmp/mnt.IOnlxY
2015-12-18 16:09:48.931417 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs fsid is already set to ----
2015-12-18 16:09:48.931422 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) write_version_stamp 4
2015-12-18 16:09:48.932671 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
2015-12-18 16:09:48.934953 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) leveldb db exists/created
2015-12-18 16:09:48.935082 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid ---- doesn't match expected ----, invalid (someone else's?) journal
2015-12-18 16:09:48.935227 7fd5e2bec940 1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.935452 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.935771 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkjournal created journal on /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.935803 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs done in /var/lib/ceph/tmp/mnt.IOnlxY
2015-12-18 16:09:48.935919 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342)
2015-12-18 16:09:48.936548 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-18 16:09:48.936559 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-12-18 16:09:48.936588 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: splice is supported
2015-12-18 16:09:48.938319 7fd5e2bec940 0 genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-18 16:09:48.938394 7fd5e2bec940 0 xfsfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: extsize is supported and your kernel >= 3.5
2015-12-18 16:09:48.940420 7fd5e2bec940 0 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-18 16:09:48.940646 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.940865 7fd5e2bec940 1 journal _open /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-12-18 16:09:48.941270 7fd5e2bec940 1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) upgrade
2015-12-18 16:09:48.941389 7fd5e2bec940 -1 filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
2015-12-18 16:09:48.945392 7fd5e2bec940 1 journal close /var/lib/ceph/tmp/mnt.IOnlxY/journal
2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal for osd.10 fsid ----
2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open /var/lib/ceph/tmp/mnt.IOnlxY/keyring: (2) No such file or directory
2015-12-18 16:09:48.946623 7fd5e2bec940 -1 created new key in keyring /var/lib/ceph/tmp/mnt.IOnlxY/keyring
2015-12-18 16:09:50.698753 7fb5db130940 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 7045
2015-12-18 16:09:50.745427 7fb5db130940 0 filestore(/var/lib/ceph/osd/ceph-10) backend xfs (magic 0x58465342)
2015-12-18 16:09:50.745978 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-18 16:09:50.745987 7fb5db130940 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-10) detect_features
Re: [ceph-users] OSDs stuck in booting state on CentOS 7.2.1511 and ceph infernalis 9.2.0
Bryan, I rebooted another host which wasn't updated to CentOS 7.2 and those OSDs also failed to come out of booting state. I thought I'd restarted each OSD host after upgrading them to infernalis but I must have been mistaken and after running ceph tell osd.* version I saw we were on a mix of v0.94.1, v0.94.2, v0.94.4, and v0.94.5. I've downgraded the two hosts we were having problems with to hammer v0.94.5 and once the cluster is happy again we will try upgrading again. Good luck. Bob On Fri, Dec 18, 2015 at 3:21 PM, Stillwell, Bryan < bryan.stillw...@twcable.com> wrote: > I ran into a similar problem while in the middle of upgrading from Hammer > (0.94.5) to Infernalis (9.2.0). I decided to try rebuilding one of the > OSDs by using 'ceph-disk prepare /dev/sdb' and it never comes up: > > root@b3:~# ceph daemon osd.10 status > { > "cluster_fsid": "----", > "osd_fsid": "----", > "whoami": 10, > "state": "booting", > "oldest_map": 25804, > "newest_map": 25904, > "num_pgs": 0 > } > > Here's what is written to /var/log/ceph/osd/ceph-osd.10.log: > > 2015-12-18 16:09:48.928462 7fd5e2bec940 0 ceph version 9.2.0 > (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 6866 > 2015-12-18 16:09:48.931387 7fd5e2bec940 1 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs in /var/lib/ceph/tmp/mnt.IOnlxY > 2015-12-18 16:09:48.931417 7fd5e2bec940 1 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs fsid is already set to > ---- > 2015-12-18 16:09:48.931422 7fd5e2bec940 1 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) write_version_stamp 4 > 2015-12-18 16:09:48.932671 7fd5e2bec940 0 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342) > 2015-12-18 16:09:48.934953 7fd5e2bec940 1 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) leveldb db exists/created > 2015-12-18 16:09:48.935082 7fd5e2bec940 1 journal _open > /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size > 4096 bytes, directio = 1, aio = 1 > 2015-12-18 16:09:48.935218 7fd5e2bec940 -1 journal check: ondisk fsid > ---- doesn't match > expected ----, invalid (someone else's?) 
> journal > 2015-12-18 16:09:48.935227 7fd5e2bec940 1 journal close > /var/lib/ceph/tmp/mnt.IOnlxY/journal > 2015-12-18 16:09:48.935452 7fd5e2bec940 1 journal _open > /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 11: 1072693248 bytes, block size > 4096 bytes, directio = 1, aio = 1 > 2015-12-18 16:09:48.935771 7fd5e2bec940 0 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkjournal created journal on > /var/lib/ceph/tmp/mnt.IOnlxY/journal > 2015-12-18 16:09:48.935803 7fd5e2bec940 1 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mkfs done in > /var/lib/ceph/tmp/mnt.IOnlxY > 2015-12-18 16:09:48.935919 7fd5e2bec940 0 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) backend xfs (magic 0x58465342) > 2015-12-18 16:09:48.936548 7fd5e2bec940 0 > genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: > FIEMAP ioctl is disabled via 'filestore fiemap' config option > 2015-12-18 16:09:48.936559 7fd5e2bec940 0 > genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: > SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option > 2015-12-18 16:09:48.936588 7fd5e2bec940 0 > genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: > splice is supported > 2015-12-18 16:09:48.938319 7fd5e2bec940 0 > genericfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: > syncfs(2) syscall fully supported (by glibc and kernel) > 2015-12-18 16:09:48.938394 7fd5e2bec940 0 > xfsfilestorebackend(/var/lib/ceph/tmp/mnt.IOnlxY) detect_features: extsize > is supported and your kernel >= 3.5 > 2015-12-18 16:09:48.940420 7fd5e2bec940 0 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) mount: enabling WRITEAHEAD journal > mode: checkpoint is not enabled > 2015-12-18 16:09:48.940646 7fd5e2bec940 1 journal _open > /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size > 4096 bytes, directio = 1, aio = 1 > 2015-12-18 16:09:48.940865 7fd5e2bec940 1 journal _open > /var/lib/ceph/tmp/mnt.IOnlxY/journal fd 17: 1072693248 bytes, block size > 4096 bytes, directio = 1, aio = 1 > 2015-12-18 16:09:48.941270 7fd5e2bec940 1 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) upgrade > 2015-12-18 16:09:48.941389 7fd5e2bec940 -1 > filestore(/var/lib/ceph/tmp/mnt.IOnlxY) could not find > -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory > 2015-12-18 16:09:48.945392 7fd5e2bec940 1 journal close > /var/lib/ceph/tmp/mnt.IOnlxY/journal > 2015-12-18 16:09:48.946175 7fd5e2bec940 -1 created object store > /var/lib/ceph/tmp/mnt.IOnlxY journal /var/lib/ceph/tmp/mnt.IOnlxY/journal > for osd.10 fsid ---- > 2015-12-18 16:09:48.946269 7fd5e2bec940 -1 auth: error reading file: > /var/lib/ceph/tmp/mnt.IOnlxY/keyring: can't open
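For anyone else chasing OSDs stuck in the "booting" state after a partial upgrade, the version check mentioned above is a quick first step; a sketch, using osd.10 from this thread as the example id:

# report the version every OSD daemon is actually running
ceph tell osd.* version

# or ask one daemon directly through its admin socket on the local host
ceph daemon osd.10 version

A mix of versions in the output is a strong hint to finish the upgrade (or roll back, as was done here) before debugging further.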