Re: [ceph-users] radosgw: scrub causing slow requests in the md log
I'm now running the three relevant OSDs with that patch. (Recompiled, replaced /usr/lib64/rados-classes/libcls_log.so with the new version, then restarted the osds). It's working quite well, trimming 10 entries at a time instead of 1000, and no more timeouts. Do you think it would be worth decreasing this hardcoded value in ceph proper? -- Dan On Wed, Jun 21, 2017 at 3:51 PM, Casey Bodley wrote: > That patch looks reasonable. You could also try raising the values of > osd_op_thread_suicide_timeout and filestore_op_thread_suicide_timeout on > that osd in order to trim more at a time. > > > On 06/21/2017 09:27 AM, Dan van der Ster wrote: >> >> Hi Casey, >> >> I managed to trim up all shards except for that big #54. The others >> all trimmed within a few seconds. >> >> But 54 is proving difficult. It's still going after several days, and >> now I see that the 1000-key trim is indeed causing osd timeouts. I've >> manually compacted the relevant osd leveldbs, but haven't found any >> way to speed up the trimming. It's now going at ~1-2Hz, so 1000 trims >> per op locks things up for quite awhile. >> >> I'm thinking of running those ceph-osd's with this patch: >> >> # git diff >> diff --git a/src/cls/log/cls_log.cc b/src/cls/log/cls_log.cc >> index 89745bb..7dcd933 100644 >> --- a/src/cls/log/cls_log.cc >> +++ b/src/cls/log/cls_log.cc >> @@ -254,7 +254,7 @@ static int cls_log_trim(cls_method_context_t hctx, >> bufferlist *in, bufferlist *o >> to_index = op.to_marker; >> } >> >> -#define MAX_TRIM_ENTRIES 1000 >> +#define MAX_TRIM_ENTRIES 10 >> size_t max_entries = MAX_TRIM_ENTRIES; >> >> int rc = cls_cxx_map_get_vals(hctx, from_index, log_index_prefix, >> max_entries, &keys); >> >> >> What do you think? >> >> -- Dan >> >> >> >> >> On Mon, Jun 19, 2017 at 5:32 PM, Casey Bodley wrote: >>> >>> Hi Dan, >>> >>> That's good news that it can remove 1000 keys at a time without hitting >>> timeouts. The output of 'du' will depend on when the leveldb compaction >>> runs. If you do find that compaction leads to suicide timeouts on this >>> osd >>> (you would see a lot of 'leveldb:' output in the log), consider running >>> offline compaction by adding 'leveldb compact on mount = true' to the osd >>> config and restarting. >>> >>> Casey >>> >>> >>> On 06/19/2017 11:01 AM, Dan van der Ster wrote: On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley wrote: > > On 06/14/2017 05:59 AM, Dan van der Ster wrote: >> >> Dear ceph users, >> >> Today we had O(100) slow requests which were caused by deep-scrubbing >> of the metadata log: >> >> 2017-06-14 11:07:55.373184 osd.155 >> [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d >> deep-scrub starts >> ... >> 2017-06-14 11:22:04.143903 osd.155 >> [2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow >> request 480.140904 seconds old, received at 2017-06-14 >> 11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d >> meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc >> 0=[] ondisk+write+known_if_redirected e7752) currently waiting for >> scrub >> ... >> 2017-06-14 11:22:06.729306 osd.155 >> [2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d >> deep-scrub ok >> >> We have log_meta: true, log_data: false on this (our only) region [1], >> which IIRC we setup to enable indexless buckets. >> >> I'm obviously unfamiliar with rgw meta and data logging, and have a >> few questions: >> >> 1. AFAIU, it is used by the rgw multisite feature. Is it safe to >> turn >> it off when not using multisite? 
> > > It's a good idea to turn that off, yes. > > First, make sure that you have configured a default > realm/zonegroup/zone: > > $ radosgw-admin realm default --rgw-realm (you can > determine > realm name from 'radosgw-admin realm list') > $ radosgw-admin zonegroup default --rgw-zonegroup default > $ radosgw-admin zone default --rgw-zone default > Thanks. This had already been done, as confirmed with radosgw-admin realm get-default. > Then you can modify the zonegroup (aka region): > > $ radosgw-admin zonegroup get > zonegroup.json > $ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json > $ radosgw-admin zonegroup set < zonegroup.json > > Then commit the updated period configuration: > > $ radosgw-admin period update --commit > > Verify that the resulting period contains "log_meta": "false". Take > care > with future radosgw-admin commands on the zone/zonegroup, as they may > revert > log_meta back to true [1]. > Great, this worked. FYI (and for others trying this in future), the period update --commit blocks all rgws for ~30s while they reload the realm. >> 2. I started dumping the output of radosgw-adm
Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals
> Keep in mind that 1.6TB P4600 is going to last about as long as your 400GB > P3700, so if wear-out is a concern, don't put more stress on them. > I've been looking at the 2T ones, but it's about the same as the 400G P3700 > Also the P4600 is only slightly faster in writes than the P3700, so that's > where putting more workload onto them is going to be a notable issue. The latency is somewhat worse than the P3700. When you're talking about a journal device, latency will be more important than bandwidth, especially on small and/or sync writes. > >> I've seen some talk on here regarding this, but wanted to throw an idea >> around. I was okay throwing away 280GB of fast capacity for the purpose of >> providing reliable journals. But with as much free capacity as we'd have >> with a 4600, maybe I could use that extra capacity as a cache tier for >> writes on an rbd ec pool. If I wanted to go that route, I'd probably >> replace several existing 3700s with 4600s to get additional cache capacity. >> But, that sounds risky... >> > Risky as in high failure domain concentration and as mentioned above a > cache-tier with obvious inline journals and thus twice the bandwidth needs > will likely eat into the write speed capacity of the journals. I tend to agree. Also the cache tier only starts to be interesting if it's big enough overall... If you have to keep promoting/demoting because it's full it'll kill the whole cluster very quickly. > > If (and seems to be a big IF) you can find them, the Samsung PM1725a 1.6TB > seems to be a) cheaper and b) at 2GB/s write speed more likely to be > suitable for double duty. > Similar (slightly better on paper) endurance than the P4600, so keep that > in mind, too. As I'm more than happy with the 400G size, and given the price of the P4600 2T, for slightly more (10%) I'm considering the P4800X. This is for a full SSD cluster. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw: scrub causing slow requests in the md log
On Wed, Jun 21, 2017 at 4:16 PM, Peter Maloney wrote: > On 06/14/17 11:59, Dan van der Ster wrote: >> Dear ceph users, >> >> Today we had O(100) slow requests which were caused by deep-scrubbing >> of the metadata log: >> >> 2017-06-14 11:07:55.373184 osd.155 >> [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d >> deep-scrub starts >> ... >> 2017-06-14 11:22:04.143903 osd.155 >> [2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow >> request 480.140904 seconds old, received at 2017-06-14 >> 11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d >> meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc >> 0=[] ondisk+write+known_if_redirected e7752) currently waiting for >> scrub >> ... >> 2017-06-14 11:22:06.729306 osd.155 >> [2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d >> deep-scrub ok > > This looks just like my problem in my thread on ceph-devel "another > scrub bug? blocked for > 10240.948831 secs" except that your scrub > eventually finished (mine ran hours before I stopped it manually), and > I'm not using rgw. > > Sage commented that it is likely related to snaps being removed at some > point and interacting with scrub. > > Restarting the osd that is mentioned there (osd.155 in your case) will > fix it for now. And tuning scrub changes the way it behaves (defaults > make it happen more rarely than what I had before). In my case it's not related to snaps -- there are no snaps (or trimming) in a (normal) set of rgw pools. My problem is about the cls_log class, which tries to do a lot of work in one op, timing out the osds. Well, the *real* problem in my case is about this rgw mdlog, which can grow unboundedly, then eventually become un-scrubbable, leading to this huge amount of cleanup to be done. -- dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Re: Can't start ceph-mon through systemctl start ceph-mon@.service after upgrading from Hammer to Jewel
I set mon_data to “/home/ceph/software/ceph/var/lib/ceph/mon”, and its owner has always been “ceph” since we were running Hammer. I also tried setting the permissions to “777”, but that didn’t work either. From: Linh Vu [mailto:v...@unimelb.edu.au] Sent: 22 June 2017 14:26 To: 许雪寒; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Can't start ceph-mon through systemctl start ceph-mon@.service after upgrading from Hammer to Jewel Permissions of your mon data directory under /var/lib/ceph/mon/ might have changed as part of the Hammer -> Jewel upgrade. Have you had a look there? From: ceph-users on behalf of 许雪寒 Sent: Thursday, 22 June 2017 3:32:45 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] Can't start ceph-mon through systemctl start ceph-mon@.service after upgrading from Hammer to Jewel Hi, everyone. I upgraded one of our ceph clusters from Hammer to Jewel. After upgrading, I can’t start ceph-mon through “systemctl start ceph-mon@ceph1”, while, on the other hand, I can start ceph-mon, either as user ceph or root, if I directly call “/usr/bin/ceph-mon --cluster ceph --id ceph1 --setuser ceph --setgroup ceph”. I looked in “/var/log/messages” and found that the reason systemctl can’t start ceph-mon is that ceph-mon can’t access its configured data directory. Why can’t ceph-mon access its data directory when it’s called by systemctl? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals
Hi, One of the benefits of PCIe NVMe is that it does not take a disk slot, resulting in a higher density. For example a 6048R-E1CR36N with 3x PCIe NVMe yields 36 OSDs per servers (12 OSD per NVMe) where it yields 30 OSDs per server if using SATA SSDs (6 OSDs per SSD). Since you say that you used 10% of P3700 endurance in 1 year (7.3PB endurance, so 0.73PB/year), so a 400GB P3600 would work for 3 years. Maybe good enough until BlueStore is more stable. Cheers, Maxime On Thu, 22 Jun 2017 at 03:59 Christian Balzer wrote: > > Hello, > > Hmm, gmail client not grokking quoting these days? > > On Wed, 21 Jun 2017 20:40:48 -0500 Brady Deetz wrote: > > > On Jun 21, 2017 8:15 PM, "Christian Balzer" wrote: > > > > On Wed, 21 Jun 2017 19:44:08 -0500 Brady Deetz wrote: > > > > > Hello, > > > I'm expanding my 288 OSD, primarily cephfs, cluster by about 16%. I > have > > 12 > > > osd nodes with 24 osds each. Each osd node has 2 P3700 400GB NVMe PCIe > > > drives providing 10GB journals for groups of 12 6TB spinning rust > drives > > > and 2x lacp 40gbps ethernet. > > > > > > Our hardware provider is recommending that we start deploying P4600 > drives > > > in place of our P3700s due to availability. > > > > > Welcome to the club and make sure to express your displeasure about > > Intel's "strategy" to your vendor. > > > > The P4600s are a poor replacement for P3700s and also still just > > "announced" according to ARK. > > > > Are you happy with your current NVMes? > > Firstly as in, what is their wearout, are you expecting them to easily > > survive 5 years at the current rate? > > Secondly, how about speed? with 12 HDDs and 1GB/s write capacity of the > > NVMe I'd expect them to not be a bottleneck in nearly all real life > > situations. > > > > Keep in mind that 1.6TB P4600 is going to last about as long as your > 400GB > > P3700, so if wear-out is a concern, don't put more stress on them. > > > > > > Oddly enough, the Intel tools are telling me that we've only used about > 10% > > of each drive's endurance over the past year. This honestly surprises me > > due to our workload, but maybe I'm thinking my researchers are doing more > > science than they actually are. > > > That's pretty impressive still, but also lets you do numbers as to what > kind of additional load you _may_ be able to consider, obviously not more > than twice the current amount to stay within 5 years before wearing > them out. > > > > > > Also the P4600 is only slightly faster in writes than the P3700, so > that's > > where putting more workload onto them is going to be a notable issue. > > > > > I've seen some talk on here regarding this, but wanted to throw an idea > > > around. I was okay throwing away 280GB of fast capacity for the > purpose of > > > providing reliable journals. But with as much free capacity as we'd > have > > > with a 4600, maybe I could use that extra capacity as a cache tier for > > > writes on an rbd ec pool. If I wanted to go that route, I'd probably > > > replace several existing 3700s with 4600s to get additional cache > > capacity. > > > But, that sounds risky... > > > > > Risky as in high failure domain concentration and as mentioned above a > > cache-tier with obvious inline journals and thus twice the bandwidth > needs > > will likely eat into the write speed capacity of the journals. > > > > > > Agreed. On the topic of journals and double bandwidth, am I correct in > > thinking that btrfs (as insane as it may be) does not require double > > bandwidth like xfs? 
Furthermore with bluestore being close to stable, > will > > my architecture need to change? > > > BTRFS at this point is indeed a bit insane, given the current levels of > support, issues (search the ML archives) and future developments. > And you'll still wind up with double writes most likely, IIRC. > > These aspects of Bluestore have been discussed here recently, too. > Your SSD/NVMe space requirements will go down, but if you want to have the > same speeds and more importantly low latencies you'll wind up with all > writes going through them again, so endurance wise you're still in that > "Lets make SSDs great again" hellhole. > > > > > If (and seems to be a big IF) you can find them, the Samsung PM1725a > 1.6TB > > seems to be a) cheaper and b) at 2GB/s write speed more likely to be > > suitable for double duty. > > Similar (slightly better on paper) endurance than then P4600, so keep > that > > in mind, too. > > > > > > My vendor is an HPC vendor so /maybe/ they have access to these elusive > > creatures. In which case, how many do you want? Haha > > > I was just looking at availability with a few google searches, our current > needs are amply satisfied with S37xx SSDs, no need for NVMes really. > But as things are going, maybe I'll be forced to Optane and friends simply > by lack of alternatives. > > Christian > -- > Christian BalzerNetwork/Systems Engineer > ch...@gol.com Rakuten Communications > _
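A quick back-of-the-envelope check of the endurance arithmetic Maxime used above (assuming Intel's quoted ~7.3 PBW rating for the 400GB P3700, which matches the figure in this thread, and ~2.19 PBW for the 400GB P3600 -- verify that figure against ARK before relying on it):

# 10% of a 7.3 PBW device consumed in one year:
echo "7.3 * 0.10" | bc -l    # ~0.73 PB written per year per journal device
# Years a 400GB P3600 (~2.19 PBW) would last at the same write rate:
echo "2.19 / 0.73" | bc -l   # ~3.0 years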
Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals
On 22-6-2017 03:59, Christian Balzer wrote: >> Agreed. On the topic of journals and double bandwidth, am I correct in >> thinking that btrfs (as insane as it may be) does not require double >> bandwidth like xfs? Furthermore with bluestore being close to stable, will >> my architecture need to change? >> > BTRFS at this point is indeed a bit insane, given the current levels of > support, issues (search the ML archives) and future developments. > And you'll still wind up with double writes most likely, IIRC. > > These aspects of Bluestore have been discussed here recently, too. > Your SSD/NVMe space requirements will go down, but if you want to have the > same speeds and more importantly low latencies you'll wind up with all > writes going through them again, so endurance wise you're still in that > "Lets make SSDs great again" hellhole. Please note that I know little about btrfs, but its sister ZFS can include caching/log devices transparently in its architecture. And even better, they are allowed to fail without much of a problem. :) Now the problem I have is that Ceph first journals the writes to its log, then hands the write over to ZFS, where it gets logged again. So those are 2 writes (and in the case of ZFS, the log only gets read if the filesystem had a crash). The thing about ZFS is that the journal log need not be very big: about 5 seconds of maximum required disk writes. I have them at 1GB and they have never filled up yet. But the bandwidth used is going to be doubled due to double the amount of writes. If btrfs logging is anything like this, then you have to look at how you architect the filesystems/devices underlying Ceph. --WjW ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Does CephFS support SELinux?
Hi, Does CephFS support SELinux? I have this issue with OpenShift (with SELinux) + CephFS: http://lists.openshift.redhat.com/openshift-archives/users/2017-June/msg00116.html Best regards, Stéphane -- Stéphane Klein blog: http://stephane-klein.info cv : http://cv.stephane-klein.info Twitter: http://twitter.com/klein_stephane ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Does CephFS support SELinux?
On Thu, Jun 22, 2017 at 10:25 AM, Stéphane Klein wrote: > Hi, > > Does CephFS support SELinux? > > I have this issue with OpenShift (with SELinux) + CephFS: > http://lists.openshift.redhat.com/openshift-archives/users/2017-June/msg00116.html We do test running CephFS server and client bits on machines where selinux is enabled, but we don't test doing selinux stuff inside the filesystem (setting labels etc). As far as I know, the comments in http://tracker.ceph.com/issues/13231 are still relevant. John > Best regards, > Stéphane > -- > Stéphane Klein > blog: http://stephane-klein.info > cv : http://cv.stephane-klein.info > Twitter: http://twitter.com/klein_stephane > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Does CephFS support SELinux?
2017-06-22 11:48 GMT+02:00 John Spray : > On Thu, Jun 22, 2017 at 10:25 AM, Stéphane Klein > wrote: > > Hi, > > > > Does CephFS support SELinux? > > > > I have this issue with OpenShift (with SELinux) + CephFS: > > http://lists.openshift.redhat.com/openshift-archives/users/ > 2017-June/msg00116.html > > We do test running CephFS server and client bits on machines where > selinux is enabled, but we don't test doing selinux stuff inside the > filesystem (setting labels etc). As far as I know, the comments in > http://tracker.ceph.com/issues/13231 are still relevant. > > # mount -t ceph ceph-test-1:6789:/ /mnt/mycephfs -o name=admin,secretfile=/etc/ceph/admin.secret # touch /mnt/mycephfs/foo # ls /mnt/mycephfs/ -lZ -rw-r--r-- root root ?foo # chcon system_u:object_r:admin_home_t:s0 /mnt/mycephfs/foo chcon: failed to change context of ‘/mnt/mycephfs/foo’ to ‘system_u:object_r:admin_home_t:s0’: Operation not supported Then SELinux isn't supported with CephFS volume :( ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] FW: radosgw: stale/leaked bucket index entries
Looks like I’ve now got a consistent repro scenario, please find the gory details here http://tracker.ceph.com/issues/20380 Thanks! On 20/06/17, 2:04 PM, "Pavan Rallabhandi" wrote: Hi Orit, No, we do not use multi-site. Thanks, -Pavan. From: Orit Wasserman Date: Tuesday, 20 June 2017 at 12:49 PM To: Pavan Rallabhandi Cc: "ceph-users@lists.ceph.com" Subject: EXT: Re: [ceph-users] FW: radosgw: stale/leaked bucket index entries Hi Pavan, On Tue, Jun 20, 2017 at 8:29 AM, Pavan Rallabhandi wrote: Trying one more time with ceph-users On 19/06/17, 11:07 PM, "Pavan Rallabhandi" wrote: On many of our clusters running Jewel (10.2.5+), am running into a strange problem of having stale bucket index entries left over for (some of the) objects deleted. Though it is not reproducible at will, it has been pretty consistent of late and am clueless at this point for the possible reasons to happen so. The symptoms are that the actual delete operation of an object is reported successful in the RGW logs, but a bucket list on the container would still show the deleted object. An attempt to download/stat of the object appropriately results in a failure. No failures are seen in the respective OSDs where the bucket index object is located. And rebuilding the bucket index by running ‘radosgw-admin bucket check –fix’ would fix the issue. Though I could simulate the problem by instrumenting the code, to not to have invoked `complete_del` on the bucket index op https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793, but that call is always seem to be made unless there is a cascading error from the actual delete operation of the object, which doesn’t seem to be the case here. I wanted to know the possible reasons where the bucket index would be left in such limbo, any pointers would be much appreciated. FWIW, we are not sharding the buckets and very recently I’ve seen this happen with buckets having as low as < 10 objects, and we are using swift for all the operations. Do you use multisite? Regards, Orit Thanks, -Pavan. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
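For anyone else hitting the same stale-entry symptom, the rebuild step Pavan mentions looks roughly like this (a sketch only; the bucket name is a placeholder):

# Report inconsistencies first, then rebuild the bucket index:
radosgw-admin bucket check --bucket=<bucket-name>
radosgw-admin bucket check --bucket=<bucket-name> --check-objects --fix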
Re: [ceph-users] VMware + CEPH Integration
> -Original Message- > From: Adrian Saul [mailto:adrian.s...@tpgtelecom.com.au] > Sent: 19 June 2017 06:54 > To: n...@fisk.me.uk; 'Alex Gorbachev' > Cc: 'ceph-users' > Subject: RE: [ceph-users] VMware + CEPH Integration > > > Hi Alex, > > > > Have you experienced any problems with timeouts in the monitor action > > in pacemaker? Although largely stable, every now and again in our > > cluster the FS and Exportfs resources timeout in pacemaker. There's no > > mention of any slow requests or any peering..etc from the ceph logs so it's > a bit of a mystery. > > Yes - we have that in our setup which is very similar. Usually I find it > related > to RBD device latency due to scrubbing or similar but even when tuning > some of that down we still get it randomly. > > The most annoying part is that once it comes up, having to use "resource > cleanup" to try and remove the failed usually has more impact than the > actual error. Are you using Stonith? Pacemaker should be able to recover from any sort of failure as long as it can bring the cluster into a known state. I'm still struggling to get to the bottom of it in our environment. When it happens, every RBD on the same client host seems to hang, but all other hosts are fine. This seems to suggest it's not a Ceph cluster issue/performance, as this would affect the majority of RBD's and not just ones on a single client. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Transitioning to Intel P4600 from P3700 Journals
Christian and everyone else have expertly responded to the SSD capabilities, pros, and cons so I'll ignore that. I believe you were saying that it was risky to swap out your existing journals to a new journal device. That is actually a very simple operation that can be scripted to only take minutes per node with no risk to data. You just stop the osd, flush the journal, delete the old journal partition, create the new partition with the same guid, initialize the journal, and start the osd. On Wed, Jun 21, 2017, 8:44 PM Brady Deetz wrote: > Hello, > I'm expanding my 288 OSD, primarily cephfs, cluster by about 16%. I have > 12 osd nodes with 24 osds each. Each osd node has 2 P3700 400GB NVMe PCIe > drives providing 10GB journals for groups of 12 6TB spinning rust drives > and 2x lacp 40gbps ethernet. > > Our hardware provider is recommending that we start deploying P4600 drives > in place of our P3700s due to availability. > > I've seen some talk on here regarding this, but wanted to throw an idea > around. I was okay throwing away 280GB of fast capacity for the purpose of > providing reliable journals. But with as much free capacity as we'd have > with a 4600, maybe I could use that extra capacity as a cache tier for > writes on an rbd ec pool. If I wanted to go that route, I'd probably > replace several existing 3700s with 4600s to get additional cache capacity. > But, that sounds risky... > > What do you guys think? > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
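A minimal sketch of the journal swap David describes above (the OSD id, device names and partition details are placeholders; assumes a healthy cluster, with noout set for the duration):

ceph osd set noout                    # avoid rebalancing while the OSD is briefly down
systemctl stop ceph-osd@<id>          # or the equivalent init command on pre-systemd hosts
ceph-osd -i <id> --flush-journal      # flush pending journal entries to the filestore
# create the new journal partition on the replacement NVMe and point the OSD's
# journal symlink / journal_uuid at it (e.g. with sgdisk or parted)
ceph-osd -i <id> --mkjournal          # initialize the new journal
systemctl start ceph-osd@<id>
ceph osd unset noout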
Re: [ceph-users] SSD OSD's Dual Use
I wouldn't see this as problematic at all. As long as you're watching the disk utilizations and durability, those are the only factors that would eventually tell you that they are busy enough. On Thu, Jun 22, 2017, 1:36 AM Ashley Merrick wrote: > Hello, > > > Currently have a pool of SSD's running as a Cache in front of a EC Pool. > > > The cache is very under used and the SSD's spend most time idle, would > like to create a small SSD Pool for a selection of very small RBD disk's as > scratch disks within the OS, should I expect any issues running the two > pool's (Cache + RBD Data) on the same set of SSD's? > > > ,Ashley > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 答复: Can't start ceph-mon through systemctl start ceph-mon@.service after upgrading from Hammer to Jewel
Did you previously edit the init scripts to look in your custom location? Those could have been overwritten. As was mentioned, Jewel changed what user the daemon runs as, but you said that you tested running the daemon manually under the ceph user? Was this without sudo? It used to run as root under Hammer and would have needed to be chown'd recursively to allow the ceph user to run it. On Thu, Jun 22, 2017, 4:39 AM 许雪寒 wrote: > I set mon_data to “/home/ceph/software/ceph/var/lib/ceph/mon”, and its > owner has always been “ceph” since we were running Hammer. > And I also tried to set the permission to “777”, it also didn’t work. > > > 发件人: Linh Vu [mailto:v...@unimelb.edu.au] > 发送时间: 2017年6月22日 14:26 > 收件人: 许雪寒; ceph-users@lists.ceph.com > 主题: Re: [ceph-users] Can't start ceph-mon through systemctl start > ceph-mon@.service > after upgrading from Hammer to Jewel > > Permissions of your mon data directory under /var/lib/ceph/mon/ might have > changed as part of Hammer -> Jewel upgrade. Have you had a look there? > > From: ceph-users on behalf of 许雪寒 < > xuxue...@360.cn> > Sent: Thursday, 22 June 2017 3:32:45 PM > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Can't start ceph-mon through systemctl start > ceph-mon@.service after upgrading from Hammer to Jewel > > Hi, everyone. > > I upgraded one of our ceph clusters from Hammer to Jewel. After upgrading, > I can’t start ceph-mon through “systemctl start ceph-mon@ceph1”, while, > on the other hand, I can start ceph-mon, either as user ceph or root, if I > directly call “/usr/bin/ceph-mon –cluster ceph –id ceph1 –setuser ceph > –setgroup ceph”. I looked “/var/log/messages”, and find that the reason > systemctl can’t start ceph-mon is that ceph-mon can’t access its configured > data directory. Why ceph-mon can’t access its data directory when its > called by systemctl? > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
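A quick way to check the two things mentioned above (paths shown are the defaults; adjust for the custom mon_data under /home, and treat the ProtectHome remark as a guess to verify rather than a confirmed diagnosis):

ls -ld /var/lib/ceph/mon/ceph-ceph1        # Jewel runs the daemon as 'ceph', so this must be ceph-owned
chown -R ceph:ceph /var/lib/ceph/mon/ceph-ceph1
systemctl cat ceph-mon@ceph1               # check ExecStart/setuser, and whether the unit sets
                                           # ProtectHome=true, which would hide /home from the daemon
                                           # regardless of filesystem permissions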
Re: [ceph-users] Config parameters for system tuning
Looking at the sources, the config values were in Hammer but not Jewel. For journal config I recommend that journal_queue_max_ops and journal_queue_max_bytes be removed from the docs: http://docs.ceph.com/docs/master/rados/configuration/journal-ref/ Also for the added filestore throttling params: filestore_queue_max_delay_multiple filestore_queue_high_delay_multiple filestore_queue_low_threshhold filestore_queue_high_threshhold again it would be good to update the docs: http://docs.ceph.com/docs/master/rados/configuration/filestore-config-ref/ I guess all eyes are on Bluestore now :) Maged Mokhtar PetaSAN -- From: "Maged Mokhtar" Sent: Wednesday, June 21, 2017 12:33 AM To: Subject: [ceph-users] Config parameters for system tuning Hi, 1) I am trying to set some of the following config values which seem to be present in most config examples relating to performance tuning: journal_queue_max_ops journal_queue_max_bytes filestore_queue_committing_max_bytes filestore_queue_committing_max_ops I am using 10.2.7 but am not able to set these parameters either via conf file or injection; also, ceph --show-config does not list them. Have they been deprecated and should be ignored? 2) For osd_op_threads I have seen some examples (not the official docs) fixing this to the number of CPU cores; is this the best recommendation or could we use more threads than cores? Cheers Maged Mokhtar PetaSAN ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
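A simple way to confirm whether a given option still exists in the running release (using the option names from this thread as examples):

# Options removed from the code won't appear in the daemon's config at all:
ceph daemon osd.0 config show | grep -E 'journal_queue_max|filestore_queue'
# or, without touching a running daemon:
ceph --show-config | grep -E 'journal_queue_max|filestore_queue'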
Re: [ceph-users] radosgw: scrub causing slow requests in the md log
On 06/22/2017 04:00 AM, Dan van der Ster wrote: I'm now running the three relevant OSDs with that patch. (Recompiled, replaced /usr/lib64/rados-classes/libcls_log.so with the new version, then restarted the osds). It's working quite well, trimming 10 entries at a time instead of 1000, and no more timeouts. Do you think it would be worth decreasing this hardcoded value in ceph proper? -- Dan I do, yeah. At least, the trim operation should be able to pass in its own value for that. I opened a ticket for that at http://tracker.ceph.com/issues/20382. I'd also like to investigate using the ObjectStore's OP_OMAP_RMKEYRANGE operation to trim a range of keys in a single osd op, instead of generating a different op for each key. I have a PR that does this at https://github.com/ceph/ceph/pull/15183. But it's still hard to guarantee that leveldb can process the entire range inside of the suicide timeout. Casey On Wed, Jun 21, 2017 at 3:51 PM, Casey Bodley wrote: That patch looks reasonable. You could also try raising the values of osd_op_thread_suicide_timeout and filestore_op_thread_suicide_timeout on that osd in order to trim more at a time. On 06/21/2017 09:27 AM, Dan van der Ster wrote: Hi Casey, I managed to trim up all shards except for that big #54. The others all trimmed within a few seconds. But 54 is proving difficult. It's still going after several days, and now I see that the 1000-key trim is indeed causing osd timeouts. I've manually compacted the relevant osd leveldbs, but haven't found any way to speed up the trimming. It's now going at ~1-2Hz, so 1000 trims per op locks things up for quite awhile. I'm thinking of running those ceph-osd's with this patch: # git diff diff --git a/src/cls/log/cls_log.cc b/src/cls/log/cls_log.cc index 89745bb..7dcd933 100644 --- a/src/cls/log/cls_log.cc +++ b/src/cls/log/cls_log.cc @@ -254,7 +254,7 @@ static int cls_log_trim(cls_method_context_t hctx, bufferlist *in, bufferlist *o to_index = op.to_marker; } -#define MAX_TRIM_ENTRIES 1000 +#define MAX_TRIM_ENTRIES 10 size_t max_entries = MAX_TRIM_ENTRIES; int rc = cls_cxx_map_get_vals(hctx, from_index, log_index_prefix, max_entries, &keys); What do you think? -- Dan On Mon, Jun 19, 2017 at 5:32 PM, Casey Bodley wrote: Hi Dan, That's good news that it can remove 1000 keys at a time without hitting timeouts. The output of 'du' will depend on when the leveldb compaction runs. If you do find that compaction leads to suicide timeouts on this osd (you would see a lot of 'leveldb:' output in the log), consider running offline compaction by adding 'leveldb compact on mount = true' to the osd config and restarting. Casey On 06/19/2017 11:01 AM, Dan van der Ster wrote: On Thu, Jun 15, 2017 at 7:56 PM, Casey Bodley wrote: On 06/14/2017 05:59 AM, Dan van der Ster wrote: Dear ceph users, Today we had O(100) slow requests which were caused by deep-scrubbing of the metadata log: 2017-06-14 11:07:55.373184 osd.155 [2001:1458:301:24::100:d]:6837/3817268 7387 : cluster [INF] 24.1d deep-scrub starts ... 2017-06-14 11:22:04.143903 osd.155 [2001:1458:301:24::100:d]:6837/3817268 8276 : cluster [WRN] slow request 480.140904 seconds old, received at 2017-06-14 11:14:04.002913: osd_op(client.3192010.0:11872455 24.be8b305d meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54 [call log.add] snapc 0=[] ondisk+write+known_if_redirected e7752) currently waiting for scrub ... 
2017-06-14 11:22:06.729306 osd.155 [2001:1458:301:24::100:d]:6837/3817268 8277 : cluster [INF] 24.1d deep-scrub ok We have log_meta: true, log_data: false on this (our only) region [1], which IIRC we setup to enable indexless buckets. I'm obviously unfamiliar with rgw meta and data logging, and have a few questions: 1. AFAIU, it is used by the rgw multisite feature. Is it safe to turn it off when not using multisite? It's a good idea to turn that off, yes. First, make sure that you have configured a default realm/zonegroup/zone: $ radosgw-admin realm default --rgw-realm (you can determine realm name from 'radosgw-admin realm list') $ radosgw-admin zonegroup default --rgw-zonegroup default $ radosgw-admin zone default --rgw-zone default Thanks. This had already been done, as confirmed with radosgw-admin realm get-default. Then you can modify the zonegroup (aka region): $ radosgw-admin zonegroup get > zonegroup.json $ sed -i 's/log_meta": "true/log_meta":"false/' zonegroup.json $ radosgw-admin zonegroup set < zonegroup.json Then commit the updated period configuration: $ radosgw-admin period update --commit Verify that the resulting period contains "log_meta": "false". Take care with future radosgw-admin commands on the zone/zonegroup, as they may revert log_meta back to true [1]. Great, this worked. FYI (and for others trying this in future), the period update --commit blocks all rgws for ~30s while they reload the realm. 2. I started dumping the output of radosgw-admin mdlog list,
Re: [ceph-users] radosgw: scrub causing slow requests in the md log
On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote: > > On 06/22/2017 04:00 AM, Dan van der Ster wrote: >> >> I'm now running the three relevant OSDs with that patch. (Recompiled, >> replaced /usr/lib64/rados-classes/libcls_log.so with the new version, >> then restarted the osds). >> >> It's working quite well, trimming 10 entries at a time instead of >> 1000, and no more timeouts. >> >> Do you think it would be worth decreasing this hardcoded value in ceph >> proper? >> >> -- Dan > > > I do, yeah. At least, the trim operation should be able to pass in its own > value for that. I opened a ticket for that at > http://tracker.ceph.com/issues/20382. > > I'd also like to investigate using the ObjectStore's OP_OMAP_RMKEYRANGE > operation to trim a range of keys in a single osd op, instead of generating > a different op for each key. I have a PR that does this at > https://github.com/ceph/ceph/pull/15183. But it's still hard to guarantee > that leveldb can process the entire range inside of the suicide timeout. I wonder if that would help. Here's what I've learned today: * two of the 3 relevant OSDs have something screwy with their leveldb. The primary and 3rd replica are ~quick at trimming for only a few hundred keys, whilst the 2nd OSD is very very fast always. * After manually compacting the two slow OSDs, they are fast again for just a few hundred trims. So I'm compacting, trimming, ..., in a loop now. * I moved the omaps to SSDs -- doesn't help. (iostat confirms this is not IO bound). * CPU util on the slow OSDs gets quite high during the slow trimming. * perf top is below [1]. leveldb::Block::Iter::Prev and leveldb::InternalKeyComparator::Compare are notable. * The always fast OSD shows no leveldb functions in perf top while trimming. I've tried bigger leveldb cache and block sizes, compression on and off, and played with the bloom size up to 14 bits -- none of these changes make any difference. At this point I'm not confident this trimming will ever complete -- there are ~20 million records to remove at maybe 1Hz. How about I just delete the meta.log object? Would this use a different, perhaps quicker, code path to remove those omap keys? Thanks! Dan [1] 4.92% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023e8d 4.47% libc-2.17.so [.] __memcmp_sse4_1 4.13% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x000273bb 3.81% libleveldb.so.1.0.7 [.] leveldb::Block::Iter::Prev 3.07% libc-2.17.so [.] __memcpy_ssse3_back 2.84% [kernel] [k] port_inb 2.77% libstdc++.so.6.0.19 [.] std::string::_M_mutate 2.75% libstdc++.so.6.0.19 [.] std::string::append 2.53% libleveldb.so.1.0.7 [.] leveldb::InternalKeyComparator::Compare 1.32% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023e77 0.85% [kernel] [k] _raw_spin_lock 0.80% libleveldb.so.1.0.7 [.] leveldb::Block::Iter::Next 0.77% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023a05 0.67% libleveldb.so.1.0.7 [.] leveldb::MemTable::KeyComparator::operator() 0.61% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023a09 0.58% libleveldb.so.1.0.7 [.] leveldb::MemTableIterator::Prev 0.51% [kernel] [k] __schedule 0.48% libruby.so.2.1.0 [.] ruby_yyparse ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
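For reference, the compact-then-trim loop described above has roughly this shape (a sketch only: substitute the exact 'radosgw-admin mdlog trim' arguments used for the other shards, and note that 'ceph tell osd.N compact' may not be available on Jewel, where the 'leveldb compact on mount = true' plus restart approach Casey mentioned is the fallback):

while true; do
    ceph tell osd.155 compact                   # or restart the OSD with compact-on-mount set
    radosgw-admin mdlog trim <same shard/marker arguments as used for the other shards>
    sleep 60
done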
Re: [ceph-users] radosgw: scrub causing slow requests in the md log
On 06/22/2017 10:40 AM, Dan van der Ster wrote: On Thu, Jun 22, 2017 at 4:25 PM, Casey Bodley wrote: On 06/22/2017 04:00 AM, Dan van der Ster wrote: I'm now running the three relevant OSDs with that patch. (Recompiled, replaced /usr/lib64/rados-classes/libcls_log.so with the new version, then restarted the osds). It's working quite well, trimming 10 entries at a time instead of 1000, and no more timeouts. Do you think it would be worth decreasing this hardcoded value in ceph proper? -- Dan I do, yeah. At least, the trim operation should be able to pass in its own value for that. I opened a ticket for that at http://tracker.ceph.com/issues/20382. I'd also like to investigate using the ObjectStore's OP_OMAP_RMKEYRANGE operation to trim a range of keys in a single osd op, instead of generating a different op for each key. I have a PR that does this at https://github.com/ceph/ceph/pull/15183. But it's still hard to guarantee that leveldb can process the entire range inside of the suicide timeout. I wonder if that would help. Here's what I've learned today: * two of the 3 relevant OSDs have something screwy with their leveldb. The primary and 3rd replica are ~quick at trimming for only a few hundred keys, whilst the 2nd OSD is very very fast always. * After manually compacting the two slow OSDs, they are fast again for just a few hundred trims. So I'm compacting, trimming, ..., in a loop now. * I moved the omaps to SSDs -- doesn't help. (iostat confirms this is not IO bound). * CPU util on the slow OSDs gets quite high during the slow trimming. * perf top is below [1]. leveldb::Block::Iter::Prev and leveldb::InternalKeyComparator::Compare are notable. * The always fast OSD shows no leveldb functions in perf top while trimming. I've tried bigger leveldb cache and block sizes, compression on and off, and played with the bloom size up to 14 bits -- none of these changes make any difference. At this point I'm not confident this trimming will ever complete -- there are ~20 million records to remove at maybe 1Hz. How about I just delete the meta.log object? Would this use a different, perhaps quicker, code path to remove those omap keys? Thanks! Dan [1] 4.92% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023e8d 4.47% libc-2.17.so [.] __memcmp_sse4_1 4.13% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x000273bb 3.81% libleveldb.so.1.0.7 [.] leveldb::Block::Iter::Prev 3.07% libc-2.17.so [.] __memcpy_ssse3_back 2.84% [kernel] [k] port_inb 2.77% libstdc++.so.6.0.19 [.] std::string::_M_mutate 2.75% libstdc++.so.6.0.19 [.] std::string::append 2.53% libleveldb.so.1.0.7 [.] leveldb::InternalKeyComparator::Compare 1.32% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023e77 0.85% [kernel] [k] _raw_spin_lock 0.80% libleveldb.so.1.0.7 [.] leveldb::Block::Iter::Next 0.77% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023a05 0.67% libleveldb.so.1.0.7 [.] leveldb::MemTable::KeyComparator::operator() 0.61% libtcmalloc.so.4.2.6;5873e42b (deleted) [.] 0x00023a09 0.58% libleveldb.so.1.0.7 [.] leveldb::MemTableIterator::Prev 0.51% [kernel] [k] __schedule 0.48% libruby.so.2.1.0 [.] ruby_yyparse Hi Dan, Removing an object will try to delete all of its keys at once, which should be much faster. It's also very likely to hit your suicide timeout, so you'll have to keep retrying until it stops killing your osd. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
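If deleting the log object outright is the chosen route, it would look something like the following (the pool name is a guess -- check the zone's log_pool setting first -- and, per Casey's warning, expect the op to block and possibly need retrying through suicide-timeout restarts):

# confirm where the object lives, then remove it along with all of its omap keys:
ceph osd map default.rgw.log meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54
rados -p default.rgw.log rm meta.log.8d4fcb63-c314-4f9a-b3b3-0e61719ec258.54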
Re: [ceph-users] red IO hang (was disk timeouts in libvirt/qemu VMs...)
After some testing (doing heavy IO on a rdb-based VM with hung_task_timeout_secs=1 while manually requesting deep-scrubs on the underlying pgs (as determined via rados ls->osdmaptool), I don’t think scrubbing is the cause. At least, I can’t make it happen this way… although I can’t *always* make it happen whileeither. I will continue testing as above, but suggestions on improved test methodology are welcome. We occasionally see blocked requests in a running log (ceph –w > log), but not correlated with hung VM IO. Scrubbing doesn’t seem correlated either. -- Eric On 6/21/17, 2:55 PM, "Jason Dillaman" wrote: Do your VMs or OSDs show blocked requests? If you disable scrub or restart the blocked OSD, does the issue go away? If yes, it most likely is this issue [1]. [1] http://tracker.ceph.com/issues/20041 On Wed, Jun 21, 2017 at 3:33 PM, Hall, Eric wrote: > The VMs are using stock Ubuntu14/16 images so yes, there is the default “/sbin/fstrim –all” in /etc/cron.weekly/fstrim. > > -- > Eric > > On 6/21/17, 1:58 PM, "Jason Dillaman" wrote: > > Are some or many of your VMs issuing periodic fstrims to discard > unused extents? > > On Wed, Jun 21, 2017 at 2:36 PM, Hall, Eric wrote: > > After following/changing all suggested items (turning off exclusive-lock > > (and associated object-map and fast-diff), changing host cache behavior, > > etc.) this is still a blocking issue for many uses of our OpenStack/Ceph > > installation. > > > > > > > > We have upgraded Ceph to 10.2.7, are running 4.4.0-62 or later kernels on > > all storage, compute hosts, and VMs, with libvirt 1.3.1 on compute hosts. > > Have also learned quite a bit about producing debug logs. ;) > > > > > > > > I’ve followed the related threads since March with bated breath, but still > > find no resolution. > > > > > > > > Previously, I got pulled away before I could produce/report discussed debug > > info, but am back on the case now. Please let me know how I can help > > diagnose and resolve this problem. > > > > > > > > Any assistance appreciated, > > > > -- > > > > Eric > > > > > > > > On 3/28/17, 3:05 AM, "Marius Vaitiekunas" > > wrote: > > > > > > > > > > > > > > > > On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney > > wrote: > > > > I can't guarantee it's the same as my issue, but from that it sounds the > > same. > > > > Jewel 10.2.4, 10.2.5 tested > > hypervisors are proxmox qemu-kvm, using librbd > > 3 ceph nodes with mon+osd on each > > > > -faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops > > and bw limits on client side, jumbo frames, etc. all improve/smooth out > > performance and mitigate the hangs, but don't prevent it. > > -hangs are usually associated with blocked requests (I set the complaint > > time to 5s to see them) > > -hangs are very easily caused by rbd snapshot + rbd export-diff to do > > incremental backup (one snap persistent, plus one more during backup) > > -when qemu VM io hangs, I have to kill -9 the qemu process for it to > > stop. Some broken VMs don't appear to be hung until I try to live > > migrate them (live migrating all VMs helped test solutions) > > > > Finally I have a workaround... disable exclusive-lock, object-map, and > > fast-diff rbd features (and restart clients via live migrate). > > (object-map and fast-diff appear to have no effect on dif or export-diff > > ... so I don't miss them). I'll file a bug at some point (after I move > > all VMs back and see if it is still stable). And one other user on IRC > > said this solved the same problem (also using rbd snapshots). 
> > > > And strangely, they don't seem to hang if I put back those features, > > until a few days later (making testing much less easy...but now I'm very > > sure removing them prevents the issue) > > > > I hope this works for you (and maybe gets some attention from devs too), > > so you don't waste months like me. > > > > > > On 03/27/17 19:31, Hall, Eric wrote: > >> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), > >> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and > >> ceph hosts, we occasionally see hung processes (usually during boot, but > >> otherwise as well), with errors reported in the ins
[ceph-users] Squeezing Performance of CEPH
Hi everybody, I want to squeeze all the performance out of Ceph (we are using Jewel 10.2.7). We are benchmarking a test environment with 2 nodes having the same configuration:

* CentOS 7.3
* 24 CPUs (12 real cores with hyper-threading)
* 32GB of RAM
* 2x 100Gbit/s ethernet cards
* 2x SSD disks in RAID, dedicated to the OS
* 4x SATA 6Gbit/s SSD disks for OSDs

We are already expecting the following bottlenecks:

* [ SATA speed x n° disks ] = 24Gbit/s
* [ Network speed x n° bonded cards ] = 200Gbit/s

So the minimum between them is 24Gbit/s per node (not taking into account protocol overhead). 24Gbit/s per node x2 = 48Gbit/s of maximum hypothetical theoretical gross speed.

Here are the tests:

/// IPERF2 ///
Tests are quite good, scoring 88% of the bottleneck. Note: iperf2 can use only 1 connection from a bond (it's a well-known issue).

[ ID] Interval Transfer Bandwidth
[ 12] 0.0-10.0 sec 9.55 GBytes 8.21 Gbits/sec
[ 3] 0.0-10.0 sec 10.3 GBytes 8.81 Gbits/sec
[ 5] 0.0-10.0 sec 9.54 GBytes 8.19 Gbits/sec
[ 7] 0.0-10.0 sec 9.52 GBytes 8.18 Gbits/sec
[ 6] 0.0-10.0 sec 9.96 GBytes 8.56 Gbits/sec
[ 8] 0.0-10.0 sec 12.1 GBytes 10.4 Gbits/sec
[ 9] 0.0-10.0 sec 12.3 GBytes 10.6 Gbits/sec
[ 10] 0.0-10.0 sec 10.2 GBytes 8.80 Gbits/sec
[ 11] 0.0-10.0 sec 9.34 GBytes 8.02 Gbits/sec
[ 4] 0.0-10.0 sec 10.3 GBytes 8.82 Gbits/sec
[SUM] 0.0-10.0 sec 103 GBytes 88.6 Gbits/sec

/// RADOS BENCH ///
Taking into consideration the maximum hypothetical speed of 48Gbit/s (due to the disk bottleneck), the results are not good enough:

* Average speed in write is almost 5-7Gbit/sec (12.5% of the max hypothetical speed)
* Average speed in sequential read is almost 24Gbit/sec (50% of the max hypothetical speed)
* Average speed in random read is almost 27Gbit/sec (56.25% of the max hypothetical speed)

Here are the reports.

Write:
# rados bench -p scbench 10 write --no-cleanup
Total time run: 10.229369
Total writes made: 1538
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 601.406
Stddev Bandwidth: 357.012
Max bandwidth (MB/sec): 1080
Min bandwidth (MB/sec): 204
Average IOPS: 150
Stddev IOPS: 89
Max IOPS: 270
Min IOPS: 51
Average Latency(s): 0.106218
Stddev Latency(s): 0.198735
Max latency(s): 1.87401
Min latency(s): 0.0225438

Sequential read:
# rados bench -p scbench 10 seq
Total time run: 2.054359
Total reads made: 1538
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2994.61
Average IOPS: 748
Stddev IOPS: 67
Max IOPS: 802
Min IOPS: 707
Average Latency(s): 0.0202177
Max latency(s): 0.223319
Min latency(s): 0.00589238

Random read:
# rados bench -p scbench 10 rand
Total time run: 10.036816
Total reads made: 8375
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 3337.71
Average IOPS: 834
Stddev IOPS: 78
Max IOPS: 927
Min IOPS: 741
Average Latency(s): 0.0182707
Max latency(s): 0.257397
Min latency(s): 0.00469212

It seems like there is a bottleneck somewhere that we are underestimating. Can you help me find it? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
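One more variable worth ruling out before blaming the hardware: rados bench issues 16 concurrent 4MB ops by default, which a single client may not be able to push hard enough to saturate an all-SSD pool. Something like the following (the -t value is only an example), ideally from more than one client node in parallel, usually gives a clearer picture:

rados bench -p scbench 30 write -t 64 --no-cleanup
rados bench -p scbench 30 seq -t 64
rados bench -p scbench 30 rand -t 64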
[ceph-users] Obtaining perf counters/stats from krbd client
Hi Ceph users, We are currently using the Ceph kernel client module (krbd) in our deployment and we were looking to determine if there are ways by which we can obtain perf counters, log dumps, etc from such a deployment. Has anybody been able to obtain such stats? It looks like the libvirt interface allows for an admin socket to be configured on the client ( http://docs.ceph.com/docs/master/rbd/libvirt/#configuring-ceph) into which you can issue commands, but is this specific to the librbd implementation? Thanks, Prashant -- Prashant Murthy Sr Director, Software Engineering | Salesforce Mobile: 919-961-3041 -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
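There is no admin socket for the kernel client as far as I know, but some state is exposed through debugfs and sysfs (paths assume debugfs is mounted at /sys/kernel/debug and may vary with kernel version -- treat this as a sketch, not a full perf-counter equivalent):

cat /sys/kernel/debug/ceph/*/osdc     # in-flight OSD requests for kernel ceph clients (krbd/kcephfs)
cat /sys/kernel/debug/ceph/*/monc     # monitor session state
ls /sys/bus/rbd/devices/0/            # per-mapped-image attributes (pool, image name, size, ...)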
Re: [ceph-users] Squeezing Performance of CEPH
Hello Massimiliano, Based on the configuration below, it appears you have 8 SSDs total (2 nodes with 4 SSDs each)? I'm going to assume you have 3x replication and are you using filestore, so in reality you are writing 3 copies and doing full data journaling for each copy, for 6x writes per client write. Taking this into account, your per-SSD throughput should be somewhere around: Sequential write: ~600 * 3 (copies) * 2 (journal write per copy) / 8 (ssds) = ~450MB/s Sequential read ~3000 / 8 (ssds) = ~375MB/s Random read ~3337 / 8 (ssds) = ~417MB/s These numbers are pretty reasonable for SATA based SSDs, though the read throughput is a little low. You didn't include the model of SSD, but if you look at Intel's DC S3700 which is a fairly popular SSD for ceph: https://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3700-spec.html Sequential read is up to ~500MB/s and Sequential write speeds up to 460MB/s. Not too far off from what you are seeing. You might try playing with readahead on the OSD devices to see if that improves things at all. Still, unless I've missed something these numbers aren't terrible. Mark On 06/22/2017 12:19 PM, Massimiliano Cuttini wrote: Hi everybody, I want to squeeze all the performance of CEPH (we are using jewel 10.2.7). We are testing a testing environment with 2 nodes having the same configuration: * CentOS 7.3 * 24 CPUs (12 for real in hyper threading) * 32Gb of RAM * 2x 100Gbit/s ethernet cards * 2x OS dedicated in raid SSD Disks * 4x OSD SSD Disks SATA 6Gbit/s We are already expecting the following bottlenecks: * [ SATA speed x n° disks ] = 24Gbit/s * [ Networks speed x n° bonded cards ] = 200Gbit/s So the minimum between them is 24 Gbit/s per node (not taking in account protocol loss). 24Gbit/s per node x2 = 48Gbit/s of maximum hypotetical theorical gross speed. Here are the tests: ///IPERF2/// Tests are quite good scoring 88% of the bottleneck. Note: iperf2 can use only 1 connection from a bond.(it's a well know issue). [ ID] Interval Transfer Bandwidth [ 12] 0.0-10.0 sec 9.55 GBytes 8.21 Gbits/sec [ 3] 0.0-10.0 sec 10.3 GBytes 8.81 Gbits/sec [ 5] 0.0-10.0 sec 9.54 GBytes 8.19 Gbits/sec [ 7] 0.0-10.0 sec 9.52 GBytes 8.18 Gbits/sec [ 6] 0.0-10.0 sec 9.96 GBytes 8.56 Gbits/sec [ 8] 0.0-10.0 sec 12.1 GBytes 10.4 Gbits/sec [ 9] 0.0-10.0 sec 12.3 GBytes 10.6 Gbits/sec [ 10] 0.0-10.0 sec 10.2 GBytes 8.80 Gbits/sec [ 11] 0.0-10.0 sec 9.34 GBytes 8.02 Gbits/sec [ 4] 0.0-10.0 sec 10.3 GBytes 8.82 Gbits/sec [SUM] 0.0-10.0 sec 103 GBytes 88.6 Gbits/sec ///RADOS BENCH Take in consideration the maximum hypotetical speed of 48Gbit/s tests (due to disks bottleneck), tests are not good enought. * Average MB/s in write is almost 5-7Gbit/sec (12,5% of the mhs) * Average MB/s in seq read is almost 24Gbit/sec (50% of the mhs) * Average MB/s in random read is almost 27Gbit/se (56,25% of the mhs). Here are the reports. 
Write: # rados bench -p scbench 10 write --no-cleanup Total time run: 10.229369 Total writes made: 1538 Write size: 4194304 Object size:4194304 Bandwidth (MB/sec): 601.406 Stddev Bandwidth: 357.012 Max bandwidth (MB/sec): 1080 Min bandwidth (MB/sec): 204 Average IOPS: 150 Stddev IOPS:89 Max IOPS: 270 Min IOPS: 51 Average Latency(s): 0.106218 Stddev Latency(s): 0.198735 Max latency(s): 1.87401 Min latency(s): 0.0225438 sequential read: # rados bench -p scbench 10 seq Total time run: 2.054359 Total reads made: 1538 Read size:4194304 Object size: 4194304 Bandwidth (MB/sec): 2994.61 Average IOPS 748 Stddev IOPS: 67 Max IOPS: 802 Min IOPS: 707 Average Latency(s): 0.0202177 Max latency(s): 0.223319 Min latency(s): 0.00589238 random read: # rados bench -p scbench 10 rand Total time run: 10.036816 Total reads made: 8375 Read size:4194304 Object size: 4194304 Bandwidth (MB/sec): 3337.71 Average IOPS: 834 Stddev IOPS: 78 Max IOPS: 927 Min IOPS: 741 Average Latency(s): 0.0182707 Max latency(s): 0.257397 Min latency(s): 0.00469212 // It's seems like that there are some bottleneck somewhere that we are understimating. Can you help me to found it? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://
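For the readahead suggestion above, the knob in question is per block device (the device name and the 4096KB value are just placeholders to experiment with):

cat /sys/block/sdb/queue/read_ahead_kb        # current value in KB
echo 4096 > /sys/block/sdb/queue/read_ahead_kb
# or equivalently, in 512-byte sectors:
blockdev --setra 8192 /dev/sdb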
Re: [ceph-users] Squeezing Performance of CEPH
Hello, Also as Mark put, one minute your testing bandwidth capacity, next minute your testing disk capacity. No way is a small set of SSD’s going to be able to max your current bandwidth, even if you removed the CEPH / Journal overhead. I would say the speeds you are getting are what you should expect , see with many other setups. ,Ashley Sent from my iPhone On 23 Jun 2017, at 12:42 AM, Mark Nelson mailto:mnel...@redhat.com>> wrote: Hello Massimiliano, Based on the configuration below, it appears you have 8 SSDs total (2 nodes with 4 SSDs each)? I'm going to assume you have 3x replication and are you using filestore, so in reality you are writing 3 copies and doing full data journaling for each copy, for 6x writes per client write. Taking this into account, your per-SSD throughput should be somewhere around: Sequential write: ~600 * 3 (copies) * 2 (journal write per copy) / 8 (ssds) = ~450MB/s Sequential read ~3000 / 8 (ssds) = ~375MB/s Random read ~3337 / 8 (ssds) = ~417MB/s These numbers are pretty reasonable for SATA based SSDs, though the read throughput is a little low. You didn't include the model of SSD, but if you look at Intel's DC S3700 which is a fairly popular SSD for ceph: https://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3700-spec.html Sequential read is up to ~500MB/s and Sequential write speeds up to 460MB/s. Not too far off from what you are seeing. You might try playing with readahead on the OSD devices to see if that improves things at all. Still, unless I've missed something these numbers aren't terrible. Mark On 06/22/2017 12:19 PM, Massimiliano Cuttini wrote: Hi everybody, I want to squeeze all the performance of CEPH (we are using jewel 10.2.7). We are testing a testing environment with 2 nodes having the same configuration: * CentOS 7.3 * 24 CPUs (12 for real in hyper threading) * 32Gb of RAM * 2x 100Gbit/s ethernet cards * 2x OS dedicated in raid SSD Disks * 4x OSD SSD Disks SATA 6Gbit/s We are already expecting the following bottlenecks: * [ SATA speed x n° disks ] = 24Gbit/s * [ Networks speed x n° bonded cards ] = 200Gbit/s So the minimum between them is 24 Gbit/s per node (not taking in account protocol loss). 24Gbit/s per node x2 = 48Gbit/s of maximum hypotetical theorical gross speed. Here are the tests: ///IPERF2/// Tests are quite good scoring 88% of the bottleneck. Note: iperf2 can use only 1 connection from a bond.(it's a well know issue). [ ID] Interval Transfer Bandwidth [ 12] 0.0-10.0 sec 9.55 GBytes 8.21 Gbits/sec [ 3] 0.0-10.0 sec 10.3 GBytes 8.81 Gbits/sec [ 5] 0.0-10.0 sec 9.54 GBytes 8.19 Gbits/sec [ 7] 0.0-10.0 sec 9.52 GBytes 8.18 Gbits/sec [ 6] 0.0-10.0 sec 9.96 GBytes 8.56 Gbits/sec [ 8] 0.0-10.0 sec 12.1 GBytes 10.4 Gbits/sec [ 9] 0.0-10.0 sec 12.3 GBytes 10.6 Gbits/sec [ 10] 0.0-10.0 sec 10.2 GBytes 8.80 Gbits/sec [ 11] 0.0-10.0 sec 9.34 GBytes 8.02 Gbits/sec [ 4] 0.0-10.0 sec 10.3 GBytes 8.82 Gbits/sec [SUM] 0.0-10.0 sec 103 GBytes 88.6 Gbits/sec ///RADOS BENCH Take in consideration the maximum hypotetical speed of 48Gbit/s tests (due to disks bottleneck), tests are not good enought. * Average MB/s in write is almost 5-7Gbit/sec (12,5% of the mhs) * Average MB/s in seq read is almost 24Gbit/sec (50% of the mhs) * Average MB/s in random read is almost 27Gbit/se (56,25% of the mhs). Here are the reports. 
Write:

# rados bench -p scbench 10 write --no-cleanup
Total time run:         10.229369
Total writes made:      1538
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     601.406
Stddev Bandwidth:       357.012
Max bandwidth (MB/sec): 1080
Min bandwidth (MB/sec): 204
Average IOPS:           150
Stddev IOPS:            89
Max IOPS:               270
Min IOPS:               51
Average Latency(s):     0.106218
Stddev Latency(s):      0.198735
Max latency(s):         1.87401
Min latency(s):         0.0225438

sequential read:

# rados bench -p scbench 10 seq
Total time run:       2.054359
Total reads made:     1538
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   2994.61
Average IOPS:         748
Stddev IOPS:          67
Max IOPS:             802
Min IOPS:             707
Average Latency(s):   0.0202177
Max latency(s):       0.223319
Min latency(s):       0.00589238

random read:

# rados bench -p scbench 10 rand
Total time run:       10.036816
Total reads made:     8375
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   3337.71
Average IOPS:         834
Stddev IOPS:          78
Max IOPS:             927
Min IOPS:             741
Average Latency(s):   0.0182707
Max latency(s):       0.257397
Min latency(s):       0.00469212

It seems like there is a bottleneck somewhere that we are underestimating. Can you help me find it?
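To make Mark's arithmetic easy to re-run with different assumptions (the replica count, journal multiplier and SSD count below are parameters, not measured values), a small shell helper:

# per-SSD load implied by a client-visible throughput figure
# usage: perssd <client MB/s> <copies> <writes per copy> <number of SSDs>
perssd() { awk -v c="$1" -v r="$2" -v j="$3" -v n="$4" 'BEGIN { printf "~%.0f MB/s per SSD\n", c * r * j / n }'; }

perssd 600  3 2 8    # sequential write: 3 copies, filestore journal double-write -> ~450 MB/s
perssd 3000 1 1 8    # sequential read                                            -> ~375 MB/s
perssd 3337 1 1 8    # random read                                                -> ~417 MB/s

If the journals were moved off the data SSDs, the write-side multiplier for those disks would drop from 6x to 3x, which is the usual first lever to pull on a setup like this.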
Re: [ceph-users] Squeezing Performance of CEPH
Generally you can measure your bottleneck via a tool like atop/collectl/sysstat and see how busy (i.e. %busy, %util) your resources are: cpu/disks/net. As was pointed out, in your case you have most probably maxed out on your disks. But the above tools should help as you grow and tune your cluster.

Cheers,
Maged Mokhtar
PetaSAN

On 2017-06-22 19:19, Massimiliano Cuttini wrote:

> Hi everybody,
>
> I want to squeeze all the performance of CEPH (we are using jewel 10.2.7).
> We have a test environment with 2 nodes, both with the same configuration:
>
> * CentOS 7.3
> * 24 CPUs (12 real cores with hyper-threading)
> * 32Gb of RAM
> * 2x 100Gbit/s ethernet cards
> * 2x OS-dedicated SSD disks in RAID
> * 4x OSD SSD disks, SATA 6Gbit/s
>
> We are already expecting the following bottlenecks:
>
> * [ SATA speed x n° disks ] = 24Gbit/s
> * [ Network speed x n° bonded cards ] = 200Gbit/s
>
> So the minimum between them is 24Gbit/s per node (not taking into account protocol loss).
>
> 24Gbit/s per node x2 = 48Gbit/s of maximum hypothetical theoretical gross speed.
>
> Here are the tests:
>
> /// IPERF2 ///
>
> Tests are quite good, scoring 88% of the bottleneck.
> Note: iperf2 can use only 1 connection from a bond (it's a well-known issue).
>
>> [ ID] Interval       Transfer     Bandwidth
>> [ 12]  0.0-10.0 sec  9.55 GBytes  8.21 Gbits/sec
>> [  3]  0.0-10.0 sec  10.3 GBytes  8.81 Gbits/sec
>> [  5]  0.0-10.0 sec  9.54 GBytes  8.19 Gbits/sec
>> [  7]  0.0-10.0 sec  9.52 GBytes  8.18 Gbits/sec
>> [  6]  0.0-10.0 sec  9.96 GBytes  8.56 Gbits/sec
>> [  8]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
>> [  9]  0.0-10.0 sec  12.3 GBytes  10.6 Gbits/sec
>> [ 10]  0.0-10.0 sec  10.2 GBytes  8.80 Gbits/sec
>> [ 11]  0.0-10.0 sec  9.34 GBytes  8.02 Gbits/sec
>> [  4]  0.0-10.0 sec  10.3 GBytes  8.82 Gbits/sec
>> [SUM]  0.0-10.0 sec  103 GBytes   88.6 Gbits/sec
>
> /// RADOS BENCH ///
>
> Taking into consideration the maximum hypothetical speed of 48Gbit/s (due to the disk bottleneck), the tests are not good enough:
>
> * Average MB/s in write is almost 5-7Gbit/sec (12.5% of the mhs)
> * Average MB/s in seq read is almost 24Gbit/sec (50% of the mhs)
> * Average MB/s in random read is almost 27Gbit/sec (56.25% of the mhs)
>
> Here are the reports.
>
> Write:
>
>> # rados bench -p scbench 10 write --no-cleanup
>> Total time run:         10.229369
>> Total writes made:      1538
>> Write size:             4194304
>> Object size:            4194304
>> Bandwidth (MB/sec):     601.406
>> Stddev Bandwidth:       357.012
>> Max bandwidth (MB/sec): 1080
>> Min bandwidth (MB/sec): 204
>> Average IOPS:           150
>> Stddev IOPS:            89
>> Max IOPS:               270
>> Min IOPS:               51
>> Average Latency(s):     0.106218
>> Stddev Latency(s):      0.198735
>> Max latency(s):         1.87401
>> Min latency(s):         0.0225438
>
> sequential read:
>
>> # rados bench -p scbench 10 seq
>> Total time run:       2.054359
>> Total reads made:     1538
>> Read size:            4194304
>> Object size:          4194304
>> Bandwidth (MB/sec):   2994.61
>> Average IOPS:         748
>> Stddev IOPS:          67
>> Max IOPS:             802
>> Min IOPS:             707
>> Average Latency(s):   0.0202177
>> Max latency(s):       0.223319
>> Min latency(s):       0.00589238
>
> random read:
>
>> # rados bench -p scbench 10 rand
>> Total time run:       10.036816
>> Total reads made:     8375
>> Read size:            4194304
>> Object size:          4194304
>> Bandwidth (MB/sec):   3337.71
>> Average IOPS:         834
>> Stddev IOPS:          78
>> Max IOPS:             927
>> Min IOPS:             741
>> Average Latency(s):   0.0182707
>> Max latency(s):       0.257397
>> Min latency(s):       0.00469212
>
> It seems like there is a bottleneck somewhere that we are underestimating.
> Can you help me find it?
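For example, while one of the rados bench runs from this thread is in flight (illustrative invocations; iostat and sar come from the sysstat package, atop is separate):

# on each OSD node, in separate terminals:
iostat -dx 1        # per-disk %util and await
sar -n DEV 1        # per-NIC rx/tx throughput
atop 1              # combined cpu / disk / net busy view

If the OSD SSDs sit near 100% util while the NICs and CPUs stay mostly idle, the disks are the bottleneck, as suspected above.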
Re: [ceph-users] Squeezing Performance of CEPH
On 22/06/2017 19:19, Massimiliano Cuttini wrote:
> We are already expecting the following bottlenecks:
>
> * [ SATA speed x n° disks ] = 24Gbit/s
> * [ Network speed x n° bonded cards ] = 200Gbit/s

6Gbps SATA does not mean you can read 6Gbps from that device.
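A quick way to see what a single OSD SSD really delivers outside of Ceph (read-only; /dev/sdX is a placeholder for one of the OSD disks):

dd if=/dev/sdX of=/dev/null bs=4M count=512 iflag=direct

A 6Gbit/s SATA link tops out at roughly 550-600 MB/s after encoding and protocol overhead, and most SATA SSDs sustain somewhat less than that, so 8 of them are nowhere near 2x 100Gbit/s of network.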
Re: [ceph-users] Mon Create currently at the state of probing
David,

SUCCESS!! Thank you so much!

I rebuilt the node because I could not install Jewel over the remnants of Kraken. So, while I did install Jewel, I am not convinced that was the solution. I did something that I had not tried under the Kraken attempts that solved the problem. For future_me, here was the solution:

Removed all references to r710e from the ceph.conf on the ceph-deploy node, in the original deployment folder home/cephadminaccount/ceph-cluster/ceph.conf.
"ceph-deploy --overwrite-conf config push r710a r710b r710c" etc. to all nodes, including the ceph-deploy node, so it is now in /etc/ceph/ceph.conf.
"ceph-deploy install --release jewel r710e"
"ceph-deploy admin r710e"
"sudo chmod +r /etc/ceph/ceph.client.admin.keyring" run on node r710e
"ceph-deploy mon create r710e"

Node was created but still had the very same probing errors. Ugh.

Then I went to home/cephadminaccount/ceph-cluster/ceph.conf, added r710e back in just the way it was before, and pushed it to all nodes:
"ceph-deploy --overwrite-conf config push r710a r710b r710c" etc.
"sudo reboot" on r710g; don't know if this was necessary.

When it came up, ceph -s was good. Rebooted r710e for good measure. Did not reboot r710f.

I am wondering if I had just pushed the ceph.conf back out in the first place, would it have solved the problem? That is for another day.

-Jim

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Wednesday, June 21, 2017 4:19 PM
To: Jim Forde
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mon Create currently at the state of probing

You can specify an option in ceph-deploy to tell it which release of ceph to install: jewel, kraken, hammer, etc. `ceph-deploy --release jewel` would pin the command to using jewel instead of kraken.

While running a mixed environment is supported, it should always be tested before assuming it will work for you in production. The mons are quick enough to upgrade that I always do them together. Following that, I upgrade half of my OSDs in a test environment and leave it there for a couple of weeks (or until adequate testing is done) before upgrading the remaining OSDs, and again wait until the testing is done. I would probably do the MDS before the OSDs, but I don't usually think about that since I don't have them in a production environment. Lastly I would test upgrading the clients (vm hosts, RGW, kernel clients, etc.) and test this state the most thoroughly. In production I haven't had to worry about an upgrade taking longer than a few hours with over 60 OSD nodes, 5 mons, and a dozen clients. I just don't see a need to run a mixed environment in production, even if it is supported.

Back to your problem with adding in the mon. Do your existing mons know about the third mon, or have you removed it from their running config? It might be worth double-checking their config file and restarting the daemons after you know they will pick up the correct settings. It's hard for me to help with this part as I've been lucky enough not to have any problems with the docs online for this when it's come up. I've replaced 5 mons without any issues. I didn't use ceph-deploy, except to install the packages, though, and did the manual steps for it.

Hopefully adding the mon back on Jewel fixes the issue. That would be the easiest outcome. I don't know that the Ceph team has tested adding upgraded mons to an old quorum.

On Wed, Jun 21, 2017 at 4:52 PM Jim Forde <j...@mninc.net> wrote:

David,
Thanks for the reply.
The scenario: a monitor node fails for whatever reason (bad blocks in the HD, motherboard failure, whatever).
Procedure: remove the monitor from the cluster, replace the hardware, reinstall the OS, and add the monitor back to the cluster.

That is exactly what I did. However, my ceph-deploy node had already been upgraded to Kraken. The goal is not to use this as an upgrade path per se, but to recover from a failed monitor node in a cluster where there is an upgrade in progress.

The upgrade notes for Jewel to Kraken say you may upgrade OSDs, monitors and MDSs in any order. Perhaps I am reading too much into this, but I took it as meaning I could proceed with the upgrade at my leisure, making sure each node is successfully upgraded before proceeding to the next node. The implication is that I can run the cluster with different version daemons (at least during the upgrade process).

So that brings me to the problem at hand. What is the correct procedure for replacing a failed monitor node, especially if the failed monitor is a mon_initial_member? Does it have to be the same version as the other monitors in the cluster?

I do have a public network statement in the ceph.conf file. The monitor r710e is listed as one of the mon_initial_members in ceph.conf with the correct IP address, but the error message is:

"[r710e][WARNIN] r710e is not defined in `mon initial members`"

and also:

"[r710e][WARNIN] monitor r710e does not exist in monmap"

Should I manually inject r710e in the monmap?
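For reference, a sketch of how to check what the surviving quorum actually knows about r710e, plus the manual (non-ceph-deploy) add path from the Ceph docs; hostnames and paths are just the ones used in this thread:

# what the running monitors think the monmap contains (run against a mon in quorum):
ceph mon dump
ceph daemon mon.r710a mon_status        # on r710a itself; prints its monmap and quorum view

# manual add of a new mon, roughly (alternative to "ceph-deploy mon create"):
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i r710e --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
systemctl start ceph-mon@r710e          # unit name assumes a systemd-based Jewel install

Manually injecting an edited monmap (ceph-mon --inject-monmap) is normally only needed when quorum itself is lost; with a healthy quorum, the mkfs path above or ceph-deploy should be enough, provided mon_initial_members and the mon addresses in ceph.conf agree on all nodes.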
[ceph-users] osd down but the service is up
Hi All,

I am currently testing a new ceph cluster with SSD journals.

ceph -v
ceph version 10.2.7
cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 Beta (Maipo)

I followed http://ceph.com/geen-categorie/ceph-recover-osds-after-ssd-journal-failure/ to replace the journal drive (for testing). All the other ceph services are running, but osd@0 crashed.

# systemctl -l status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
   Active: activating (auto-restart) (Result: signal) since Thu 2017-06-22 15:44:04 EDT; 1s ago
  Process: 9580 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 9535 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 9580 (code=killed, signal=ABRT)

Jun 22 15:44:04 tinsfsceph01.abc.ca systemd[1]: Unit ceph-osd@0.service entered failed state.
Jun 22 15:44:04 tinsfsceph01.abc.ca systemd[1]: ceph-osd@0.service failed.

Log file shows:

--- begin dump of recent events ---
     0> 2017-06-22 15:45:45.396425 7f4df5030800 -1 *** Caught signal (Aborted) **
 in thread 7f4df5030800 thread_name:ceph-osd

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x91d8ea) [0x561eda3988ea]
 2: (()+0xf5e0) [0x7f4df377d5e0]
 3: (gsignal()+0x37) [0x7f4df1d3c1f7]
 4: (abort()+0x148) [0x7f4df1d3d8e8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x561eda4962e7]
 6: (()+0x30640e) [0x561ed9d8140e]
 7: (FileJournal::~FileJournal()+0x24a) [0x561eda17d7ca]
 8: (JournalingObjectStore::journal_replay(unsigned long)+0xff2) [0x561eda18cc52]
 9: (FileStore::mount()+0x3cd6) [0x561eda163576]
 10: (OSD::init()+0x27d) [0x561ed9e21a1d]
 11: (main()+0x2c55) [0x561ed9d86dc5]
 12: (__libc_start_main()+0xf5) [0x7f4df1d28c05]
 13: (()+0x3561e7) [0x561ed9dd11e7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Any help?

Thanks
Alex
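For reference: the exact steps in that blog post are not reproduced here, but the usual filestore journal-replacement sequence (when the old journal device is still readable) looks roughly like this; osd.0 and the paths below are just this thread's examples:

ceph osd set noout
systemctl stop ceph-osd@0
ceph-osd -i 0 --flush-journal                  # flush outstanding entries from the old journal
# repoint the journal symlink at the new device/partition, e.g.:
#   ln -sf /dev/disk/by-partuuid/<new-partuuid> /var/lib/ceph/osd/ceph-0/journal
#   chown ceph:ceph /var/lib/ceph/osd/ceph-0/journal   # device ownership, since the OSD runs as the ceph user
ceph-osd -i 0 --mkjournal                      # create a fresh journal on the new device
systemctl start ceph-osd@0
ceph osd unset noout

The abort inside FileJournal during journal_replay may simply mean the OSD is still pointed at a journal that no longer matches its filestore (for example, the symlink still referencing the old partition), so checking where /var/lib/ceph/osd/ceph-0/journal points is probably the first thing to do. If the old journal was lost before it could be flushed, rebuilding the OSD is often the safer route.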