Re: [ceph-users] Ceph with Clos IP fabric
Agreed. In an ideal world I would have interleaved all my compute, long term storage and processing posix. Unfortunately, business doesn't always work out so nicely so I'm left with buying and building out to match changing needs. In this case we are a small part of a larger org and have been allocated X racks in the cage, which is at this point land locked with no room to expand so it is actual floor space that's limited. Hence the necessity to go as dense as possible when adding any new capacity. Luckily ceph is flexible enough to function fine when deployed like an EMC solution, it's just muuuch cheaper and more fun to operate! Aaron On Apr 24, 2017, at 12:59 AM, Richard Hesse mailto:richard.he...@weebly.com>> wrote: It's not a requirement to build out homogeneous racks of ceph gear. Most larger places don't do that (it creates weird hot spots). If you have 5 racks of gear, you're better off spreading out servers in those 5 than just a pair of racks that are really built up. In Aaron's case, he can easily do that since he's not using a cluster network. Just be sure to dial in your crush map and failure domains with only a pair of installed cabinets. Thanks for sharing Christian! It's always good to hear about how others are using and deploying Ceph, while coming to similar and different conclusions. Also,when you say datacenter space is expensive, are you referring to power or actual floor space? Datacenter space is almost always sold by power and floor space is usually secondary. Are there markets where that's opposite? If so, those are ripe for new entrants! On Apr 23, 2017 7:56 PM, "Christian Balzer" mailto:ch...@gol.com>> wrote: Hello, Aaron pretty much stated most of what I was going to write, but to generalize things and make some points more obvious, I shall pipe up as well. On Sat, 22 Apr 2017 21:45:58 -0700 Richard Hesse wrote: > Out of curiosity, why are you taking a scale-up approach to building your > ceph clusters instead of a scale-out approach? Ceph has traditionally been > geared towards a scale-out, simple shared nothing mindset. While true, scale-out does come at a cost: a) rack space, which is mighty expensive where we want/need to be and also of limited availability in those locations. b) increased costs by having more individual servers, as in having two servers with 6 OSDs versus 1 with 12 OSDs will cost you about 30-40% more at the least (chassis, MB, PSU, NIC). And then there is the whole scale thing in general, I'm getting the impression that the majority of Ceph users have small to at best medium sized clusters, simply because they don't need all that much capacity (in terms of storage space). Case in point, our main production Ceph clusters fit into 8-10U with 3-4 HDD based OSD servers and 2-4 SSD based cache tiers, obviously at this size with everything being redundant (switches, PDU, PSU). Serving hundreds (nearly 600 atm) of VMs, with a planned peak around 800 VMs. That Ceph cluster will never have to grow beyond this size. For me Ceph (RBD) was/is a more scalable approach than DRBD, allowing for n+1 compute node deployments instead of having pairs (where one can't live migrate to outside of this pair). >These dual ToR > deploys remind me of something from EMC, not ceph. Really curious as I'd > rather have 5-6 racks of single ToR switches as opposed to three racks of > dual ToR. Is there a specific application or requirement? It's definitely > adding a lot of complexity; just wondering what the payoff is. > If you have plenty of racks, bully for you. 
Though personally I'd try to keep failure domains (especially when they are as large as full rack!) to something like 10% of the cluster. We're not using Ethernet for the Ceph network (IPoIB), but if we were it would be dual TORS with MC-LAG (and dual PSU, PDU) all the way. Why have a SPOF that WILL impact your system (a rack worth of data movement) in the first place? Regards, Christian > Also, why are you putting your "cluster network" on the same physical > interfaces but on separate VLANs? Traffic shaping/policing? What's your > link speed there on the hosts? (25/40gbps?) > > On Sat, Apr 22, 2017 at 12:13 PM, Aaron Bassett > mailto:aaron.bass...@nantomics.com> > > wrote: > > > FWIW, I use a CLOS fabric with layer 3 right down to the hosts and > > multiple ToRs to enable HA/ECMP to each node. I'm using Cumulus Linux's > > "redistribute neighbor" feature, which advertises a /32 for any ARP'ed > > neighbor. I set up the hosts with an IP on each physical interface and on > > an aliased looopback: lo:0. I handle the separate cluster network by adding > > a vlan to each interface and routing those separately on the ToRs with acls > > to keep traffic apart. > > > > Their documentation may help clarify a bit: > > https://docs.cumulusnetworks.com/display/DOCS/Redistribute+
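For reference, the host-side setup Aaron describes (an IP per physical uplink, a /32 on an aliased loopback picked up by "redistribute neighbor", and a VLAN per uplink for the cluster network) might look roughly like the /etc/network/interfaces sketch below. Interface names, the VLAN ID and all addresses are assumptions, not values from the thread:

    # /32 on the loopback alias; the ToRs learn and advertise it via "redistribute neighbor"
    auto lo:0
    iface lo:0 inet static
        address 10.10.0.11
        netmask 255.255.255.255

    # one uplink per ToR (the second uplink, eth1, is configured analogously)
    auto eth0
    iface eth0 inet static
        address 10.20.0.11
        netmask 255.255.255.0

    # cluster-network VLAN on the same uplink, routed separately on the ToR with ACLs
    auto eth0.100
    iface eth0.100 inet static
        address 10.30.0.11
        netmask 255.255.255.0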
[ceph-users] Hadoop with CephFS
Hello, I am using the Ceph 10.2.5 release. Does CephFS in this version support the requirements of a Hadoop cluster? (Is anyone running this combination?) Thanks Swami ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
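In case it helps: CephFS can be used as a Hadoop storage backend via the cephfs-hadoop bindings, configured in core-site.xml roughly as in the sketch below (following the upstream CephFS/Hadoop docs; the monitor host, auth id and paths are placeholders, and you should check that the bindings support your Hadoop version):

    <property>
      <name>fs.default.name</name>
      <value>ceph://mon-host:6789/</value>
    </property>
    <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>
    <property>
      <name>ceph.conf.file</name>
      <value>/etc/ceph/ceph.conf</value>
    </property>
    <property>
      <name>ceph.auth.id</name>
      <value>admin</value>
    </property>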
Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken
On Mon, Apr 24, 2017 at 2:41 AM, Xiaoxi Chen wrote: > Well, I can definitely build my own, > > 1. Precise is NOT EOL on Hammer release, which was confirmed in > previous mail thread. So we still need to maintain point-in-time > hammer package for end users. Ceph Hammer is EOL > > 2. It is NOT ONLY missing 0.94.10, instead, as how we organize the > repo index(only contains latest package in index), now all 0.94.x > package on precise are not installable via apt. > I think this may be because we didn't built precise for 0.94.10 and Debian repositories do not support multi-version packages. So although other versions are there, but the latest one isn't, the repository acts as there is nothing for Precise. I would suggest an upgrade to a newer Ceph version at this point, although Precise isn't built for any newer Ceph versions, so effectively you are looking at upgrading to a newer OS as well. > > 2017-04-24 14:02 GMT+08:00 xiaoguang fan : >> If you need this deb package 0.94.10 on precise(12.04), I think you can >> build it by yourself, you can use the script make_deps.sh >> >> 2017-04-24 11:35 GMT+08:00 Xiaoxi Chen : >>> >>> Hi, >>> >>> The 0.94.10 packages were not build for Ubuntu Precise, till now. >>> What is worse, the dist discription >>> >>> (http://download.ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages) >>> doesnt contains any ceph core pacages. >>> >>> It make Precise user unable provision their ceph cluster/client. >>> Could anyone pls help to fix it? >>> >>> Xiaoxi >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
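For anyone hitting this, the repository line in question and a quick way to see which versions the Precise index actually advertises (standard apt tooling, shown as a sketch):

    # /etc/apt/sources.list.d/ceph.list
    deb http://download.ceph.com/debian-hammer/ precise main

    apt-get update
    apt-cache madison ceph    # lists every candidate version the index offers for this suite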
Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hello Orit, Could it be that something has changed in 10.2.5+ which is related to reading the endpoints from the zone/period config? In my master zone I have specified the endpoint with a trailing backslash (which is also escaped), however I do not define the secondary endpoint this way. Am I hitting a bug here? Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 21/04/17 09:36, Ben Morrice wrote: Hello Orit, Please find attached the output from the radosgw commands and the relevant section from ceph.conf (radosgw) bbp-gva-master is running 10.2.5 bbp-gva-secondary is running 10.2.7 Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 21/04/17 07:55, Orit Wasserman wrote: Hi Ben, On Thu, Apr 20, 2017 at 6:08 PM, Ben Morrice wrote: Hi all, I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7) and authentication is in a very bad state. This installation is part of a multigw configuration, and I have just updated one host in the secondary zone (all other hosts/zones are running 10.2.5). On the 10.2.7 server I cannot authenticate as a user (normally backed by OpenStack Keystone), but even worse I can also not authenticate with an admin user. Please see [1] for the results of performing a list bucket operation with python boto (script works against rgw 10.2.5) Also, if I try to authenticate from the 'master' rgw zone with a "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get: "ERROR: failed to fetch datalog info" "failed to retrieve sync info: (13) Permission denied" The above errors correlates to the errors in the log on the server running 10.2.7 (debug level 20) at [2] I'm not sure what I have done wrong or can try next? By the way, downgrading the packages from 10.2.7 to 10.2.5 returns authentication functionality Can you provide the following info: radosgw-admin period get radsogw-admin zonegroup get radsogw-admin zone get Can you provide your ceph.conf? 
Thanks, Orit [1] boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden encoding="UTF-8"?>SignatureDoesNotMatchtx4-0058f8c86a-3fa2959-bbp-gva-secondary3fa2959-bbp-gva-secondary-bbp-gva [2] /bbpsrvc15.cscs.ch/admin/log 2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34 2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request 2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER: err_no=-2027 new_err_no=-2027 2017-04-20 16:43:04.916329 7ff87c6c0700 2 req 354:0.052585:s3:GET /admin/log:get_obj:op status=0 2017-04-20 16:43:04.916339 7ff87c6c0700 2 req 354:0.052595:s3:GET /admin/log:get_obj:http status=403 2017-04-20 16:43:04.916343 7ff87c6c0700 1 == req done req=0x7ff87c6ba710 op status=0 http_status=403 == 2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() returned -2027 2017-04-20 16:43:04.916390 7ff87c6c0700 1 civetweb: 0x7ff990015610: 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1" 403 0 - - 2017-04-20 16:43:04.917212 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate() 2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync: incremental_sync:1544: shard_id=20 mdlog_marker=1_1492686039.901886_5551978.1 sync_marker.marker=1_1492686039.901886_5551978.1 period_marker= 2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync: incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20 2017-04-20 16:43:04.917236 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917238 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: init request 2017-04-20 16:43:04.917240 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917241 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status 2017-04-20 16:43:04.917303 7ff9777e6700 20 run: stack=0x7ff97000d420 is io blocked 2017-04-20 16:43:04.918285 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.918295 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: reading shard status complete 2017-04-20 16:43:04.918307 7ff9777e6700 20 rgw meta sync: shard_id=20 marker=1_1492686039.901886_5551978.1 last_update=2017-04-20 13:00:39.0.901886s 2017-04-20 16:43:04.918316 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.918317 7ff9777e6700
Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken
On Mon, Apr 24, 2017 at 8:53 AM, Alfredo Deza wrote: > On Mon, Apr 24, 2017 at 2:41 AM, Xiaoxi Chen wrote: >> Well, I can definitely build my own, >> >> 1. Precise is NOT EOL on Hammer release, which was confirmed in >> previous mail thread. So we still need to maintain point-in-time >> hammer package for end users. > > Ceph Hammer is EOL > >> >> 2. It is NOT ONLY missing 0.94.10, instead, as how we organize the >> repo index(only contains latest package in index), now all 0.94.x >> package on precise are not installable via apt. >> > > I think this may be because we didn't built precise for 0.94.10 and > Debian repositories do not support multi-version packages. So although > other versions are there, but the latest one isn't, the repository > acts as there is nothing for Precise. > > I would suggest an upgrade to a newer Ceph version at this point, > although Precise isn't built for any newer Ceph versions, so > effectively you are looking at > upgrading to a newer OS as well. > >> >> 2017-04-24 14:02 GMT+08:00 xiaoguang fan : >>> If you need this deb package 0.94.10 on precise(12.04), I think you can >>> build it by yourself, you can use the script make_deps.sh >>> >>> 2017-04-24 11:35 GMT+08:00 Xiaoxi Chen : Hi, The 0.94.10 packages were not build for Ubuntu Precise, till now. What is worse, the dist discription (http://download.ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages) doesnt contains any ceph core pacages. It make Precise user unable provision their ceph cluster/client. Could anyone pls help to fix it? I've started to build the Precise packages for that version, hopefully we can get something worked out today or tomorrow. You are right that Precise wasn't EOL when that version of Hammer was released, and this was an omission on our end. Xiaoxi -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] v12.0.2 Luminous (dev) released
This is the third development checkpoint release of Luminous, the next long term stable release.

Major changes from v12.0.1
--

* The original librados rados_objects_list_open (C) and objects_begin (C++) object listing API, deprecated in Hammer, has finally been removed. Users of this interface must update their software to use either the rados_nobjects_list_open (C) and nobjects_begin (C++) API or the new rados_object_list_begin (C) and object_list_begin (C++) API before updating the client-side librados library to Luminous. Object enumeration (via any API) with the latest librados version and pre-Hammer OSDs is no longer supported. Note that no in-tree Ceph services rely on object enumeration via the deprecated APIs, so only external librados users might be affected. The newest (and recommended) rados_object_list_begin (C) and object_list_begin (C++) API is only usable on clusters with the SORTBITWISE flag enabled (Jewel and later). (Note that this flag is required to be set before upgrading beyond Jewel.)

* CephFS clients without the 'p' flag in their authentication capability string will no longer be able to set quotas or any layout fields. This flag previously only restricted modification of the pool and namespace fields in layouts.

* CephFS directory fragmentation (large directory support) is enabled by default on new filesystems. To enable it on existing filesystems use "ceph fs set <fs_name> allow_dirfrags true".

* CephFS will generate a health warning if you have fewer standby daemons than it thinks you wanted. By default this will be 1 if you ever had a standby, and 0 if you did not. You can customize this using ``ceph fs set <fs_name> standby_count_wanted <count>``. Setting it to zero will effectively disable the health check.

* The "ceph mds tell ..." command has been removed. It is superseded by "ceph tell mds.<id> ...".

* RGW introduces server-side encryption of uploaded objects, with three options for the management of encryption keys: automatic encryption (only recommended for test setups), customer-provided keys similar to the Amazon SSE-C specification, and use of an external key management service (OpenStack Barbican) similar to the Amazon SSE-KMS specification.

For a more detailed changelog, refer to http://ceph.com/releases/ceph-v12-0-2-luminous-dev-released/

Getting Ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-12.0.2.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* For ceph-deploy, see http://docs.ceph.com/docs/master/install/install-ceph-deploy
* Release sha1: 5a1b6b3269da99a18984c138c23935e5eb96f73e

-- Abhishek Lekshmanan SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
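As a concrete illustration of the two CephFS settings above (the filesystem name "cephfs" is only an example):

    ceph fs set cephfs allow_dirfrags true        # enable directory fragmentation on an existing fs
    ceph fs set cephfs standby_count_wanted 0     # disable the standby-daemon health warning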
Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?
Hi Ben, On Mon, Apr 24, 2017 at 4:36 PM, Ben Morrice wrote: > Hello Orit, > > Could it be that something has changed in 10.2.5+ which is related to > reading the endpoints from the zone/period config? > I don't remember any change for endpoints config, but I will go over the changes to make sure. There were a few changes with tenant handling that may cause this regression. > In my master zone I have specified the endpoint with a trailing backslash > (which is also escaped), however I do not define the secondary endpoint this > way. Am I hitting a bug here? > Can you update the secondary endpoint and see if it helps? Please open a bug in tracker with regarding to this issue. Regards, Orit > Kind regards, > > Ben Morrice > > __ > Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 > EPFL / BBP > Biotech Campus > Chemin des Mines 9 > 1202 Geneva > Switzerland > > On 21/04/17 09:36, Ben Morrice wrote: >> >> Hello Orit, >> >> Please find attached the output from the radosgw commands and the relevant >> section from ceph.conf (radosgw) >> >> bbp-gva-master is running 10.2.5 >> >> bbp-gva-secondary is running 10.2.7 >> >> Kind regards, >> >> Ben Morrice >> >> __ >> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 >> EPFL / BBP >> Biotech Campus >> Chemin des Mines 9 >> 1202 Geneva >> Switzerland >> >> On 21/04/17 07:55, Orit Wasserman wrote: >>> >>> Hi Ben, >>> >>> On Thu, Apr 20, 2017 at 6:08 PM, Ben Morrice wrote: Hi all, I have tried upgrading one of our RGW servers from 10.2.5 to 10.2.7 (RHEL7) and authentication is in a very bad state. This installation is part of a multigw configuration, and I have just updated one host in the secondary zone (all other hosts/zones are running 10.2.5). On the 10.2.7 server I cannot authenticate as a user (normally backed by OpenStack Keystone), but even worse I can also not authenticate with an admin user. Please see [1] for the results of performing a list bucket operation with python boto (script works against rgw 10.2.5) Also, if I try to authenticate from the 'master' rgw zone with a "radosgw-admin sync status --rgw-zone=bbp-gva-master" I get: "ERROR: failed to fetch datalog info" "failed to retrieve sync info: (13) Permission denied" The above errors correlates to the errors in the log on the server running 10.2.7 (debug level 20) at [2] I'm not sure what I have done wrong or can try next? By the way, downgrading the packages from 10.2.7 to 10.2.5 returns authentication functionality >>> >>> Can you provide the following info: >>> radosgw-admin period get >>> radsogw-admin zonegroup get >>> radsogw-admin zone get >>> >>> Can you provide your ceph.conf? 
>>> >>> Thanks, >>> Orit >>> [1] boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden >>> encoding="UTF-8"?>SignatureDoesNotMatchtx4-0058f8c86a-3fa2959-bbp-gva-secondary3fa2959-bbp-gva-secondary-bbp-gva [2] /bbpsrvc15.cscs.ch/admin/log 2017-04-20 16:43:04.916253 7ff87c6c0700 15 calculated digest=Ofg/f/NI0L4eEG1MsGk4PsVscTM= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 auth_sign=qZ3qsy7AuNCOoPMhr8yNoy5qMKU= 2017-04-20 16:43:04.916255 7ff87c6c0700 15 compare=34 2017-04-20 16:43:04.916266 7ff87c6c0700 10 failed to authorize request 2017-04-20 16:43:04.916268 7ff87c6c0700 20 handler->ERRORHANDLER: err_no=-2027 new_err_no=-2027 2017-04-20 16:43:04.916329 7ff87c6c0700 2 req 354:0.052585:s3:GET /admin/log:get_obj:op status=0 2017-04-20 16:43:04.916339 7ff87c6c0700 2 req 354:0.052595:s3:GET /admin/log:get_obj:http status=403 2017-04-20 16:43:04.916343 7ff87c6c0700 1 == req done req=0x7ff87c6ba710 op status=0 http_status=403 == 2017-04-20 16:43:04.916350 7ff87c6c0700 20 process_request() returned -2027 2017-04-20 16:43:04.916390 7ff87c6c0700 1 civetweb: 0x7ff990015610: 10.80.6.26 - - [20/Apr/2017:16:43:04 +0200] "GET /admin/log HTTP/1.1" 403 0 - - 2017-04-20 16:43:04.917212 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff9703a5440:18RGWMetaSyncShardCR: operate() 2017-04-20 16:43:04.917223 7ff9777e6700 20 rgw meta sync: incremental_sync:1544: shard_id=20 mdlog_marker=1_1492686039.901886_5551978.1 sync_marker.marker=1_1492686039.901886_5551978.1 period_marker= 2017-04-20 16:43:04.917227 7ff9777e6700 20 rgw meta sync: incremental_sync:1551: shard_id=20 syncing mdlog for shard_id=20 2017-04-20 16:43:04.917236 7ff9777e6700 20 cr:s=0x7ff97000d420:op=0x7ff970066b80:24RGWCloneMetaLogCoroutine: operate() 2017-04-20 16:43:04.917238 7ff9777e6700 20 rgw meta sync: operate: shard_id=20: init r
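If the trailing-slash endpoint does turn out to be the problem, the zone endpoints can be adjusted and the period committed roughly as follows (the zone name is from the thread; the endpoint URL and port are assumptions, not verified values):

    radosgw-admin zone get --rgw-zone=bbp-gva-master      # inspect the current endpoints
    radosgw-admin zone modify --rgw-zone=bbp-gva-master --endpoints=http://rgw-host:8080
    radosgw-admin period update --commit                  # push the change to the current period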
[ceph-users] CEPH MON Updates Live
Hey, Quick question hopefully - I have tried a few Google searches but nothing concrete. I am running KVM VMs using KRBD. If I add and remove Ceph mons, are the running VMs updated with this information? Or do I need to reboot the VMs for them to pick up the change of mons? Thanks! Sent from my iPhone ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Maintaining write performance under a steady intake of small objects
Hi everyone, so this will be a long email — it's a summary of several off-list conversations I've had over the last couple of weeks, but the TL;DR version is this question: How can a Ceph cluster maintain near-constant performance characteristics while supporting a steady intake of a large number of small objects? This is probably a very common problem, but we have a bit of a dearth of truly adequate best practices for it. To clarify, what I'm talking about is an intake on the order of millions per hour. That might sound like a lot, but if you consider an intake of 700 objects/s at 20 KiB/object, that's just 14 MB/s. That's not exactly hammering your cluster — but it amounts to 2.5 million objects created per hour. Under those circumstances, two things tend to happen: (1) There's a predictable decline in insert bandwidth. In other words, a cluster that may allow inserts at a rate of 2.5M/hr rapidly goes down to 1.8M/hr and then 1.7M/hr ... and by "rapidly" I mean hours, not days. As I understand it, this is mainly due to the FileStore's propensity to index whole directories with a readdir() call which is an linear-time operation. (2) FileStore's mitigation strategy for this is to proactively split directories so they never get so large as for readdir() to become a significant bottleneck. That's fine, but in a cluster with a steadily growing number of objects, that tends to lead to lots and lots of directory splits happening simultanously — causing inserts to slow to a crawl. For (2) there is a workaround: we can initialize a pool with an expected number of objects, set a pool max_objects quota, and disable on-demand splitting altogether by setting a negative filestore merge threshold. That way, all splitting occurs at pool creation time, and before another split were to happen, you hit the pool quota. So you never hit that brick wall causes by the thundering herd of directory splits. Of course, it also means that when you want to insert yet more objects, you need another pool — but you can handle that at the application level. It's actually a bit of a dilemma: we want directory splits to happen proactively, so that readdir() doesn't slow things down, but then we also *don't* want them to happen, because while they do, inserts flatline. (2) will likely be killed off completely by BlueStore, because there are no more directories, hence nothing to split. For (1) there really isn't a workaround that I'm aware of for FileStore. And at least preliminary testing shows that BlueStore clusters suffer from similar, if not the same, performance degradation (although, to be fair, I haven't yet seen tests under the above parameters with rocksdb and WAL on NVMe hardware). For (1) however I understand that there would be a potential solution in FileStore itself, by throwing away Ceph's own directory indexing and just rely on flat directory lookups — which should be logarithmic-time operations in both btrfs and XFS, as both use B-trees for directory indexing. But I understand that that would be a fairly massive operation that looks even less attractive to undertake with BlueStore around the corner. One suggestion that has been made (credit to Greg) was to do object packing, i.e. bunch up a lot of discrete data chunks into a single RADOS object. But in terms of distribution and lookup logic that would have to be built on top, that seems weird to me (CRUSH on top of CRUSH to find out which RADOS object a chunk belongs to, or some such?) 
So I'm hoping for the likes of Wido and Dan and Mark to have some alternate suggestions here: what's your take on this? Do you have suggestions for people with a constant intake of small objects? Looking forward to hearing your thoughts. Cheers, Florian ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
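For completeness, the pre-splitting workaround described under (2) would look roughly like the sketch below; the pool name, PG counts, ruleset name and object counts are purely illustrative:

    # ceph.conf on the OSDs: a negative merge threshold disables merging, and together
    # with expected_num_objects at pool creation it pre-splits the directories up front
    [osd]
    filestore merge threshold = -10
    filestore split multiple = 2

    # create the pool pre-split for the expected object count, then cap it with a quota
    ceph osd pool create smallobj 4096 4096 replicated replicated_ruleset 100000000
    ceph osd pool set-quota smallobj max_objects 100000000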
[ceph-users] hung rbd requests for one pool
One guest VM on my test cluster has hung for more than 24 hours while running a fio test on an RBD device, but other VMs accessing other images in the same pool are fine. I was able to reproduce the problem by running “rbd info” on the same pool as the stuck VM with some debug tracing on (see log below). How can I narrow this down further or resolve the problem? Here are a few details about the cluster: ceph version 10.2.7 Three monitors and six OSD nodes with three OSDs each Each OSD has one SSD with separate partitions for the journal and data, using XFS Clients are KVM guests using rbd devices with virtio Cluster is healthy: ceph7:~$ sudo ceph status cluster 876a19e2-7f61-4774-a6b3-eaab4004f45f health HEALTH_OK monmap e1: 3 mons at {a=192.168.206.10:6789/0,b=192.168.206.11:6789/0,c=192.168.206.12:6789/0} election epoch 6, quorum 0,1,2 a,b,c osdmap e27: 18 osds: 18 up, 18 in flags sortbitwise,require_jewel_osds pgmap v240894: 576 pgs, 2 pools, 416 GB data, 104 kobjects 1248 GB used, 2606 GB / 3854 GB avail 576 active+clean client io 2548 kB/s rd, 2632 kB/s wr, 493 op/s rd, 1121 op/s wr Log output from “rbd info” on the client node (not in a VM): ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1 2017-04-24 11:30:57.048750 7f55365c5d40 1 -- :/0 messenger.start 2017-04-24 11:30:57.049223 7f55365c5d40 1 -- :/3282647735 --> 192.168.206.11:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x55c254e1ccc0 con 0x55c254e17850 2017-04-24 11:30:57.050077 7f55365bd700 1 -- 192.168.206.17:0/3282647735 learned my addr 192.168.206.17:0/3282647735 2017-04-24 11:30:57.051040 7f551a627700 1 -- 192.168.206.17:0/3282647735 <== mon.1 192.168.206.11:6789/0 1 mon_map magic: 0 v1 473+0+0 (2270207254 0 0) 0x7f550b80 con 0x55c254e17850 2017-04-24 11:30:57.051148 7f551a627700 1 -- 192.168.206.17:0/3282647735 <== mon.1 192.168.206.11:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (2714966539 0 0) 0x7f551040 con 0x55c254e17850 2017-04-24 11:30:57.051328 7f551a627700 1 -- 192.168.206.17:0/3282647735 --> 192.168.206.11:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7f5504001860 con 0x55c254e17850 2017-04-24 11:30:57.052239 7f551a627700 1 -- 192.168.206.17:0/3282647735 <== mon.1 192.168.206.11:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (3323982069 0 0) 0x7f551040 con 0x55c254e17850 2017-04-24 11:30:57.052399 7f551a627700 1 -- 192.168.206.17:0/3282647735 --> 192.168.206.11:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7f5504003370 con 0x55c254e17850 2017-04-24 11:30:57.053313 7f551a627700 1 -- 192.168.206.17:0/3282647735 <== mon.1 192.168.206.11:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (1107778031 0 0) 0x7f5508c0 con 0x55c254e17850 2017-04-24 11:30:57.053415 7f551a627700 1 -- 192.168.206.17:0/3282647735 --> 192.168.206.11:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x55c254e1d290 con 0x55c254e17850 2017-04-24 11:30:57.053477 7f55365c5d40 1 -- 192.168.206.17:0/3282647735 --> 192.168.206.11:6789/0 -- mon_subscribe({osdmap=0}) v2 -- ?+0 0x55c254e12df0 con 0x55c254e17850 2017-04-24 11:30:57.053851 7f551a627700 1 -- 192.168.206.17:0/3282647735 <== mon.1 192.168.206.11:6789/0 5 mon_map magic: 0 v1 473+0+0 (2270207254 0 0) 0x7f551360 con 0x55c254e17850 2017-04-24 11:30:57.054058 7f551a627700 1 -- 192.168.206.17:0/3282647735 <== mon.1 192.168.206.11:6789/0 6 osd_map(27..27 src has 1..27) v3 13035+0+0 (2602332718 0 0) 0x7f550cc0 con 0x55c254e17850 2017-04-24 11:30:57.054376 7f55365c5d40 5 librbd::AioImageRequestWQ: 0x55c254e21c10 : ictx=0x55c254e20760 2017-04-24 11:30:57.054498 
7f55365c5d40 20 librbd::ImageState: 0x55c254e19330 open 2017-04-24 11:30:57.054503 7f55365c5d40 10 librbd::ImageState: 0x55c254e19330 0x55c254e19330 send_open_unlock 2017-04-24 11:30:57.054512 7f55365c5d40 10 librbd::image::OpenRequest: 0x55c254e22590 send_v2_detect_header 2017-04-24 11:30:57.054632 7f55365c5d40 1 -- 192.168.206.17:0/3282647735 --> 192.168.206.13:6802/22690 -- osd_op(client.4375.0:1 1.ba46737 rbd_id.image1 [stat] snapc 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x55c254e25d00 con 0x55c254e248d0 2017-04-24 11:30:57.056830 7f5518421700 1 -- 192.168.206.17:0/3282647735 <== osd.10 192.168.206.13:6802/22690 1 osd_op_reply(1 rbd_id.image1 [stat] v0'0 uv7 ondisk = 0) v7 133+0+16 (2025423138 0 1760854024) 0x7f54fb40 con 0x55c254e248d0 2017-04-24 11:30:57.056949 7f5512ffd700 10 librbd::image::OpenRequest: handle_v2_detect_header: r=0 2017-04-24 11:30:57.056965 7f5512ffd700 10 librbd::image::OpenRequest: 0x55c254e22590 send_v2_get_id 2017-04-24 11:30:57.057026 7f5512ffd700 1 -- 192.168.206.17:0/3282647735 --> 192.168.206.13:6802/22690 -- osd_op(client.4375.0:2 1.ba46737 rbd_id.image1 [call rbd.get_id] snapc 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40021f0
Re: [ceph-users] hung rbd requests for one pool
On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute wrote: > 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735 > --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38 > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con > 0x7f54f40064e0 You can attempt to run "ceph daemon osd.XYZ ops" against the potentially stuck OSD to figure out what it's stuck doing. -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
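Two related commands that may help narrow down where the request is stuck (the client admin-socket path is a placeholder and requires "admin socket" to be enabled in the client's ceph.conf):

    ceph daemon osd.XYZ dump_historic_ops                                      # recent slow/completed ops on the OSD
    ceph daemon /var/run/ceph/ceph-client.admin.<pid>.asok objecter_requests   # in-flight ops as seen by the client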
Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken
On Mon, Apr 24, 2017 at 10:05 AM, Alfredo Deza wrote: > On Mon, Apr 24, 2017 at 8:53 AM, Alfredo Deza wrote: >> On Mon, Apr 24, 2017 at 2:41 AM, Xiaoxi Chen wrote: >>> Well, I can definitely build my own, >>> >>> 1. Precise is NOT EOL on Hammer release, which was confirmed in >>> previous mail thread. So we still need to maintain point-in-time >>> hammer package for end users. >> >> Ceph Hammer is EOL >> >>> >>> 2. It is NOT ONLY missing 0.94.10, instead, as how we organize the >>> repo index(only contains latest package in index), now all 0.94.x >>> package on precise are not installable via apt. Can you try now? 0.94.10 for precise was just pushed out. Let me know if you get into any issues >>> >> >> I think this may be because we didn't built precise for 0.94.10 and >> Debian repositories do not support multi-version packages. So although >> other versions are there, but the latest one isn't, the repository >> acts as there is nothing for Precise. >> >> I would suggest an upgrade to a newer Ceph version at this point, >> although Precise isn't built for any newer Ceph versions, so >> effectively you are looking at >> upgrading to a newer OS as well. >> >>> >>> 2017-04-24 14:02 GMT+08:00 xiaoguang fan : If you need this deb package 0.94.10 on precise(12.04), I think you can build it by yourself, you can use the script make_deps.sh 2017-04-24 11:35 GMT+08:00 Xiaoxi Chen : > > Hi, > > The 0.94.10 packages were not build for Ubuntu Precise, till now. > What is worse, the dist discription > > (http://download.ceph.com/debian-hammer/dists/precise/main/binary-amd64/Packages) > doesnt contains any ceph core pacages. > > It make Precise user unable provision their ceph cluster/client. > Could anyone pls help to fix it? > > I've started to build the Precise packages for that version, hopefully > we can get something worked out today or tomorrow. You are right that > Precise wasn't EOL when > that version of Hammer was released, and this was an omission on our end. > > > > Xiaoxi > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- >>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
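To verify from a Precise client once the packages are up (the exact version string below is an assumption; check what apt reports first):

    apt-get update
    apt-cache policy ceph                     # should now show a 0.94.10 candidate
    apt-get install ceph=0.94.10-1precise ceph-common=0.94.10-1precise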
Re: [ceph-users] hung rbd requests for one pool
Jason, Thanks for the suggestion. That seems to show it is not the OSD that got stuck: ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1 … 2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38 rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con 0x7f737c0064e0 … 2017-04-24 13:14:04.756328 7f73a2880700 1 -- 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con 0x7f737c0064e0 ceph0:~$ sudo ceph pg map 1.af6f1e38 osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2] ceph3:~$ sudo ceph daemon osd.11 ops { "ops": [], "num_ops": 0 } I repeated this a few times and it’s always the same command and same placement group that hangs, but OSD11 has no ops (and neither do OSD16 and OSD2, although I think that’s expected). Is there other tracing I should do on the OSD or something more to look at on the client? Thanks, Phil > On Apr 24, 2017, at 12:39 PM, Jason Dillaman wrote: > > On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute > wrote: >> 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735 >> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38 >> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc >> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con >> 0x7f54f40064e0 > > > You can attempt to run "ceph daemon osd.XYZ ops" against the > potentially stuck OSD to figure out what it's stuck doing. > > -- > Jason smime.p7s Description: S/MIME cryptographic signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] hung rbd requests for one pool
On 04/24/17 22:23, Phil Lacroute wrote: > Jason, > > Thanks for the suggestion. That seems to show it is not the OSD that > got stuck: > > ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1 > … > 2017-04-24 13:13:49.761076 7f739aefc700 1 -- > 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- > osd_op(client.4384.0:3 1.af6f1e38 rbd_header.1058238e1f29 [call > rbd.get_size,call rbd.get_object_prefix] snapc 0=[] > ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con > 0x7f737c0064e0 > … > 2017-04-24 13:14:04.756328 7f73a2880700 1 -- > 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- ping > magic: 0 v1 -- ?+0 0x7f7374000fc0 con 0x7f737c0064e0 > > ceph0:~$ sudo ceph pg map 1.af6f1e38 > osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2] > > ceph3:~$ sudo ceph daemon osd.11 ops > { > "ops": [], > "num_ops": 0 > } > > I repeated this a few times and it’s always the same command and same > placement group that hangs, but OSD11 has no ops (and neither do OSD16 > and OSD2, although I think that’s expected). > > Is there other tracing I should do on the OSD or something more to > look at on the client? > > Thanks, > Phil Does it still happen if you disable exclusive-lock, or maybe separately fast-diff and object-map? I have a similar problem where VMs with those 3 features hang and need kill -9, and without them, they never hang. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
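If you want to test that on the stuck image, the features have to be disabled in dependency order, e.g. (image spec taken from the thread):

    rbd feature disable app/image1 fast-diff
    rbd feature disable app/image1 object-map
    rbd feature disable app/image1 exclusive-lock
    rbd info app/image1    # confirm the remaining feature set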
Re: [ceph-users] hung rbd requests for one pool
Just to cover all the bases, is 192.168.206.13:6804 really associated with a running daemon for OSD 11? On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute wrote: > Jason, > > Thanks for the suggestion. That seems to show it is not the OSD that got > stuck: > > ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1 > … > 2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899 > --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38 > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con > 0x7f737c0064e0 > … > 2017-04-24 13:14:04.756328 7f73a2880700 1 -- 192.168.206.17:0/1250293899 > --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con > 0x7f737c0064e0 > > ceph0:~$ sudo ceph pg map 1.af6f1e38 > osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2] > > ceph3:~$ sudo ceph daemon osd.11 ops > { > "ops": [], > "num_ops": 0 > } > > I repeated this a few times and it’s always the same command and same > placement group that hangs, but OSD11 has no ops (and neither do OSD16 and > OSD2, although I think that’s expected). > > Is there other tracing I should do on the OSD or something more to look at > on the client? > > Thanks, > Phil > > On Apr 24, 2017, at 12:39 PM, Jason Dillaman wrote: > > On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute > wrote: > > 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735 > --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38 > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con > 0x7f54f40064e0 > > > > You can attempt to run "ceph daemon osd.XYZ ops" against the > potentially stuck OSD to figure out what it's stuck doing. > > -- > Jason > > -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] hung rbd requests for one pool
Yes it is the correct IP and port: ceph3:~$ netstat -anp | fgrep 192.168.206.13:6804 tcp0 0 192.168.206.13:6804 0.0.0.0:* LISTEN 22934/ceph-osd I turned up the logging on the osd and I don’t think it received the request. However I also noticed a large number of TCP connections to that specific osd from the client (192.168.206.17) in CLOSE_WAIT state (131 to be exact). I think there may be a bug causing the osd not to close file descriptors. Prior to the hang I had been running tests continuously for several days so the osd process may have been accumulating open sockets. I’m still gathering information, but based on that is there anything specific that would be helpful to find the problem? Thanks, Phil > On Apr 24, 2017, at 5:01 PM, Jason Dillaman wrote: > > Just to cover all the bases, is 192.168.206.13:6804 really associated > with a running daemon for OSD 11? > > On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute > wrote: >> Jason, >> >> Thanks for the suggestion. That seems to show it is not the OSD that got >> stuck: >> >> ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1 >> … >> 2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899 >> --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38 >> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc >> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con >> 0x7f737c0064e0 >> … >> 2017-04-24 13:14:04.756328 7f73a2880700 1 -- 192.168.206.17:0/1250293899 >> --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con >> 0x7f737c0064e0 >> >> ceph0:~$ sudo ceph pg map 1.af6f1e38 >> osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2] >> >> ceph3:~$ sudo ceph daemon osd.11 ops >> { >>"ops": [], >>"num_ops": 0 >> } >> >> I repeated this a few times and it’s always the same command and same >> placement group that hangs, but OSD11 has no ops (and neither do OSD16 and >> OSD2, although I think that’s expected). >> >> Is there other tracing I should do on the OSD or something more to look at >> on the client? >> >> Thanks, >> Phil >> >> On Apr 24, 2017, at 12:39 PM, Jason Dillaman wrote: >> >> On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute >> wrote: >> >> 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735 >> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38 >> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc >> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con >> 0x7f54f40064e0 >> >> >> >> You can attempt to run "ceph daemon osd.XYZ ops" against the >> potentially stuck OSD to figure out what it's stuck doing. >> >> -- >> Jason >> >> > > > > -- > Jason smime.p7s Description: S/MIME cryptographic signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] hung rbd requests for one pool
I would double-check your file descriptor limits on both sides -- OSDs and the client. 131 sockets shouldn't make a difference. Port is open on any possible firewalls you have running? On Mon, Apr 24, 2017 at 8:14 PM, Phil Lacroute wrote: > Yes it is the correct IP and port: > > ceph3:~$ netstat -anp | fgrep 192.168.206.13:6804 > tcp0 0 192.168.206.13:6804 0.0.0.0:* LISTEN > 22934/ceph-osd > > I turned up the logging on the osd and I don’t think it received the > request. However I also noticed a large number of TCP connections to that > specific osd from the client (192.168.206.17) in CLOSE_WAIT state (131 to be > exact). I think there may be a bug causing the osd not to close file > descriptors. Prior to the hang I had been running tests continuously for > several days so the osd process may have been accumulating open sockets. > > I’m still gathering information, but based on that is there anything > specific that would be helpful to find the problem? > > Thanks, > Phil > > On Apr 24, 2017, at 5:01 PM, Jason Dillaman wrote: > > Just to cover all the bases, is 192.168.206.13:6804 really associated > with a running daemon for OSD 11? > > On Mon, Apr 24, 2017 at 4:23 PM, Phil Lacroute > wrote: > > Jason, > > Thanks for the suggestion. That seems to show it is not the OSD that got > stuck: > > ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1 > … > 2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899 > --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38 > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con > 0x7f737c0064e0 > … > 2017-04-24 13:14:04.756328 7f73a2880700 1 -- 192.168.206.17:0/1250293899 > --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con > 0x7f737c0064e0 > > ceph0:~$ sudo ceph pg map 1.af6f1e38 > osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2] > > ceph3:~$ sudo ceph daemon osd.11 ops > { >"ops": [], >"num_ops": 0 > } > > I repeated this a few times and it’s always the same command and same > placement group that hangs, but OSD11 has no ops (and neither do OSD16 and > OSD2, although I think that’s expected). > > Is there other tracing I should do on the OSD or something more to look at > on the client? > > Thanks, > Phil > > On Apr 24, 2017, at 12:39 PM, Jason Dillaman wrote: > > On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute > wrote: > > 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735 > --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38 > rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc > 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con > 0x7f54f40064e0 > > > > You can attempt to run "ceph daemon osd.XYZ ops" against the > potentially stuck OSD to figure out what it's stuck doing. > > -- > Jason > > > > > > -- > Jason > > -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
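A few quick checks on the OSD host, using the osd.11 PID (22934) from the netstat output earlier in the thread:

    grep 'Max open files' /proc/22934/limits                # per-process fd limit for the OSD
    ls /proc/22934/fd | wc -l                               # fds currently open by the OSD
    ss -tan state close-wait '( sport = :6804 )' | wc -l    # leaked sockets on the OSD's port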
Re: [ceph-users] Ceph built from source, can't start ceph-mon
Anyone? On Sat, Apr 22, 2017 at 12:33 PM, Henry Ngo wrote: > I followed the install doc; however, after deploying the monitor, the doc > states to start the mon using Upstart. I learned through digging around > that the Upstart job files are not installed by "make install", so that won't > work. I tried running "ceph-mon -i [host]" and it gives an error. Any ideas? > > http://paste.openstack.org/show/607588/ > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
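In case it helps while waiting for an answer: when installing from source without the init/Upstart bits, the monitor can be bootstrapped and started by hand, roughly following the manual deployment docs. The hostname, IP, fsid variable and paths below are placeholders:

    ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
    monmaptool --create --add myhost 192.168.0.10 --fsid $FSID /tmp/monmap    # $FSID must match ceph.conf
    mkdir -p /var/lib/ceph/mon/ceph-myhost
    ceph-mon --mkfs -i myhost --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
    ceph-mon -i myhost -d    # -d keeps it in the foreground with debug output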
[ceph-users] All osd slow response / blocked requests upon single disk failure
Dear ceph users, I am running the following setup:- - 6 x osd servers (centos 7, mostly HP DL180se G6 with SA P410 controllers) - Each osd server has 1-2 SSD journals, each handling ~5 7.2k SATA RE disks - ceph-0.94.10 Normal operations work OK, however when a single disk failed (or abrupt 'ceph osd down'), all osds other than the ones inside the downed osd experienced slow response and blocked requests (some more than others). For example:- 2017-04-24 15:59:58.734235 7f2a62338700 0 log_channel(cluster) log [WRN] : slow request 30.571582 seconds old, received at 2017-04-24 15:59:28.162572: osd_op(client.11870166.0:118068448 rbd_data.42d93b436c6125.0577 [sparse-read 8192~4096] 1.a6422b98 ack+read e48964) currently reached_pg 2017-04-24 15:59:58.734241 7f2a62338700 0 log_channel(cluster) log [WRN] : slow request 30.569605 seconds old, received at 2017-04-24 15:59:28.164550: osd_op(client.11870166.0:118068449 rbd_data.42d93b436c6125.0577 [sparse-read 40960~8192] 1.a6422b98 ack+read e48964) currently reached_pg In contrast, a normal planned 'ceph osd in' or 'ceph osd out' from a healthy state work OK and doesn't block requests. References:- - ceph osd tree (osd.34 @ osd10 down) : https://pastebin.com/s1AaNJM1 - ceph -s (when healthy): https://pastebin.com/h0NLgbG0 - osd cluster performance during rebuild @ 15:45 - 17:30 : https://imagebin.ca/v/3KEsK0pGeOR3 - osd cluster i/o wait during rebuild @ 15:45 - 17:30 : https://imagebin.ca/v/3KErkQ4KC8sv So far I have tried reducing rebuild priority as follows, but to no avail:- ceph tell osd.* injectargs '--osd-max-backfills 1' ceph tell osd.* injectargs '--osd-recovery-max-active 1' ceph tell osd.* injectargs '--osd-recovery-op-priority 1' ceph tell osd.* injectargs '--osd-client-op-priority 63' Is this a case of some slow osd dragging others? Or my setup / hardware is substandard? Any pointers on what I should look into next, would be greatly appreciated - thanks. -- --sazli ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
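A few commands that may help show whether one or two slow OSDs are dragging the rest down during the recovery window:

    ceph health detail | grep -i blocked      # which OSDs the blocked requests are stuck on
    ceph osd perf                             # per-OSD commit/apply latency; outliers stand out
    ceph daemon osd.NN dump_historic_ops      # on a suspect OSD host: where its slowest recent ops spent time
    iostat -x 2                               # per-disk utilisation and await on the OSD hosts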
Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken
Hi Xiaoxi,

> Just want to confirm again: according to the definition of "LTS" in Ceph, Hammer is supposed not to be EOL till Luminous is released.

This is correct.

> Before that, can we expect Hammer upgrades and packages on Precise/other old OSes to still be provided? We have all our server-side Ceph clusters on Jewel, but the pain point is that there are still a few thousand hypervisors on Ubuntu 12.04, so we have to maintain Hammer for those old installations.

Luminous release (and, hence, Hammer EOL) is very close. Now would be a good time to test the upgrade and let us know which Hammer fixes you need, if any.

Nathan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com