For those following along at home, already done: http://tracker.ceph.com/issues/22440
On Fri, Dec 15, 2017 at 1:57 AM Brad Hubbard <bhubb...@redhat.com> wrote:

> On Wed, Dec 13, 2017 at 11:39 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> > Boom!! Fixed it. Not sure if the behavior I stumbled on is correct, but
> > this has the potential to break a few things for people moving from
> > Jewel to Luminous if they had a few too many PGs.
> >
> > Firstly, how I stumbled across it. I whacked the logging up to max on
> > OSD 68 and saw this mentioned in the logs:
> >
> > osd.68 106454 maybe_wait_for_max_pg withhold creation of pg 0.1cf: 403 >= 400
> >
> > This made me search through the code for this warning string:
> >
> > https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L4221
> >
> > Which jogged my memory about the changes in Luminous regarding the max
> > PGs warning, and in particular these two config options:
> >
> > mon_max_pg_per_osd
> > osd_max_pg_per_osd_hard_ratio
> >
> > In my cluster I have just over 200 PGs per OSD, but the node with OSD.68
> > in has 8TB disks instead of the 3TB in the rest of the cluster. This
> > means these OSDs were taking a lot more PGs than the average would
> > suggest. So in Luminous, 200x2 gives a hard limit of 400, which is what
> > that error message in the log suggests is the limit. I set the
> > osd_max_pg_per_osd_hard_ratio option to 3, restarted the OSD, and hey
> > presto, everything fell into line.
> >
> > Now a question. I get the idea behind these settings: to stop people
> > creating too many pools, or pools with too many PGs. But is it correct
> > that they can break an existing pool which is maybe creating a new PG on
> > an OSD due to the CRUSH layout being modified?
>
> It would be good to capture this in a tracker, Nick, so it can be
> explored in more depth.
>
> > Nick
> >
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick Fisk
> > Sent: 13 December 2017 11:14
> > To: 'Gregory Farnum' <gfar...@redhat.com>
> > Cc: 'ceph-users' <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] Odd object blocking IO on PG
> >
> > On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk <n...@fisk.me.uk> wrote:
> >
> >> That doesn't look like an RBD object -- any idea who is
> >> "client.34720596.1:212637720"?
> >
> > So I think these might be proxy ops from the cache tier, as there are
> > also blocked ops on one of the cache tier OSDs, but this time it
> > actually lists the object name. Blocked op on the cache tier:
> >
> > "description": "osd_op(client.34720596.1:212637720 17.ae78c1cf
> > 17:f3831e75:::rbd_data.15a5e20238e1f29.00000000000388ad:head [set-alloc-hint
> > object_size 4194304 write_size 4194304,write 2584576~16384] snapc 0=[]
> > RETRY=2 ondisk+retry+write+known_if_redirected e104841)",
> > "initiated_at": "2017-12-12 16:25:32.435718",
> > "age": 13996.681147,
> > "duration": 13996.681203,
> > "type_data": {
> >     "flag_point": "reached pg",
> >     "client_info": {
> >         "client": "client.34720596",
> >         "client_addr": "10.3.31.41:0/2600619462",
> >         "tid": 212637720
> >
> > I'm a bit baffled at the moment about what's going on. The pg query
> > (attached) is not showing in the main status that it has been blocked
> > from peering or that there are any missing objects. I've tried
> > restarting all the OSDs I can see relating to the PG in case they needed
> > a bit of a nudge.
> >
> > Did that fix anything? I don't see anything immediately obvious but I'm
> > not practiced in quickly reading that pg state output.
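(An aside for anyone hitting something similar: the blocked-op output quoted above looks like it came from the OSD admin socket. Below is a minimal sketch of pulling that kind of information on a Luminous cluster; osd.68 and pg 0.1cf are simply the IDs from this thread, and the daemon commands need to run on the host carrying that OSD.)

    # Turn OSD debug logging up to maximum on the suspect OSD, then watch its
    # log file (e.g. /var/log/ceph/ceph-osd.68.log).
    ceph tell osd.68 injectargs '--debug_osd 20/20'

    # Dump ops currently blocked or in flight on that OSD; the "description",
    # "flag_point" and "events" fields quoted in this thread have the same
    # shape as this output.
    ceph daemon osd.68 dump_blocked_ops
    ceph daemon osd.68 dump_ops_in_flight

    # Query the stuck PG directly and keep the output for later comparison.
    ceph pg 0.1cf query > pg-0.1cf-query.json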
> > What's the output of "ceph -s"?
> >
> > Hi Greg,
> >
> > No, restarting OSDs didn't seem to help. But I did make some progress
> > late last night. By stopping OSD.68 the cluster unlocks itself and IO
> > can progress. However, as soon as it starts back up, 0.1cf and a couple
> > of other PGs again get stuck in an activating state. If I out the OSD,
> > either with it up or down, then some other PGs seem to get hit by the
> > same problem as CRUSH moves PG mappings around to other OSDs.
> >
> > So there definitely seems to be some sort of weird peering issue
> > somewhere. I have seen a very similar issue before on this cluster
> > where, after running the crush reweight script to balance OSD
> > utilization, the weight got set too low and PGs were unable to peer. I'm
> > not convinced this is what's happening here as the weights haven't
> > changed, but I'm intending to explore this further just in case.
> >
> > With 68 down:
> >
> > pgs: 1071783/48650631 objects degraded (2.203%)
> >      5923 active+clean
> >      399  active+undersized+degraded
> >      7    active+clean+scrubbing+deep
> >      7    active+clean+remapped
> >
> > With it up:
> >
> > pgs: 0.047% pgs not active
> >      67271/48651279 objects degraded (0.138%)
> >      15602/48651279 objects misplaced (0.032%)
> >      6051 active+clean
> >      273  active+recovery_wait+degraded
> >      4    active+clean+scrubbing+deep
> >      4    active+remapped+backfill_wait
> >      3    activating+remapped
> >           active+recovering+degraded
> >
> > PG dump:
> >
> > ceph pg dump | grep activatin
> > dumped all
> > 2.389 0 0 0 0 0 0 1500 1500 activating+remapped 2017-12-13 11:08:50.990526 76271'34230 106239:160310 [68,60,58,59,29,23] 68 [62,60,58,59,29,23] 62 76271'34230 2017-12-13 09:00:08.359690 76271'34230 2017-12-10 10:05:10.931366
> > 0.1cf 3947 0 0 0 0 16472186880 1577 1577 activating+remapped 2017-12-13 11:08:50.641034 106236'7512915 106239:6176548 [34,68,8] 34 [34,8,53] 34 106138'7512682 2017-12-13 10:27:37.400613 106138'7512682 2017-12-13 10:27:37.400613
> > 2.210 0 0 0 0 0 0 1500 1500 activating+remapped 2017-12-13 11:08:50.686193 76271'33304 106239:96797 [68,67,34,36,16,15] 68 [62,67,34,36,16,15] 62 76271'33304 2017-12-12 00:49:21.038437 76271'33304 2017-12-10 16:05:12.751425
> >
> >> On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> >> > Does anyone know what this object (0.ae78c1cf) might be? It's not
> >> > your normal run of the mill RBD object and I can't seem to find it in
> >> > the pool using rados --all ls. It seems to be leaving the 0.1cf PG
> >> > stuck in an activating+remapped state and blocking IO. Pool 0 is just
> >> > a pure RBD pool with a cache tier above it. There is no current
> >> > mention of unfound objects or any other obvious issues.
> >> >
> >> > There is some backfilling going on, on another OSD which was upgraded
> >> > to bluestore, which was when the issue started. But I can't see any
> >> > link in the PG dump with the upgraded OSD. My only thought so far is
> >> > to wait for this backfilling to finish and then deep-scrub this PG
> >> > and see if that reveals anything?
> >> >
> >> > Thanks,
> >> > Nick
> >> >
> >> > "description": "osd_op(client.34720596.1:212637720 0.1cf 0.ae78c1cf
> >> > (undecoded)
> >> > ondisk+retry+write+ignore_cache+ignore_overlay+known_if_redirected
> >> > e105014)",
> >> > "initiated_at": "2017-12-12 17:10:50.030660",
> >> > "age": 335.948290,
> >> > "duration": 335.948383,
> >> > "type_data": {
> >> >     "flag_point": "delayed",
> >> >     "events": [
> >> >         {
> >> >             "time": "2017-12-12 17:10:50.030660",
> >> >             "event": "initiated"
> >> >         },
> >> >         {
> >> >             "time": "2017-12-12 17:10:50.030692",
> >> >             "event": "queued_for_pg"
> >> >         },
> >> >         {
> >> >             "time": "2017-12-12 17:10:50.030719",
> >> >             "event": "reached_pg"
> >> >         },
> >> >         {
> >> >             "time": "2017-12-12 17:10:50.030727",
> >> >             "event": "waiting for peered"
> >> >         },
> >> >         {
> >> >             "time": "2017-12-12 17:10:50.197353",
> >> >             "event": "reached_pg"
> >> >         },
> >> >         {
> >> >             "time": "2017-12-12 17:10:50.197355",
> >> >             "event": "waiting for peered"
> >>
> >> --
> >> Jason
>
> --
> Cheers,
> Brad
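(For reference, the hard limit Nick hit is the product of the two options he mentions, mon_max_pg_per_osd multiplied by osd_max_pg_per_osd_hard_ratio, i.e. 200 x 2 = 400, which matches the "withhold creation of pg 0.1cf: 403 >= 400" log line. A rough sketch of checking for and working around that condition follows; the values are the ones from this thread, the restart command assumes systemd-managed OSDs, and editing ceph.conf is just one way of applying the setting Nick changed.)

    # Per-OSD PG counts; the PGS column shows which OSDs sit near or above
    # the limit.
    ceph osd df tree

    # The two options that make up the hard limit, read from the OSD itself
    # (run on the host carrying osd.68).
    ceph daemon osd.68 config get mon_max_pg_per_osd
    ceph daemon osd.68 config get osd_max_pg_per_osd_hard_ratio

    # Workaround from this thread: raise the hard ratio, e.g. in ceph.conf on
    # the affected OSD hosts...
    #
    #   [osd]
    #   osd_max_pg_per_osd_hard_ratio = 3
    #
    # ...and restart the OSD so the new limit takes effect.
    systemctl restart ceph-osd@68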
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com