I set it to 100, then restarted osd26, but after recovery everything is as it was before.
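
For reference, the tunable can be changed by decompiling the CRUSH map, editing it, and injecting it back - a rough sketch of that workflow (file names are arbitrary, and the rule id and replica count in the test step are just examples to adjust for your own pools):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # in crush.txt, set (or add) the line:  tunable choose_total_tries 100
  crushtool -c crush.txt -o crush.new
  # optional sanity check: list inputs that still map to fewer than --num-rep OSDs
  crushtool -i crush.new --test --show-bad-mappings --rule 0 --num-rep 2 --min-x 0 --max-x 10000
  ceph osd setcrushmap -i crush.new

Note that this only raises the number of placement retries; as Greg points out further down the thread, with only 3 hosts on bobtail tunables some PGs may still fail to map until the tunables themselves are updated.
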
On Sat, 18 Feb 2017, Shinobu Kinjo wrote:
> You may need to increase ``choose_total_tries`` to more than 50 (the
> default), up to 100.
>
> - http://docs.ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
> - https://github.com/ceph/ceph/blob/master/doc/man/8/crushtool.rst
>
> On Sat, Feb 18, 2017 at 5:25 AM, Matyas Koszik <kos...@atw.hu> wrote:
> >
> > I have size=2 and 3 independent nodes. I'm happy to try firefly tunables,
> > but a bit scared that it would make things even worse.
> >
> >
> > On Fri, 17 Feb 2017, Gregory Farnum wrote:
> >
> >> Situations that are stable with lots of undersized PGs like this
> >> generally mean that the CRUSH map is failing to allocate enough OSDs
> >> for certain PGs. The log you have says the OSD is trying to NOTIFY the
> >> new primary that the PG exists here on this replica.
> >>
> >> I'd guess you only have 3 hosts and are trying to place all your
> >> replicas on independent boxes. Bobtail tunables have trouble with that
> >> and you're going to need to pay the cost of moving to more modern ones.
> >> -Greg
> >>
> >> On Fri, Feb 17, 2017 at 5:30 AM, Matyas Koszik <kos...@atw.hu> wrote:
> >> >
> >> > I'm not sure what variable I should be looking at exactly, but after
> >> > reading through all of them I don't see anything suspicious; all the
> >> > values are 0. I'm attaching it anyway, in case I missed something:
> >> > https://atw.hu/~koszik/ceph/osd26-perf
> >> >
> >> > I tried debugging the ceph pg query a bit more, and it seems that it
> >> > gets stuck communicating with the mon - it doesn't even try to
> >> > connect to the osd. This is the end of the log:
> >> >
> >> > 13:36:07.006224 sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"\7", 1}, {"\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\17\0\177\0\2\0\27\0\0\0\0\0\0\0\0\0"..., 53}, {"\1\0\0\0\6\0\0\0osdmap9\4\1\0\0\0\0\0\1", 23}, {"\255UC\211\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1", 21}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 98
> >> > 13:36:07.207010 recvfrom(3, "\10\6\0\0\0\0\0\0\0", 4096, MSG_DONTWAIT, NULL, NULL) = 9
> >> > 13:36:09.963843 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"9\356\246X\245\330r9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
> >> > 13:36:09.964340 recvfrom(3, "\0179\356\246X\245\330r9", 4096, MSG_DONTWAIT, NULL, NULL) = 9
> >> > 13:36:19.964154 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"C\356\246X\24\226w9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
> >> > 13:36:19.964573 recvfrom(3, "\17C\356\246X\24\226w9", 4096, MSG_DONTWAIT, NULL, NULL) = 9
> >> > 13:36:29.964439 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"M\356\246X|\353{9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
> >> > 13:36:29.964938 recvfrom(3, "\17M\356\246X|\353{9", 4096, MSG_DONTWAIT, NULL, NULL) = 9
> >> >
> >> > ... and this goes on for as long as I let it. When I kill it, I get this:
> >> > RuntimeError: "None": exception "['{"prefix": "get_command_descriptions", "pgid": "6.245"}']": exception 'int' object is not iterable
> >> >
> >> > I restarted (again) osd26 with max debugging; after grepping for
> >> > 6.245, this is the log I get:
> >> > https://atw.hu/~koszik/ceph/ceph-osd.26.log.6245
> >> >
> >> > Matyas
> >> >
> >> >
> >> > On Fri, 17 Feb 2017, Tomasz Kuzemko wrote:
> >> >
> >> >> If the PG cannot be queried I would bet on the OSD message throttler.
> >> >> Check with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each
> >> >> OSD which is holding this PG whether the message throttler's current
> >> >> value is equal to its max. If it is, increase the max value in
> >> >> ceph.conf and restart the OSD.
> >> >>
> >> >> --
> >> >> Tomasz Kuzemko
> >> >> tomasz.kuze...@corp.ovh.com
> >> >>
> >> >> On 17.02.2017 at 01:59, Matyas Koszik <kos...@atw.hu> wrote:
> >> >>
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > It seems that my ceph cluster is in an erroneous state which I
> >> >> > cannot see right now how to get out of.
> >> >> >
> >> >> > The status is the following:
> >> >> >
> >> >> >     health HEALTH_WARN
> >> >> >            25 pgs degraded
> >> >> >            1 pgs stale
> >> >> >            26 pgs stuck unclean
> >> >> >            25 pgs undersized
> >> >> >            recovery 23578/9450442 objects degraded (0.249%)
> >> >> >            recovery 45/9450442 objects misplaced (0.000%)
> >> >> >            crush map has legacy tunables (require bobtail, min is firefly)
> >> >> >     monmap e17: 3 mons at x
> >> >> >            election epoch 8550, quorum 0,1,2 store1,store3,store2
> >> >> >     osdmap e66602: 68 osds: 68 up, 68 in; 1 remapped pgs
> >> >> >            flags require_jewel_osds
> >> >> >      pgmap v31433805: 4388 pgs, 8 pools, 18329 GB data, 4614 kobjects
> >> >> >            36750 GB used, 61947 GB / 98697 GB avail
> >> >> >            23578/9450442 objects degraded (0.249%)
> >> >> >            45/9450442 objects misplaced (0.000%)
> >> >> >                4362 active+clean
> >> >> >                  24 active+undersized+degraded
> >> >> >                   1 stale+active+undersized+degraded+remapped
> >> >> >                   1 active+remapped
> >> >> >
> >> >> > I tried restarting all OSDs, to no avail; it actually made things a
> >> >> > bit worse. From a user point of view the cluster works perfectly,
> >> >> > apart from that stale pg, which fortunately hit the pool on which I
> >> >> > keep swap images only.
> >> >> >
> >> >> > A little background: I made the mistake of creating the cluster
> >> >> > with size=2 pools, which I'm now in the process of rectifying, but
> >> >> > that requires some fiddling around. I also tried moving to more
> >> >> > optimal tunables (firefly), but the documentation is a bit
> >> >> > optimistic with the 'up to 10%' data movement - it was over 50% in
> >> >> > my case, so I reverted to bobtail immediately after I saw that
> >> >> > number. I then started reweighting the osds in anticipation of the
> >> >> > size=3 bump, and I think that's when this bug hit me.
> >> >> >
> >> >> > Right now I have a pg (6.245) that cannot even be queried - the
> >> >> > command times out, or gives this output:
> >> >> > https://atw.hu/~koszik/ceph/pg6.245
> >> >> >
> >> >> > I queried a few other pgs that are acting up, but cannot see
> >> >> > anything suspicious, other than the fact that they do not have a
> >> >> > working peer:
> >> >> > https://atw.hu/~koszik/ceph/pg4.2ca
> >> >> > https://atw.hu/~koszik/ceph/pg4.2e4
> >> >> >
> >> >> > Health details can be found here: https://atw.hu/~koszik/ceph/health
> >> >> > OSD tree: https://atw.hu/~koszik/ceph/tree (here the weight sum of
> >> >> > ssd/store3_ssd seems to be off, but that has been the case for
> >> >> > quite some time - not sure if it's related to any of this)
> >> >> >
> >> >> > I tried setting debugging to 20/20 on some of the affected osds,
> >> >> > but there was nothing there that gave me any ideas on solving this.
> >> >> > How should I continue debugging this issue?
> >> >> >
> >> >> > BTW, I'm running 10.2.5 on all of my osd/mon nodes.
> >> >> >
> >> >> > Thanks,
> >> >> > Matyas
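
For reference, a minimal sketch of the admin-socket check Tomasz suggests above, assuming the default socket path and osd.26 as the example OSD; each throttle-* section of the perf dump carries a current value ("val") and a limit ("max"):

ceph --admin-daemon /var/run/ceph/ceph-osd.26.asok perf dump | python -c '
import json, sys
d = json.load(sys.stdin)
for name, c in d.items():
    # throttle-* counters have "val" (current) and "max"; max == 0 means the throttle is unlimited
    if name.startswith("throttle-") and c.get("max") and c["val"] >= c["max"]:
        print("%s: val=%s max=%s  <-- at its limit" % (name, c["val"], c["max"]))
'

If one of them is pinned at its max, raising the corresponding limit in ceph.conf (for example ms_dispatch_throttle_bytes, osd_client_message_cap or osd_client_message_size_cap, depending on which throttle it is) and restarting the OSD is the change Tomasz describes.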