On Mon, Sep 17, 2018 at 8:21 AM Graham Allan <g...@umn.edu> wrote:
>
> On 09/14/2018 02:38 PM, Gregory Farnum wrote:
> > On Thu, Sep 13, 2018 at 3:05 PM, Graham Allan <g...@umn.edu> wrote:
> >>
> >> However I do see transfer errors fetching some files out of radosgw -
> >> the transfer just hangs then aborts. I'd guess this is probably due to
> >> one pg stuck down, due to a lost (failed HDD) osd. I think there is no
> >> alternative to declaring the osd lost, but I wish I understood better
> >> the implications of the "recovery_state" and "past_intervals" output
> >> by ceph pg query:
> >> https://pastebin.com/8WrYLwVt
> >
> > What are you curious about here? The past intervals is listing the
> > OSDs which were involved in the PG since it was last clean, then each
> > acting set and the intervals it was active for.
>
> That's pretty much what I'm looking for, and that the pg can roll back
> to an earlier interval if there were no writes, and the current osd has
> to be declared lost.
>
> >> I find it disturbing/odd that the acting set of osds lists only 3/6
> >> available; this implies that without getting one of these back it
> >> would be impossible to recover the data (from 4+2 EC). However the
> >> dead osd 98 only appears in the most recent (?) interval - presumably
> >> during the flapping period, during which time client writes were
> >> unlikely (radosgw disabled).
> >>
> >> So if 98 were marked lost, would it roll back to the prior interval?
> >> I am not certain how to interpret this information!
> >
> > Yes, that’s what should happen if it’s all as you outline here.
> >
> > It *is* quite curious that the PG apparently went active with only 4
> > members in a 4+2 system — it's supposed to require at least k+1 (here,
> > 5) by default. Did you override the min_size or something?
> > -Greg
>
> Looking back through history it seems that I *did* override the min_size
> for this pool; however, I didn't reduce it - it used to have min_size 2!
> That made no sense to me - I think it must be an artifact of a very
> early (hammer?) ec pool creation, but it pre-dates me.
>
> I found the documentation on what min_size should be a bit confusing,
> which is how I arrived at 4. Fully agree that k+1=5 makes way more sense.
>
> I don't think I was the only one confused by this though, eg
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026445.html
>
> I suppose the safest thing to do is update min_size->5 right away to
> force any size-4 pgs down until they can perform recovery. I can set
> force-recovery on these as well...
>
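For reference, the inspect/mark-lost sequence you describe would look
something like the following (the pg id is a placeholder since I only
have your pastebin to go from; osd.98 is the dead osd from your output):

  # see which pgs are down/incomplete and which osds they are waiting on
  ceph health detail
  ceph pg <pgid> query   # look at the recovery_state / past_intervals sections

  # only once you are certain osd.98 is never coming back
  ceph osd lost 98 --yes-i-really-mean-it

After that the pg should re-peer and, if there really were no writes in
that last interval, roll back to the prior interval as above.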
Mmm, this is embarrassing, but raising min_size right away actually
doesn't quite work due to https://github.com/ceph/ceph/pull/24095, which
has been on my task list but at the bottom for a while. :( So if your
cluster is stable now, I'd let it clean up and then change the min_size
once everything is repaired.

> Is there any setting which can permit these pgs to fulfil reads while
> refusing writes when active size=k?

No, that's unfortunately infeasible.
-Greg
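P.S. Once recovery has finished, the min_size change itself is just a
pool setting; something like this should do it (substitute your EC data
pool's name, and double-check the current value first):

  ceph osd pool get <ec-pool-name> min_size
  ceph osd pool set <ec-pool-name> min_size 5

and in the meantime force-recovery can be pointed at the affected pgs
with "ceph pg force-recovery <pgid>".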