On Oct 22, 2014, at 7:51 PM, Craig Lewis wrote:
> On Wed, Oct 22, 2014 at 3:09 PM, Chris Kitzmiller <ckitzmil...@hampshire.edu> wrote:
>> On Oct 22, 2014, at 1:50 PM, Craig Lewis wrote:
>>> Incomplete means "Ceph detects that a placement group is missing a
>>> necessary period of history from its log. If you see this state, report a
>>> bug, and try to start any failed OSDs that may contain the needed
>>> information".
>>>
>>> In the PG query, it lists some OSDs that it's trying to probe:
>>>     "probing_osds": [
>>>         "10",
>>>         "13",
>>>         "15",
>>>         "25"],
>>>     "down_osds_we_would_probe": [],
>>>
>>> Is one of those the OSD you replaced? If so, you might try ceph pg {pg-id}
>>> mark_unfound_lost revert|delete. That command will lose data; it tells
>>> Ceph to give up looking for data that it can't find, so you might want to
>>> wait a bit.
>>
>> Yes. osd.10 was the OSD I replaced. :( I suspect that I didn't actually have
>> any writes during this time and that a revert might leave me in an OK place.
>>
>> Looking at the query more closely I see that all of the peers (except
>> osd.10) have the same value for
>> last_update/last_complete/last_scrub/last_deep_scrub except that the peer
>> entry on osd.10 has 0 values for everything. It's as if all my OSDs are
>> believing in the ghost of this PG on osd.10. I'd like to revert I just want
>> to make sure that I'm going to revert to the sane value and not the 0 value.
>
> I've never (successfully) used mark_unfound_lost, so I can't say exactly
> what'll happen. revert should be what you need, but I don't know if it's
> going to revert to the point in time before whatever hole in the history
> happened, or if it will just give up on the portions of history that it
> doesn't have.

Huh. So I tried `ceph pg 3.222 mark_unfound_lost revert` and it told me "pg has no unfound objects" and indeed:

    "num_objects_unfound": 0,

On one of the peers, osd.25 (which isn't in the acting set now and was up+in the whole time) it reports:

    "stat_sum": {
        "num_bytes": 7080120320,
        "num_objects": 1697,
        "num_object_clones": 0,
        "num_object_copies": 3394,
        "num_objects_missing_on_primary": 0,
        "num_objects_degraded": 0,
        "num_objects_unfound": 0,
        "num_objects_dirty": 1697,
        "num_whiteouts": 0,
        "num_read": 72828,
        "num_read_kb": 8794722,
        "num_write": 32405,
        "num_write_kb": 11424120,
        "num_scrub_errors": 0,
        "num_shallow_scrub_errors": 0,
        "num_deep_scrub_errors": 0,
        "num_objects_recovered": 1687,
        "num_bytes_recovered": 7038177280,
        "num_keys_recovered": 0,
        "num_objects_omap": 0,
        "num_objects_hit_set_archive": 0},

So, is it the 10 objects which are dirty but not recovered which are giving me trouble? What can be done to correct these PGs?

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
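
For reference, the gap between the dirty and recovered counters in the osd.25 output can be checked with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the "stat_sum" output in this message, and the helper name `recovery_gap` is made up for illustration. Whether the gap is meaningful depends on how Ceph maintains these counters, which this thread does not establish, so treat the result as a hint rather than a diagnosis.

```python
import json

# Counters copied from the osd.25 "stat_sum" output in this message
# (only the fields used below are included).
STAT_SUM_JSON = """
{
  "num_bytes": 7080120320,
  "num_objects": 1697,
  "num_objects_unfound": 0,
  "num_objects_dirty": 1697,
  "num_objects_recovered": 1687,
  "num_bytes_recovered": 7038177280
}
"""


def recovery_gap(stat_sum):
    """Return (object_gap, byte_gap): dirty/total counters minus recovered ones.

    Hypothetical helper for illustration; it only subtracts the counters and
    makes no claim about what Ceph considers "recovered".
    """
    object_gap = stat_sum["num_objects_dirty"] - stat_sum["num_objects_recovered"]
    byte_gap = stat_sum["num_bytes"] - stat_sum["num_bytes_recovered"]
    return object_gap, byte_gap


stat_sum = json.loads(STAT_SUM_JSON)
object_gap, byte_gap = recovery_gap(stat_sum)

print("objects dirty but not counted as recovered:", object_gap)
print("bytes not counted as recovered:", byte_gap)
print("average bytes per object in the gap:", byte_gap // object_gap)
```

With the values above, the gap works out to 10 objects and 41943040 bytes, i.e. 4194304 bytes (4 MiB) per object on average.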