On Tue, Feb 13, 2018 at 8:41 AM Graham Allan <g...@umn.edu> wrote:
> I'm replying to myself here, but it's probably worth mentioning that
> after this started, I did bring back the failed host, though with "ceph
> osd weight 0" to avoid more data movement.
>
> For inconsistent pgs containing unfound objects, the output of "ceph pg
> <n> query" does then show the original osd being queried for objects,
> and indeed if I dig through the filesystem I find the same 0-byte files
> dated from 2015-2016.
>
> This strongly implies to me that data loss occurred a long time in the
> past and is not related to the osd host going down - this only triggered
> the problem being found.
I would assume that too, but unless you had scrubbing disabled then it
should have been discovered long ago; I don't understand how it could
have stayed hidden. Did you change any other settings recently? Or, what
is this EC pool being used for, and what are the EC settings? Having a
bunch of empty files is not surprising if the objects are smaller than
the chunk/stripe size - then just the primary and the parity locations
would actually have data for them.

> Graham
>
> On 02/12/2018 06:26 PM, Graham Allan wrote:
> > Hi,
> >
> > For the past few weeks I've been seeing a large number of pgs on our
> > main erasure coded pool being flagged inconsistent, followed by them
> > becoming active+recovery_wait+inconsistent with unfound objects. The
> > cluster is currently running luminous 12.2.2 but has in the past also
> > run its way through firefly, hammer and jewel.
> >
> > Here's a sample object from "ceph pg list_missing" (there are 150
> > unfound objects in this particular pg):
> >
> > ceph health detail shows:
> >> pg 70.467 is stuck unclean for 1004525.715896, current state
> >> active+recovery_wait+inconsistent, last acting [449,233,336,323,259,193]
> >
> > ceph pg 70.467 list_missing:
> >> {
> >>     "oid": {
> >>         "oid": "default.323253.6_20150226/Downloads/linux-nvme-HEAD-5aa2ffa/include/config/via/fir.h",
> >>         "key": "",
> >>         "snapid": -2,
> >>         "hash": 628294759,
> >>         "max": 0,
> >>         "pool": 70,
> >>         "namespace": ""
> >>     },
> >>     "need": "73222'132227",
> >>     "have": "0'0",
> >>     "flags": "none",
> >>     "locations": [
> >>         "193(5)",
> >>         "259(4)",
> >>         "449(0)"
> >>     ]
> >> },
> >
> > When I trace through the filesystem on each OSD, I find the associated
> > file present on each OSD but with size 0 bytes.
> >
> > Interestingly, for the 3 OSDs for which "list_missing" shows locations
> > above (193,259,449), the timestamp of the 0-byte file is recent (within
> > last few weeks).
> > For the other 3 OSDs (233,336,323), it's in the distant
> > past (08/2015 and 02/2016).
> >
> > All the unfound objects I've checked on this pg show the same pattern,
> > along with the "have" epoch showing as "0'0".
> >
> > Other than the potential data loss being disturbing, I wonder why this
> > showed up so suddenly?
> >
> > It seems to have been triggered by one OSD host failing over a long
> > weekend. By the time we looked at it on Monday, the cluster had
> > re-balanced enough data that I decided to simply leave it - we had long
> > wanted to evacuate a first host to convert to a newer OS release, as
> > well as Bluestore. Perhaps this was a bad choice, but the cluster
> > recovery appeared to be proceeding normally, and was apparently complete
> > a few days later. It was only around a week later that the unfound
> > objects started.
> >
> > All the unfound object file fragments I've tracked down so far have
> > their older members with timestamps in the same mid-2015 to mid-2016
> > period. I could be wrong but this really seems like a long-standing
> > problem has just been unearthed. I wonder if it could be connected to
> > this thread from early 2016, concerning a problem on the same cluster:
> >
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008120.html
> >
> > It's a long thread, but the 0-byte files sound very like the "orphaned
> > files" in that thread - related to performing a directory split while
> > handling links on a filename with the special long filename handling...
> >
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008317.html
> >
> > However unlike that thread, I'm not finding any other files with
> > duplicate names in the hierarchy.
> >
> > I'm not sure there's much else I can do besides record the names of any
> > unfound objects before resorting to "mark_unfound_lost delete" - any
> > suggestions for further research?
> >
> > Thanks,
> >
> > Graham
>
> --
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
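To make the chunk/stripe-size point above concrete, here is a rough Python
sketch. It assumes a k=4, m=2 profile (a guess consistent with the 6-wide
acting set in the pg query output), a 4 KiB stripe unit, and a jerasure-style
layout with parity on the last m shards; the function name and parameters are
illustrative, not anything from the Ceph codebase:

```python
def shards_with_data(object_size, k, m, stripe_unit):
    """Return the shard indices expected to hold actual bytes for an
    object of the given size, under the assumed k+m layout."""
    # Data chunks fill in order, so a short object only reaches
    # ceil(object_size / stripe_unit) of the k data shards.
    needed = -(-object_size // stripe_unit)  # ceiling division
    data_shards = list(range(min(needed, k)))
    # Parity shards always carry data, regardless of object size.
    parity_shards = list(range(k, k + m))
    return data_shards + parity_shards

# A small object (well under one 4 KiB stripe unit) with k=4, m=2:
print(shards_with_data(100, k=4, m=2, stripe_unit=4096))  # -> [0, 4, 5]
```

If those assumptions hold, a small object lands on shard 0 plus the two
parity shards - which lines up with the "locations" reported by
list_missing above: 449(0), 259(4), 193(5), i.e. shards 0, 4 and 5.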