On Tue, Feb 13, 2018 at 8:41 AM Graham Allan <g...@umn.edu> wrote:

> I'm replying to myself here, but it's probably worth mentioning that
> after this started, I did bring back the failed host, though with "ceph
> osd weight 0" to avoid more data movement.
>
> For inconsistent pgs containing unfound objects, the output of "ceph pg
> <n> query" does then show the original osd being queried for objects,
> and indeed if I dig through the filesystem I find the same 0-byte files
> dated from 2015-2016.
>
> This strongly implies to me that the data loss occurred a long time in the
> past and is not related to the OSD host going down - that only triggered
> the problem being discovered.


I would assume that too, but unless you had scrubbing disabled, it should
have been discovered long ago; I don’t understand how it could have
stayed hidden. Did you change any other settings recently?

Or, what is this EC pool being used for, and what are the EC settings?
Having a bunch of empty files is not surprising if the objects are smaller
than the chunk/stripe size — then just the primary and the parity locations
would actually have data for them.
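
For reference, those settings can be checked with something like this; the
pool and profile names below are just placeholders for whatever the cluster
actually uses:

  # which EC profile the pool was created with
  ceph osd pool get <ec-pool-name> erasure_code_profile
  # k, m and stripe settings for that profile
  ceph osd erasure-code-profile get <profile-name>

Comparing the chunk/stripe size implied there against the typical object
size in the pool would show whether the empty shard files are expected.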



>
> Graham
>
> On 02/12/2018 06:26 PM, Graham Allan wrote:
> > Hi,
> >
> > For the past few weeks I've been seeing a large number of pgs on our
> > main erasure coded pool being flagged inconsistent, followed by them
> > becoming active+recovery_wait+inconsistent with unfound objects. The
> > cluster is currently running luminous 12.2.2 but has in the past also
> > run its way through firefly, hammer and jewel.
> >
> > Here's a sample object from one affected pg (there are 150 unfound
> > objects in this particular pg):
> >
> > ceph health detail shows:
> >>     pg 70.467 is stuck unclean for 1004525.715896, current state
> >> active+recovery_wait+inconsistent, last acting [449,233,336,323,259,193]
> >
> > ceph pg 70.467 list_missing:
> >>         {
> >>             "oid": {
> >>                 "oid":
> >> "default.323253.6_20150226/Downloads/linux-nvme-HEAD-5aa2ffa/include/config/via/fir.h",
> >>                 "key": "",
> >>                 "snapid": -2,
> >>                 "hash": 628294759,
> >>                 "max": 0,
> >>                 "pool": 70,
> >>                 "namespace": ""
> >>             },
> >>             "need": "73222'132227",
> >>             "have": "0'0",
> >>             "flags": "none",
> >>             "locations": [
> >>                 "193(5)",
> >>                 "259(4)",
> >>                 "449(0)"
> >>             ]
> >>         },
> >
> > When I trace through the filesystem, I find the associated file present
> > on each OSD, but with size 0 bytes.
> >
> > Interestingly, for the 3 OSDs for which "list_missing" shows locations
> > above (193,259,449), the timestamp of the 0-byte file is recent (within
> > the last few weeks). For the other 3 OSDs (233,336,323), it's in the
> > distant past (08/2015 and 02/2016).
> >
> > All the unfound objects I've checked on this pg show the same pattern,
> > along with the "have" epoch showing as "0'0".
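> >
> > (A rough sketch of the kind of check I've been doing, in case anyone
> > wants to reproduce it - the OSD id and pg shard directory below are just
> > examples, and the path assumes the default FileStore layout:
> >
> >   find /var/lib/ceph/osd/ceph-233/current/70.467s1_head \
> >       -type f -size 0 -printf '%T+ %p\n' | sort
> >
> > which lists the zero-byte files in that pg shard together with their
> > timestamps.)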
> >
> > Aside from the potential data loss being disturbing in itself, I wonder
> > why this showed up so suddenly.
> >
> > It seems to have been triggered by one OSD host failing over a long
> > weekend. By the time we looked at it on Monday, the cluster had
> > re-balanced enough data that I decided to simply leave it - we had long
> > wanted to evacuate a first host to convert it to a newer OS release, as
> > well as to Bluestore. Perhaps this was a bad choice, but the cluster
> > recovery appeared to be proceeding normally, and was apparently complete
> > a few days later. It was only around a week later that the unfound
> > objects started.
> >
> > All the unfound object file fragments I've tracked down so far have
> > their older members with timestamps in the same mid-2015 to mid-2016
> > period. I could be wrong, but it really seems as though a long-standing
> > problem has just been unearthed. I wonder if it could be connected to
> > this thread from early 2016, concerning a problem on the same cluster:
> >
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008120.html
> >
> > It's a long thread, but the 0-byte files sound very much like the
> > "orphaned files" in that thread - related to a directory split being
> > performed while handling links for a filename that uses the special
> > long-filename handling...
> >
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008317.html
> >
> > However, unlike that thread, I'm not finding any other files with
> > duplicate names in the hierarchy.
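> >
> > (For anyone wanting to repeat that check, something along these lines
> > should do it - the path is again just an example FileStore location:
> >
> >   find /var/lib/ceph/osd/ceph-233/current/70.467s1_head -type f \
> >       -printf '%f\n' | sort | uniq -d
> >
> > which would print any object filenames that occur more than once within
> > the pg's directory hierarchy.)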
> >
> > I'm not sure there's much else I can do besides record the names of any
> > unfound objects before resorting to "mark_unfound_lost delete" - any
> > suggestions for further research?
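> >
> > (A minimal sketch of that record-keeping step, assuming the JSON form of
> > list_missing has the same shape as the output above:
> >
> >   ceph pg 70.467 list_missing -f json | \
> >       jq -r '.objects[].oid.oid' > pg-70.467-unfound.txt
> >
> > and only after that "ceph pg 70.467 mark_unfound_lost delete".)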
> >
> > Thanks,
> >
> > Graham
>
> --
> Graham Allan
> Minnesota Supercomputing Institute - g...@umn.edu
>