Yep, thanks for all the help tracking down the root cause! -Sam
On Thu, Mar 17, 2016 at 10:50 AM, Jeffrey McDonald <jmcdo...@umn.edu> wrote:
> Great, I just recovered the first placement group from this error. To be
> sure, I ran a deep-scrub and that comes back clean.
>
> Thanks for all your help.
> Regards,
> Jeff
>
> On Thu, Mar 17, 2016 at 11:58 AM, Samuel Just <sj...@redhat.com> wrote:
>>
>> Oh, it's getting a stat mismatch. I think what happened is that on
>> one of the earlier repairs it reset the stats to the wrong value (the
>> orphan was causing the primary to scan two objects twice, which
>> matches the stat mismatch I see here). A pg repair will clear
>> that up.
>> -Sam
>>
>> On Thu, Mar 17, 2016 at 9:22 AM, Jeffrey McDonald <jmcdo...@umn.edu>
>> wrote:
>> > Thanks Sam.....
>> >
>> > Since I have prepared a script for this, I decided to go ahead with the
>> > checks.....(patience isn't one of my extended attributes....)
>> >
>> > I've got a file that searches the full erasure encoded spaces and does
>> > your checklist below. I have operated only on one PG so far, the 70.459
>> > one that we've been discussing. There was only the one file that I found
>> > to be out of place--the one we already discussed/found and it has been
>> > removed.
>> >
>> > The pg is still marked as inconsistent. I've scrubbed it a couple of
>> > times now and what I've seen is:
>> >
>> > 2016-03-17 09:29:53.202818 7f2e816f8700 0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
>> > 2016-03-17 09:36:38.436821 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
>> > 2016-03-17 09:36:38.436844 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
>> > 2016-03-17 09:44:23.592302 7f2e816f8700 0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
>> > 2016-03-17 09:47:01.237846 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
>> > 2016-03-17 09:47:01.237880 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
>> >
>> >
>> > Should the scrub be sufficient to remove the inconsistent flag? I took
>> > the osd offline during the repairs. I've looked at files in all of the
>> > osds in the placement group and I'm not finding any more problem files.
>> > The vast majority of files do not have the user.cephos.lfn3 attribute.
>> > There are 22321 objects that I have seen and only about 230 have the
>> > user.cephos.lfn3 file attribute. The files will have other attributes,
>> > just not user.cephos.lfn3.
>> >
>> > Regards,
>> > Jeff
>> >
>> >
>> > On Wed, Mar 16, 2016 at 3:53 PM, Samuel Just <sj...@redhat.com> wrote:
>> >>
>> >> Ok, like I said, most files with _long at the end are *not orphaned*.
>> >> The generation number also is *not* an indication of whether the file
>> >> is orphaned -- some of the orphaned files will have ffffffffffffffff
>> >> as the generation number and others won't. For each long filename
>> >> object in a pg you would have to:
>> >> 1) Pull the long name out of the attr
>> >> 2) Parse the hash out of the long name
>> >> 3) Turn that into a directory path
>> >> 4) Determine whether the file is at the right place in the path
>> >> 5) If not, remove it (or echo it to be checked)
>> >>
>> >> You probably want to wait for someone to get around to writing a
>> >> branch for ceph-objectstore-tool.
>> >> Should happen in the next week or two.
>> >> -Sam
>> >>
>> >
>> > --
>> >
>> > Jeffrey McDonald, PhD
>> > Assistant Director for HPC Operations
>> > Minnesota Supercomputing Institute
>> > University of Minnesota Twin Cities
>> > 599 Walter Library        email: jeffrey.mcdon...@msi.umn.edu
>> > 117 Pleasant St SE        phone: +1 612 625-6905
>> > Minneapolis, MN 55455     fax: +1 612 624-8861
>> >
>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
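
A note on reading the "stat mismatch" lines quoted in this thread: each "a/b" pair can be compared mechanically instead of by eye. The sketch below is only an illustration and is not from the thread. The function name stat_mismatches and the regular expression are my own, and I am assuming the first number in each pair is what the scrub counted and the second is what the PG's stats had recorded, which is my reading of the message rather than something confirmed above.

```python
import re

# One of the deep-scrub error lines quoted in this thread (log prefix omitted).
LOG_LINE = (
    "70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, "
    "22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
    "68440088914/68445454633 bytes,0/0 hit_set_archive bytes."
)

# Matches "<a>/<b> <name>", where <name> is one or two lowercase words
# (for example "objects" or "hit_set_archive bytes").
PAIR = re.compile(r"(\d+)/(\d+) ([a-z_]+(?: [a-z_]+)?)")


def stat_mismatches(line):
    """Return {field: (first, second, second - first)} for fields that differ.

    Hypothetical helper: it only looks at the text after
    "stat mismatch, got " and returns an empty dict for any other line.
    """
    _, _, stats = line.partition("stat mismatch, got ")
    out = {}
    for first, second, name in PAIR.findall(stats):
        a, b = int(first), int(second)
        if a != b:
            out[name] = (a, b, b - a)
    return out


if __name__ == "__main__":
    for name, (a, b, delta) in stat_mismatches(LOG_LINE).items():
        print(f"{name}: {a} vs {b} (difference {delta})")
```

On the 70.459 line this reports objects and dirty each differing by 2 and bytes differing by 5365719. The difference of 2 objects lines up with Sam's explanation that two objects were being scanned twice.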