Thanks. That's a cool utility; unfortunately, I'm pretty sure the pg in question held cephfs objects rather than rbd images (mounting cephfs is the only noticeable brokenness).

Jeff

On 05/05/2014 06:43 PM, Jake Young wrote:
I was in a similar situation where I could see the PG's data on an osd, but there was nothing I could do to force the pg to use that osd's copy.

I ended up using the rbd_restore tool to reassemble my rbd image on disk, and then I re-imported it into the pool.

See this thread for info on rbd_restore:
http://www.spinics.net/lists/ceph-devel/msg11552.html

Of course, you have to copy all of the pieces of the rbd image onto one file system somewhere (thank goodness for thin provisioning!) for the tool to work.
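
In case it helps anyone else, the general shape of what I did was something like this (the object-name prefix, paths, and pool/image names below are just placeholders for my setup; your image's block_name_prefix shows up in "rbd info"):

    # gather the image's data objects from each osd's filestore onto
    # one scratch filesystem (example prefix and paths):
    find /var/lib/ceph/osd/ceph-*/current -name 'rb.0.1234.*' \
        -exec cp -a {} /mnt/scratch/objects/ \;

    # rebuild a flat image file with rbd_restore (see the thread above
    # for its exact invocation), then push it back into the cluster:
    rbd import /mnt/scratch/restored.img rbd/myimage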

There really should be a better way.

Jake

On Monday, May 5, 2014, Jeff Bachtel <jbach...@bericotechnologies.com> wrote:

    Well, that'd be the ideal solution. Please check out the github
    gist I posted, though. It seems that despite osd.4 having nothing
    good for pg 0.2f, the cluster does not acknowledge any other osd
    has a copy of the pg. I've tried downing osd.4 and manually
    deleting the pg directory in question with the hope that the
    cluster would roll back epochs for 0.2f, but all it does is
    recreate the pg directory (empty) on osd.4.
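
    For concreteness, the deletion attempt was roughly the following
    (default filestore paths shown; adjust for your layout):

        ceph osd set noout                 # keep the cluster from shuffling data
        service ceph stop osd.4
        rm -rf /var/lib/ceph/osd/ceph-4/current/0.2f_head
        service ceph start osd.4           # the 0.2f_head dir just comes back, empty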

    Jeff

    On 05/05/2014 04:33 PM, Gregory Farnum wrote:

        What's your cluster look like? I wonder if you can just remove
        the bad PG from osd.4 and let it recover from the existing
        osd.1.
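        (For example, the output of these two would show the basics:)

        ceph -s        # overall health and pg states
        ceph osd tree  # osds, their hosts, and crush weights
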
        -Greg
        Software Engineer #42 @ http://inktank.com | http://ceph.com


        On Sat, May 3, 2014 at 9:17 AM, Jeff Bachtel
        <jbach...@bericotechnologies.com> wrote:

            This is all on firefly rc1 on CentOS 6.

            I had an osd getting overfull, and, misinterpreting
            directions, I downed it and then manually removed pg
            directories from the osd mount. After a restart and a good
            deal of rebalancing (setting the osd weights as I should
            have originally), I'm now at:

                 cluster de10594a-0737-4f34-a926-58dc9254f95f
                  health HEALTH_WARN 2 pgs backfill; 1 pgs incomplete;
                         1 pgs stuck inactive; 308 pgs stuck unclean;
                         recovery 1/2420563 objects degraded (0.000%);
                         noout flag(s) set
                  monmap e7: 3 mons at {controller1=10.100.2.1:6789/0,controller2=10.100.2.2:6789/0,controller3=10.100.2.3:6789/0},
                         election epoch 556, quorum 0,1,2 controller1,controller2,controller3
                  mdsmap e268: 1/1/1 up {0=controller1=up:active}
                  osdmap e3492: 5 osds: 5 up, 5 in
                         flags noout
                   pgmap v4167420: 320 pgs, 15 pools, 4811 GB data, 1181 kobjects
                         9770 GB used, 5884 GB / 15654 GB avail
                         1/2420563 objects degraded (0.000%)
                                3 active
                               12 active+clean
                                2 active+remapped+wait_backfill
                                1 incomplete
                              302 active+remapped
                  client io 364 B/s wr, 0 op/s

            # ceph pg dump | grep 0.2f
            dumped all in format plain
            0.2f    0       0       0       0       0       0       0       incomplete      2014-05-03 11:38:01.526832      0'0     3492:23 [4]     4       [4]     4       2254'20053      2014-04-28 00:24:36.504086      2100'18109      2014-04-26 22:26:23.699330

            # ceph pg map 0.2f
            osdmap e3492 pg 0.2f (0.2f) -> up [4] acting [4]

            The pg query for the downed pg is at
            https://gist.github.com/jeffb-bt/c8730899ff002070b325

            Of course, the osd I manually mucked with is the only one
            the cluster is picking up as up/acting. However, I can
            query the pg and find epochs where other osds (ones I
            didn't jack up) were acting. In fact, the latest of those
            entries (osd.1) still has the pg directory in its osd
            mount, and it's a good healthy 59 GB.

            I've tried manually rsync'ing that set of directories from
            osd.1 to osd.4 (preserving attributes) without success.
            Likewise, I've tried copying the directories over without
            the attributes. I've run many, many deep scrubs, but the
            pg query does not show the scrub timestamps being affected.
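
            (Roughly what I attempted, with default filestore paths and
            a made-up hostname for the osd.4 box; -X is there to carry
            the xattrs the filestore keeps on each object:)

            rsync -avX /var/lib/ceph/osd/ceph-1/current/0.2f_head/ \
                root@osd4host:/var/lib/ceph/osd/ceph-4/current/0.2f_head/

            ceph pg deep-scrub 0.2f          # then re-check the scrub stamps
            ceph pg 0.2f query | grep -i scrub

            I suspect part of the problem is that pg metadata also
            lives in the osd's leveldb/omap, which a plain directory
            copy wouldn't carry over, but I don't know the innards
            well enough to be sure.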

            I'm looking for ideas: either a way to fix the metadata on
            the osd.4 directory so this pg is seen/recognized, or a way
            to force the cluster's pg map to point at osd.1 for the
            incomplete pg (basically wiping out the cluster's memory
            that osd.4 ever had 0.2f). Or any other solution :) It's
            only 59 GB, so worst case I'll mark it lost and recreate
            the pg, but I'd prefer to learn enough of the innards to
            understand what's going on and how it might be fixed.
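
            (For reference, my rough understanding of the worst-case,
            destructive route is something like the commands below; I
            haven't run them, so corrections are welcome:)

            # declare osd.4's copy gone for good -- this is a big
            # hammer and affects anything whose only copy was on osd.4:
            ceph osd lost 4 --yes-i-really-mean-it

            # if 0.2f still won't go active, recreate it empty and
            # accept losing its contents:
            ceph pg force_create_pg 0.2f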

            Thanks for any help,

            Jeff


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
