After some creative PG surgery, everything is coming back online cleanly.
I went through the PGs one at a time (80-90 of them) on the least-filled
OSD (the new osd.5) and export-remove'd each PG that was causing the
assertion failures, test-starting the OSD after each removal.  Running
# tail -f /var/log/ceph/ceph-osd.5.log | grep -A1 "unlocked"
helped identify the PG loaded right before each assertion failure.  I've
kept the exports of the flawed PGs with the striping issue in case they
are needed for anything later on.  This finally allowed the OSD to start
with only the clean PGs from that pool left.
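
For each PG the log pointed at, the removal step looked roughly like this
(the data path is the stock default and <pgid> is just a placeholder):

  # with osd.5 stopped: export the offending PG to a file and remove it
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
      --op export-remove --pgid <pgid> --file /root/<pgid>.export
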
Then I started the same process on the other down OSD (osd.0), which is
going to take forever because it had existing data.  I paused that,
identified the incomplete/inactive PGs, exported those from the downed
osd.0, and imported them into osd.5, which was able to come online.
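
Each of those moves was an export from the dead OSD's data path and an
import on osd.5 (stopped while the tool runs); paths and <pgid> below are
placeholders:

  # on the downed osd.0: export the incomplete/inactive PG to a file
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --op export --pgid <pgid> --file /root/<pgid>.export

  # on osd.5, while stopped: import the PG, then start osd.5 again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
      --op import --file /root/<pgid>.export
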
Some of the imports identified split PGs: a few of the export files also
contained objects belonging to other, still-missing PGs.  Using the import
capability while specifying the split pgid allowed those additional
objects to be imported, which satisfied the missing shards for objects I
hadn't yet identified source PGs for.
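
That was the same import, just with the split child's pgid specified
explicitly (the pgids and file name below are placeholders):

  # pull in the objects belonging to a split child PG from the same export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
      --op import --pgid <split-child-pgid> --file /root/<parent-pgid>.export
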
5/6 OSDs are up and running, all of the PGs are now active, and all of
the data is working again.  Things are still undersized/backfilling/moving,
but it seems there isn't any data loss.

Now I can either continue going through osd.0 one PG at a time, removing
the erroneous PGs, or potentially blow it away and start a fresh OSD.  Is
there a recommended path there?
Second question: if I bring up the original OSD after pruning all of the
flawed PG copies with the stripe issue, is it important to remove the
leftover PG copies that were successfully imported into osd.5?  I'm
thinking I would want to, and I can leave the export files around just in
case.  Once data starts changing (new writes), I would imagine the exports
wouldn't be usable anymore (or could they potentially screw something up?)


https://gist.githubusercontent.com/arodd/c95355a7b55f3e4a94f21bc5e801943d/raw/dfce381af603306e09c634196309d95c172961a7/osd-semi-healthy

After all of this, I'm going to make a new CephFS filesystem with a new
metadata/data pool using the newer EC settings, copy all of the data over
into fresh PGs, and might consider moving to k=4,m=2 instead ;)
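
Something along these lines for the new pools and filesystem (pool names,
PG counts, and failure domain are just placeholders; --force is needed
because the default data pool would be EC):

  # new EC profile (k=4,m=2) and data pool
  ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=osd
  ceph osd pool create cephfs2_data 64 64 erasure ec-4-2
  ceph osd pool set cephfs2_data allow_ec_overwrites true

  # replicated metadata pool and a second filesystem alongside the old one
  ceph osd pool create cephfs2_metadata 16 16
  ceph fs flag set enable_multiple true --yes-i-really-mean-it
  ceph fs new cephfs2 cephfs2_metadata cephfs2_data --force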

On Wed, Jul 3, 2019 at 2:28 PM Austin Workman <soilfla...@gmail.com> wrote:

> That makes more sense.
>
> Setting min_size = 4 on the EC pool allows data to flow again (well, not
> entirely, because of the still-missing 22 other PGs; maybe min_size was
> automatically raised to 5 when I adjusted the EC pool originally?), apart
> from the 21 unknown and 1 down PGs, which probably depend on the two down
> OSDs.  Those are probably the 22 PGs that actually got fully moved around
> (maybe even converted to k=5/m=1?).  It would be great if I can find a way
> to start those other two OSDs and just deal with whatever state is causing
> them to crash.
>
> On Wed, Jul 3, 2019 at 2:18 PM Janne Johansson <icepic...@gmail.com>
> wrote:
>
>> Den ons 3 juli 2019 kl 20:51 skrev Austin Workman <soilfla...@gmail.com>:
>>
>>>
>>> But a very strange number shows up in the active sections of the pg's
>>> that's the same number roughly as 2147483648.....  This seems very odd,
>>> and maybe the value got lodged somewhere it doesn't belong which is causing
>>> an issue.
>>>
>>>
>> That pg number is "-1" or something for a signed 32bit int, which means
>> "I don't know which one it was anymore" which you can get in PG lists when
>> OSDs are gone.
>>
>> --
>> May the most significant bit of your life be positive.
>>
>
