On Wed, Jul 3, 2019 at 11:09 AM Austin Workman wrote:
> Decided that if all the data was going to move, I should adjust my jerasure
> EC profile from k=4, m=1 -> k=5, m=1 with force (is this even recommended vs.
> just creating new pools???)
>
Initially it unset crush-device-class=hdd to be blank.
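For reference, this is roughly what the overwrite looked like (the profile name below is a placeholder, not my real one, and I'm going from memory on the exact invocation):

    # overwrite the existing profile in place; --force is required for an existing profile
    ceph osd erasure-code-profile set my-ec-profile k=5 m=1 crush-device-class=hdd --force
    # confirm what actually ended up in the profile
    ceph osd erasure-code-profile get my-ec-profile

Re-specifying crush-device-class=hdd on the set command appears to be necessary, since the overwrite replaces the whole profile rather than merging with the old parameters, which would explain why the device class came back blank the first time.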
After some creative PG surgery, everything is coming back online cleanly.
I went through one at a time (80-90 PGs) on the least filled OSD (the new
osd.5) and export-remove'd each PG that was causing the assertion failures
after test-starting the OSD. # tail -f /var/log/ceph/ceph-osd.5.log | grep -A1
"un
That makes more sense.
Setting min_size = 4 on the EC pool allows data to flow again (kind of, not
really, because of the still-missing 22 other PGs; maybe this was
automatically raised to 5 when I adjusted the EC pool originally?), outside
of the 21 unknown and 1 down PG, which are probably depending on
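For anyone following along, the min_size change itself is just a pool setting (the pool name is a placeholder):

    # see what it's currently set to, then lower it
    ceph osd pool get my-ec-pool min_size
    ceph osd pool set my-ec-pool min_size 4

Lowering min_size lets PGs go active with fewer shards available, so it's presumably something to raise back up once recovery settles down.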
On Wed, Jul 3, 2019 at 8:51 PM Austin Workman wrote:
>
> But a very strange number shows up in the active sections of the PGs,
> roughly the same number as 2147483648. This seems very odd, and maybe the
> value got lodged somewhere it doesn't belong, which is causing an issue.
>
>
Something very curious is that I was adjusting the configuration for the osd
memory target via ceph-ansible and had at one point set it to 2147483648,
which is exactly 2 GiB. Currently it's set to 1610612736, but strangely the
config file it wrote has 1963336226.
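Quick sanity check on those numbers, plus how to see what the daemon actually picked up (the admin socket command has to run on the OSD host):

    # 2147483648 is exactly 2 GiB, 1610612736 is exactly 1.5 GiB
    echo $((2**31))              # 2147483648
    echo $((1610612736 / 2**20)) # 1536 MiB
    # ask the running OSD which osd_memory_target it is actually using
    ceph daemon osd.5 config get osd_memory_target

1963336226, on the other hand, is not a round value at all (about 1.83 GiB), which is what makes it look so strange in the config file.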
So several events unfolded that may have led to this situation. Some of
them, in hindsight, were probably not the smartest decisions, particularly
adjusting the EC pool and restarting the OSDs several times during these
migrations.
1. Added a new 6th OSD with ceph-ansible
2. Hung during restart