Re: [ceph-users] OSD's won't start - thread abort

2019-07-05 Thread Gregory Farnum
On Wed, Jul 3, 2019 at 11:09 AM Austin Workman wrote: > Decided that if all the data was going to move, I should adjust my jerasure > ec profile from k=4, m=1 -> k=5, m=1 with force (is this even recommended vs. > just creating new pools?) > > Initially it unset crush-device-class=hdd to be blan
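For reference, the non-forced route to a k=5, m=1 layout is to define a new profile and create a new EC pool on it, then migrate the data, since changing k/m underneath an existing EC pool is not supported (the existing data is not re-striped). A minimal sketch of the commands involved; the profile name, pool name and PG count are hypothetical, not taken from the thread:

  # define a new jerasure profile with k=5, m=1, limited to HDDs
  ceph osd erasure-code-profile set ec-k5-m1 \
      k=5 m=1 plugin=jerasure technique=reed_sol_van \
      crush-device-class=hdd

  # confirm what the profile actually contains
  ceph osd erasure-code-profile get ec-k5-m1

  # create a fresh EC pool on that profile and migrate data into it
  ceph osd pool create data_ec51 128 128 erasure ec-k5-m1

With k=5, m=1 every PG needs shards on six distinct OSDs (or hosts, depending on the CRUSH rule), which matches the sixth OSD mentioned in the original post.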

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
After some creative PG surgery, everything is coming back online cleanly. I went through one at a time (80-90 PGs) on the least-filled OSD (new osd.5) and export-remove'd each PG that was causing the assertion failures, after testing starting the OSD. # tail -f /var/log/ceph/ceph-osd.5.log | grep -A1 "un
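The export-remove step described here corresponds to running ceph-objectstore-tool against the stopped OSD. A rough sketch of the kind of invocation involved; the PG id and file path below are placeholders, not values from the thread:

  # stop the OSD before operating on its object store
  systemctl stop ceph-osd@5

  # list the PGs held on this OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op list-pgs

  # export one problematic PG to a file and remove it from the OSD in one step
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
      --pgid 2.1a --op export-remove --file /root/pg-2.1a.export

  # start the OSD again and watch the log for the next assertion
  systemctl start ceph-osd@5
  tail -f /var/log/ceph/ceph-osd.5.log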

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
That makes more sense. Setting min_size = 4 on the EC pool allows data to flow again (kind of, not really, because of the still-missing 22 other PGs). Maybe this was automatically raised to 5 when I adjusted the EC pool originally? Outside of the 21 unknown and 1 down PG, which are probably depending on
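Checking and lowering min_size is a single pool setting; a sketch assuming a pool named data_ec (the actual pool name isn't shown in the preview):

  # see what the pool is currently set to (size = k+m for an EC pool)
  ceph osd pool get data_ec size
  ceph osd pool get data_ec min_size

  # allow PGs to go active and serve I/O with only 4 shards up
  ceph osd pool set data_ec min_size 4

Ceph derives the default min_size from the erasure-code profile, which would explain it changing when the profile was adjusted.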

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Janne Johansson
On Wed, 3 Jul 2019 at 20:51, Austin Workman wrote: > But a very strange number shows up in the active sections of the PGs that's roughly the same number as 2147483648. This seems very odd, and maybe the value got lodged somewhere it doesn't belong, which is causing an issue. That
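To see where that number actually shows up, the up/acting OSD sets of the affected PGs can be dumped directly; a sketch with a placeholder PG id. Note that for EC pools a shard with no OSD currently assigned is printed as a large INT_MAX-style placeholder in the up/acting set, which is easy to mistake for a configuration value that leaked in.

  # brief per-PG view: state plus up/acting OSD sets
  ceph pg dump pgs_brief | grep -E 'unknown|down|incomplete'

  # full detail for one PG (2.1a is a placeholder id)
  ceph pg 2.1a query

  # current mapping of that PG onto OSDs
  ceph pg map 2.1a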

Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
Something very curious is that I was adjusting the configuration for osd memory target via ceph-ansible and had at one point set 2147483648, which is around 2 GB. Currently it's set to 1610612736, but strangely the config file it wrote contains 1963336226. But a very strange number shows up in the act
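For what it's worth, 2147483648 bytes is exactly 2 GiB (2^31), so it is a plausible osd_memory_target value in itself. A sketch of how the setting can be inspected and set directly, bypassing ceph-ansible (the OSD id is arbitrary):

  # what the running daemon is actually using (run on the OSD host, via the admin socket)
  ceph daemon osd.5 config get osd_memory_target

  # set it cluster-wide in the monitors' configuration database (Mimic and later)
  ceph config set osd osd_memory_target 1610612736

  # or in ceph.conf, which is what ceph-ansible templates out:
  #   [osd]
  #   osd memory target = 1610612736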

[ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
So several events unfolded that may have led to this situation. Some of them, in hindsight, were probably not the smartest decisions around adjusting the EC pool and restarting the OSDs several times during these migrations. 1. Added a new 6th OSD with ceph-ansible 1. Hung during restart