Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-29 Thread Jon Light
I let the 2 working OSDs backfill over the last couple of days and today I was able to add 7 more OSDs before getting PGs stuck activating. Below are the OSD and health outputs after adding an 8th OSD and again seeing PGs stuck activating. ceph osd df tree ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE
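
[Editor's note: on Luminous, PGs stuck in activating after adding OSDs is often the per-OSD PG limit kicking in during backfill. A hedged sketch of the usual first checks; mon.a, the PG id, and the injectargs value are placeholders, not taken from this thread:

    ceph pg dump_stuck inactive                       # list PGs stuck activating/peering
    ceph pg 1.2a query                                # inspect one stuck PG (placeholder id)
    ceph daemon mon.a config get mon_max_pg_per_osd   # run on the mon host; default 200 on Luminous
    # temporarily relax the hard limit so activation can proceed, then revert:
    ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 16'
]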

Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Jon Light
, Mar 27, 2018 at 2:29 PM, Peter Linder wrote: > I've had similar issues, but I think your problem might be something else. > Could you send the output of "ceph osd df"? > > Other people will probably be interested in what version you are using as > well. > > Den 2

[ceph-users] PGs stuck activating after adding new OSDs

2018-03-27 Thread Jon Light
Hi all, I'm adding a new OSD node with 36 OSDs to my cluster and have run into some problems. Here are some of the details of the cluster: 1 OSD node with 80 OSDs; 1 EC pool with k=10, m=3; pg_num 1024; osd failure domain. I added a second OSD node and started creating OSDs with ceph-deploy, one by
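
[Editor's note: for context, a pool with the layout described would be created along these lines; the profile and pool names are placeholders, not taken from the thread:

    ceph osd erasure-code-profile set ec-k10-m3 k=10 m=3 crush-failure-domain=osd
    ceph osd pool create ecpool 1024 1024 erasure ec-k10-m3

With k=10, m=3 and pg_num 1024 that is 1024 x 13 = 13312 PG shards, roughly 166 per OSD on an 80-OSD node, which is why the per-OSD PG limits discussed above come into play when new OSDs join.]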

[ceph-users] Moving OSDs between hosts

2018-03-16 Thread Jon Light
Hi all, I have a very small cluster consisting of 1 overloaded OSD node and a couple MON/MGR/MDS nodes. I will be adding new OSD nodes to the cluster and need to move 36 drives from the existing node to a new one. I'm running Luminous 12.2.2 on Ubuntu 16.04 and everything was created with ceph-dep
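
[Editor's note: since an OSD's metadata lives on its own drive, the usual procedure is to stop the daemon, move the disk, and let it activate on the new host; with the default osd crush update on start = true the OSD re-registers under the new host at startup. A hedged sketch; osd.12, /dev/sdX1, and newhost are placeholders:

    systemctl stop ceph-osd@12             # on the old host
    # physically move the drive, then on the new host:
    ceph-disk activate /dev/sdX1           # or let the udev rules activate it
    # only if the CRUSH location did not update automatically:
    ceph osd crush move osd.12 host=newhost

Expect backfill traffic afterwards, since CRUSH remaps data once the OSD sits under a different host.]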

Re: [ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-11-08 Thread Jon Light
Thanks for the instructions Michael, I was able to successfully get the patch, build, and install. Unfortunately I'm now seeing "osd/PG.cc: 5381: FAILED assert(info.history.same_interval_since != 0)". Then the OSD crashes. On Sat, Nov 4, 2017 at 5:51 AM, Michael wrote: > Jon
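
[Editor's note: to capture a full backtrace for an assert like this, one option is to run the crashing OSD in the foreground with verbose logging; the OSD id is a placeholder:

    ceph-osd -i 12 -d --debug_osd 20 2> osd.12.log   # -d: run in foreground, log to stderr

The log lines around "osd/PG.cc: 5381" are what is worth attaching to the tracker issue.]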

Re: [ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-11-02 Thread Jon Light
nt installation? Thanks On Wed, Nov 1, 2017 at 11:39 AM, Jon Light wrote: > I'm currently running 12.2.0. How should I go about applying the patch? > Should I upgrade to 12.2.1, apply the changes, and then recompile? > > I really appreciate the patch. > Thanks > > On Wed, Nov
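
[Editor's note: one plausible way to apply a pending PR on top of a release tag and rebuild, sketched under the assumption that the PR is a single commit that applies cleanly to v12.2.1:

    git clone https://github.com/ceph/ceph.git && cd ceph
    git checkout v12.2.1
    git submodule update --init --recursive
    git fetch origin pull/18673/head:wip-same-interval   # GitHub exposes PRs as refs
    git cherry-pick wip-same-interval                    # assumes a single-commit PR
    ./install-deps.sh && ./do_cmake.sh
    cd build && make -j8 ceph-osd                        # build just the OSD binary
]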

Re: [ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-11-01 Thread Jon Light
my tentative fix for this issue which is > in https://github.com/ceph/ceph/pull/18673 > > > Thanks > > David > > > > On 10/30/17 1:13 AM, Jon Light wrote: > >> Hello, >> >> I have three OSDs that are crashing on start with a FAILED >> assert(

[ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-10-30 Thread Jon Light
Hello, I have three OSDs that are crashing on start with a FAILED assert(p.same_interval_since) error. I ran across a thread from a few days ago about the same issue and a ticket was created here: http://tracker.ceph.com/issues/21833. A very overloaded node in my cluster OOM'd many times which ev
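
[Editor's note: before attempting any repair on OSDs that crash at start, it is usually worth exporting the affected PGs with ceph-objectstore-tool so nothing is lost. A hedged sketch; the OSD id, data path, and PG id are placeholders, and filestore OSDs also need --journal-path:

    systemctl stop ceph-osd@12          # the tool needs exclusive access to the store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 1.2a --op export --file /tmp/pg1.2a.export
]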