I think what happened is this:

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/


Note


Sometimes, typically in a “small” cluster with few hosts (for instance a
small testing cluster), taking an OSD out can trigger a CRUSH corner case
where some PGs remain stuck in the active+remapped state.

It's a small cluster with an unequal number of OSDs per host. One of the OSD
disks failed and I had taken it out.
I have already purged it, so I cannot use the reweight option mentioned in
that link.


So, are there any other workarounds?
Will adding more disks clear it?
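
One other thing I found in the CRUSH/troubleshooting docs is the
choose_total_tries tunable: raising it makes CRUSH retry more times before
giving up on a mapping, which seems to be the suggested fix for this corner
case. A rough sketch of how that would be applied (I have not tried it yet;
100 is just an example value, the default is 50):

$ ceph osd getcrushmap -o crush.map
$ crushtool -d crush.map -o crush.txt
  (edit crush.txt: change "tunable choose_total_tries 50" to e.g. 100)
$ crushtool -c crush.txt -o crush.new
$ ceph osd setcrushmap -i crush.new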

Karun Josy

On Mon, Dec 18, 2017 at 9:06 AM, David Turner <drakonst...@gmail.com> wrote:

> Maybe try outing the disk that should have a copy of the PG, but doesn't.
> Then mark it back in. It might check that it has everything properly and
> pull a copy of the data it's missing. I dunno.
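>
> Roughly something like this, assuming the OSD that should hold the missing
> copy is osd.<id> (just a sketch of the idea, not verified for this case):
>
> $ ceph osd out <id>    # wait for peering/backfill to settle
> $ ceph osd in <id>
> $ ceph pg 3.4 query    # check whether the up set still shows NONE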
>
> On Sun, Dec 17, 2017, 10:00 PM Karun Josy <karunjo...@gmail.com> wrote:
>
>> Tried restarting all osds. Still no luck.
>>
>> Will adding a new disk to any of the servers force a rebalance and fix it?
>>
>> Karun Josy
>>
>> On Sun, Dec 17, 2017 at 12:22 PM, Cary <dynamic.c...@gmail.com> wrote:
>>
>>> Karun,
>>>
>>>  Could you paste in the output from "ceph health detail"? Which OSD
>>> was just added?
>>>
>>> Cary
>>> -Dynamic
>>>
>>> On Sun, Dec 17, 2017 at 4:59 AM, Karun Josy <karunjo...@gmail.com>
>>> wrote:
>>> > Any help would be appreciated!
>>> >
>>> > Karun Josy
>>> >
>>> > On Sat, Dec 16, 2017 at 11:04 PM, Karun Josy <karunjo...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Repair didn't fix the issue.
>>> >>
>>> >> In the pg dump details, I notice this NONE. It seems the PG is missing
>>> >> from one of the OSDs:
>>> >>
>>> >> [0,2,NONE,4,12,10,5,1]
>>> >> [0,2,1,4,12,10,5,1]
>>> >>
>>> >> Is there no way Ceph corrects this automatically? Do I have to edit or
>>> >> troubleshoot it manually?
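>>> >>
>>> >> (For reference, the up vs. acting OSD sets for the PG can also be checked
>>> >> with the command below, assuming the pg ID is 3.4:)
>>> >>
>>> >> $ ceph pg map 3.4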
>>> >>
>>> >> Karun
>>> >>
>>> >> On Sat, Dec 16, 2017 at 10:44 PM, Cary <dynamic.c...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Karun,
>>> >>>
>>> >>>  Running ceph pg repair should not cause any problems. It may not fix
>>> >>> the issue though. If that does not help, there is more information at
>>> >>> the link below.
>>> >>> http://ceph.com/geen-categorie/ceph-manually-repair-object/
>>> >>>
>>> >>> I recommend not rebooting or restarting while Ceph is repairing or
>>> >>> recovering. If possible, wait until the cluster is in a healthy state
>>> >>> first.
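>>> >>>
>>> >>> For example, check the overall status first and make sure recovery has
>>> >>> finished (standard status check):
>>> >>>
>>> >>> $ ceph -s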
>>> >>>
>>> >>> Cary
>>> >>> -Dynamic
>>> >>>
>>> >>> On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy <karunjo...@gmail.com>
>>> >>> wrote:
>>> >>> > Hi Cary,
>>> >>> >
>>> >>> > No, I didn't try to repair it.
>>> >>> > I am comparatively new to Ceph. Is it okay to try a repair?
>>> >>> > Are there any precautions I should take while doing it?
>>> >>> >
>>> >>> > Karun Josy
>>> >>> >
>>> >>> > On Sat, Dec 16, 2017 at 2:08 PM, Cary <dynamic.c...@gmail.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Karun,
>>> >>> >>
>>> >>> >>  Did you attempt a "ceph pg repair <pgid>"? Replace <pgid> with the
>>> >>> >> pg ID that needs to be repaired, in this case 3.4.
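>>> >>> >>
>>> >>> >> For example:
>>> >>> >>
>>> >>> >> $ ceph pg repair 3.4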
>>> >>> >>
>>> >>> >> Cary
>>> >>> >> -D123
>>> >>> >>
>>> >>> >> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy <karunjo...@gmail.com>
>>> >>> >> wrote:
>>> >>> >> > Hello,
>>> >>> >> >
>>> >>> >> > I added 1 disk to the cluster and, after rebalancing, it shows 1 PG
>>> >>> >> > is in the remapped state. How can I correct it?
>>> >>> >> >
>>> >>> >> > (I had to restart some osds during the rebalancing as there were some
>>> >>> >> > slow requests.)
>>> >>> >> >
>>> >>> >> > $ ceph pg dump | grep remapped
>>> >>> >> > dumped all
>>> >>> >> > 3.4  981  0  0  0  0  2655009792  1535  1535  active+clean+remapped
>>> >>> >> > 2017-12-15 22:07:21.663964  2824'785115  2824:2297888
>>> >>> >> > [0,2,NONE,4,12,10,5,1]  0  [0,2,1,4,12,10,5,1]  0  2288'767367
>>> >>> >> > 2017-12-14 11:00:15.576741  417'518549  2017-12-08 03:56:14.006982
>>> >>> >> >
>>> >>> >> > That PG belongs to an erasure-coded pool with a k=5, m=3 profile; the
>>> >>> >> > failure domain is host. (So each PG needs k+m = 8 distinct hosts, which
>>> >>> >> > is exactly the number of hosts in the tree below.)
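>>> >>> >> >
>>> >>> >> > (If it helps, I can dump the full profile with the command below; the
>>> >>> >> > profile name is a placeholder since I haven't included the real one:)
>>> >>> >> >
>>> >>> >> > $ ceph osd erasure-code-profile get <profile-name>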
>>> >>> >> >
>>> >>> >> > ===========
>>> >>> >> >
>>> >>> >> > $ ceph osd tree
>>> >>> >> > ID  CLASS WEIGHT   TYPE NAME                STATUS REWEIGHT PRI-AFF
>>> >>> >> >  -1       16.94565 root default
>>> >>> >> >  -3        2.73788     host ceph-a1
>>> >>> >> >   0   ssd  1.86469         osd.0                up  1.00000 1.00000
>>> >>> >> >  14   ssd  0.87320         osd.14               up  1.00000 1.00000
>>> >>> >> >  -5        2.73788     host ceph-a2
>>> >>> >> >   1   ssd  1.86469         osd.1                up  1.00000 1.00000
>>> >>> >> >  15   ssd  0.87320         osd.15               up  1.00000 1.00000
>>> >>> >> >  -7        1.86469     host ceph-a3
>>> >>> >> >   2   ssd  1.86469         osd.2                up  1.00000 1.00000
>>> >>> >> >  -9        1.74640     host ceph-a4
>>> >>> >> >   3   ssd  0.87320         osd.3                up  1.00000 1.00000
>>> >>> >> >   4   ssd  0.87320         osd.4                up  1.00000 1.00000
>>> >>> >> > -11        1.74640     host ceph-a5
>>> >>> >> >   5   ssd  0.87320         osd.5                up  1.00000 1.00000
>>> >>> >> >   6   ssd  0.87320         osd.6                up  1.00000 1.00000
>>> >>> >> > -13        1.74640     host ceph-a6
>>> >>> >> >   7   ssd  0.87320         osd.7                up  1.00000 1.00000
>>> >>> >> >   8   ssd  0.87320         osd.8                up  1.00000 1.00000
>>> >>> >> > -15        1.74640     host ceph-a7
>>> >>> >> >   9   ssd  0.87320         osd.9                up  1.00000 1.00000
>>> >>> >> >  10   ssd  0.87320         osd.10               up  1.00000 1.00000
>>> >>> >> > -17        2.61960     host ceph-a8
>>> >>> >> >  11   ssd  0.87320         osd.11               up  1.00000 1.00000
>>> >>> >> >  12   ssd  0.87320         osd.12               up  1.00000 1.00000
>>> >>> >> >  13   ssd  0.87320         osd.13               up  1.00000 1.00000
>>> >>> >> >
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > Karun
>>> >>> >> >
>>> >>> >> > _______________________________________________
>>> >>> >> > ceph-users mailing list
>>> >>> >> > ceph-users@lists.ceph.com
>>> >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >>> >> >
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>> >
>>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
