On Tue, Jul 4, 2017 at 10:47 PM Eino Tuominen <e...@utu.fi> wrote:

> Hello,
>
>
> I noticed the same behaviour in our cluster.
>
>
> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>
>
>
>     cluster 0a9f2d69-5905-4369-81ae-e36e4a791831
>      health HEALTH_WARN
>             1 pgs backfill_toofull
>             4366 pgs backfill_wait
>             11 pgs backfilling
>             45 pgs degraded
>             45 pgs recovery_wait
>             45 pgs stuck degraded
>             4423 pgs stuck unclean
>             recovery 181563/302722835 objects degraded (0.060%)
>             recovery 57192879/302722835 objects misplaced (18.893%)
>             1 near full osd(s)
>             noout,nodeep-scrub flag(s) set
>      monmap e3: 3 mons at {0=130.232.243.65:6789/0,1=130.232.243.66:6789/0,2=130.232.243.67:6789/0}
>             election epoch 356, quorum 0,1,2 0,1,2
>      osdmap e388588: 260 osds: 260 up, 242 in; 4378 remapped pgs
>             flags nearfull,noout,nodeep-scrub,require_jewel_osds
>       pgmap v80658624: 25728 pgs, 8 pools, 202 TB data, 89212 kobjects
>             612 TB used, 300 TB / 912 TB avail
>             181563/302722835 objects degraded (0.060%)
>             57192879/302722835 objects misplaced (18.893%)
>                21301 active+clean
>                 4366 active+remapped+wait_backfill
>                   45 active+recovery_wait+degraded
>                   11 active+remapped+backfilling
>                    4 active+clean+scrubbing
>                    1 active+remapped+backfill_toofull
> recovery io 421 MB/s, 155 objects/s
>   client io 201 kB/s rd, 2034 B/s wr, 75 op/s rd, 0 op/s wr
>
> I'm currently doing a rolling migration from Puppet on Ubuntu to Ansible
> on RHEL. I started with a healthy cluster, evacuated some nodes by
> setting their weight to 0, removed them from the cluster, and re-added
> them with the ansible playbook.
>
> Basically I ran
>
>         ceph osd crush remove osd.$num
>         ceph osd rm $num
>         ceph auth del osd.$num
>
> in a loop for the osds I was replacing, and then let the ansible ceph-osd
> playbook bring the host back into the cluster. Crushmap is attached.
>

This case is different. If you are removing OSDs before they've had the
chance to offload themselves, objects are going to be degraded since you're
removing a copy! :)
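
A rough sketch of an ordering that avoids that (not a prescription; adjust
to your playbook, and the systemctl line assumes a systemd-managed OSD host):

        # drain the OSD first so backfill can move its data elsewhere
        ceph osd crush reweight osd.$num 0
        ceph osd out $num

        # wait until "ceph -s" shows no degraded or misplaced objects, then:
        systemctl stop ceph-osd@$num        # run on the OSD host itself
        ceph osd crush remove osd.$num
        ceph auth del osd.$num
        ceph osd rm $num

That way every copy has been offloaded before the OSD disappears from the map.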
-Greg


>
> --
>   Eino Tuominen
>
>
> ------------------------------
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
> Gregory Farnum <gfar...@redhat.com>
> Sent: Friday, June 30, 2017 23:38
> To: Andras Pataki; ceph-users
> Subject: Re: [ceph-users] Degraded objects while OSD is being added/filled
>
> On Wed, Jun 21, 2017 at 6:57 AM Andras Pataki <
> apat...@flatironinstitute.org> wrote:
>
>> Hi cephers,
>>
>> I noticed something I don't understand about ceph's behavior when adding
>> an OSD.  When I start with a clean cluster (all PGs active+clean) and add
>> an OSD (via ceph-deploy, for example), the CRUSH map gets updated, PGs
>> get reassigned to different OSDs, and the new OSD starts getting filled
>> with data.  As the new OSD gets filled, I start seeing PGs in degraded
>> states.  Here is an example:
>>
>>       pgmap v52068792: 42496 pgs, 6 pools, 1305 TB data, 390 Mobjects
>>             3164 TB used, 781 TB / 3946 TB avail
>>             8017/994261437 objects degraded (0.001%)
>>             2220581/994261437 objects misplaced (0.223%)
>>                42393 active+clean
>>                   91 active+remapped+wait_backfill
>>                    9 active+clean+scrubbing+deep
>>                    1 active+recovery_wait+degraded
>>                    1 active+clean+scrubbing
>>                    1 active+remapped+backfilling
>>
>>
>> Any ideas why there would be persistent degradation in the cluster
>> while the newly added drive is being filled?  It takes perhaps a day or two
>> to fill the drive - and during all this time the cluster seems to be
>> running degraded.  As data is written to the cluster, the number of
>> degraded objects increases over time.  Once the newly added OSD is filled,
>> the cluster comes back to clean again.
>>
>> Here is the PG that is degraded in this picture:
>>
>> 7.87c    1    0    2    0    0    4194304    7    7
>> active+recovery_wait+degraded    2017-06-20 14:12:44.119921    344610'7
>> 583572:2797    [402,521]    402    [402,521]    402    344610'7
>> 2017-06-16 06:04:55.822503    344610'7    2017-06-16 06:04:55.822503
>>
>> The newly added OSD here is 521.  Before it was added, this PG had two
>> clean replicas, but one got forgotten somehow?
>>
>
> This sounds a bit concerning at first glance. Can you provide some output
> of exactly what commands you're invoking, and the "ceph -s" output as it
> changes in response?
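>
> For example (just a sketch; substitute whichever pgid actually shows up as
> degraded on your cluster), capturing
>
>       ceph health detail
>       ceph osd tree
>       ceph pg 7.87c query        # up/acting sets and recovery state
>
> before adding the OSD and again once the degraded PG appears would be useful.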
>
> I really don't see how adding a new OSD can result in it "forgetting"
> about existing valid copies — it's definitely not supposed to — so I wonder
> if there's a collision in how it's deciding to remove old locations.
>
> Are you running with only two copies of your data? It shouldn't matter but
> there could also be errors resulting in a behavioral difference between two
> and three copies.
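> (Something like "ceph osd pool ls detail", or "ceph osd pool get <pool> size"
> with your pool name filled in, would show the replica count per pool.)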
> -Greg
>
>
>>
>> Other remapped PGs have 521 in their "up" set but still have the two
>> existing copies in their "acting" set - and no degradation is shown.
>> Examples:
>>
>> 2.f24    14282    0    16    28564    0    51014850801    3102    3102
>> active+remapped+wait_backfill    2017-06-20 14:12:42.650308
>> 583553'2033479    583573:2033266    [467,521]    467    [467,499]    467
>> 582430'2033337    2017-06-16 09:08:51.055131    582036'2030837
>> 2017-05-31 20:37:54.831178
>> 6.2b7d    10499    0    140    20998    0    37242874687    3673
>> 3673    active+remapped+wait_backfill    2017-06-20 14:12:42.070019
>> 583569'165163    583572:342128    [541,37,521]    541    [541,37,532]
>> 541    582430'161890    2017-06-18 09:42:49.148402    582430'161890
>> 2017-06-18 09:42:49.148402
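>>
>> (A quick way to check a single PG's mapping, for example:
>>
>>     ceph pg map 2.f24
>>
>> which prints the current up and acting sets for that PG.)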
>>
>> We are running the latest Jewel patch level everywhere (10.2.7).  Any
>> insights would be appreciated.
>>
>> Andras
>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
