On Wed, Jun 21, 2017 at 6:57 AM Andras Pataki <apat...@flatironinstitute.org>
wrote:

> Hi cephers,
>
> I noticed something I don't understand about ceph's behavior when adding
> an OSD.  When I start with a clean cluster (all PGs active+clean) and add
> an OSD (via ceph-deploy for example), the crush map gets updated and PGs
> get reassigned to different OSDs, and the new OSD starts getting filled
> with data.  As the new OSD gets filled, I start seeing PGs in degraded
> states.  Here is an example:
>
>       pgmap v52068792: 42496 pgs, 6 pools, 1305 TB data, 390 Mobjects
>             3164 TB used, 781 TB / 3946 TB avail
> *            8017/994261437 objects degraded (0.001%)*
>             2220581/994261437 objects misplaced (0.223%)
>                42393 active+clean
>                   91 active+remapped+wait_backfill
>                    9 active+clean+scrubbing+deep
> *                   1 active+recovery_wait+degraded*
>                    1 active+clean+scrubbing
>                    1 active+remapped+backfilling
>
>
> Any ideas why there would be persistent degradation in the cluster
> while the newly added drive is being filled?  It takes perhaps a day or two
> to fill the drive - and during all this time the cluster seems to be
> running degraded.  As data is written to the cluster, the number of
> degraded objects increases over time.  Once the newly added OSD is filled,
> the cluster comes back to clean again.
>
> Here is the PG that is degraded in this picture:
>
> 7.87c    1    0    2    0    0    4194304    7    7
> active+recovery_wait+degraded    2017-06-20 14:12:44.119921    344610'7
> 583572:2797    [402,521]    402    [402,521]    402    344610'7
> 2017-06-16 06:04:55.822503    344610'7    2017-06-16 06:04:55.822503
>
> The newly added OSD here is 521.  Before it got added, this PG had two
> replicas clean, but one got forgotten somehow?
>

This sounds a bit concerning at first glance. Can you provide the exact
commands you're invoking, and the "ceph -s" output as it changes in
response?
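
If it helps, capturing something along these lines right after the OSD goes
in (and again once backfill is underway) should show when the degraded count
appears; none of these commands is specific to your setup:

  ceph -s                        # overall cluster and PG state
  ceph health detail             # which PGs are degraded and why
  ceph osd tree                  # where the new OSD landed in the CRUSH map
  ceph pg dump_stuck degraded    # just the degraded PGs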

I really don't see how adding a new OSD could result in it "forgetting" about
existing valid copies (it's definitely not supposed to), so I wonder if
there's some confusion in how it's deciding to remove old locations.
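
A "ceph pg query" on the degraded PG would also help; its recovery_state
section should tell us whether the old copy was actually dropped or just
isn't being counted while backfill is pending. For the PG in your example,
roughly:

  ceph pg 7.87c query    # check the recovery_state section
  ceph pg map 7.87c      # current up/acting sets for that PG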

Are you running with only two copies of your data? It shouldn't matter, but
there could also be bugs that behave differently with two copies than with
three.
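
Just to confirm what we're dealing with, the pools' replication settings
would be useful too (substitute your actual pool name):

  ceph osd pool get <poolname> size        # number of replicas
  ceph osd pool get <poolname> min_size    # minimum replicas required for I/O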
-Greg


>
> Other remapped PGs have 521 in their "up" set but still have the two
> existing copies in their "acting" set - and no degradation is shown.
> Examples:
>
> 2.f24    14282    0    16    28564    0    51014850801    3102    3102
> active+remapped+wait_backfill    2017-06-20 14:12:42.650308
> 583553'2033479    583573:2033266    [467,521]    467    [467,499]    467
> 582430'2033337    2017-06-16 09:08:51.055131    582036'2030837
> 2017-05-31 20:37:54.831178
> 6.2b7d    10499    0    140    20998    0    37242874687    3673
> 3673    active+remapped+wait_backfill    2017-06-20 14:12:42.070019
> 583569'165163    583572:342128    [541,37,521]    541    [541,37,532]
> 541    582430'161890    2017-06-18 09:42:49.148402    582430'161890
> 2017-06-18 09:42:49.148402
>
> We are running the latest Jewel patch level everywhere (10.2.7).  Any
> insights would be appreciated.
>
> Andras
>