On Wed, Jun 21, 2017 at 6:57 AM Andras Pataki <apat...@flatironinstitute.org> wrote:
> Hi cephers,
>
> I noticed something I don't understand about ceph's behavior when adding
> an OSD. When I start with a clean cluster (all PGs active+clean) and add
> an OSD (via ceph-deploy for example), the crush map gets updated and PGs
> get reassigned to different OSDs, and the new OSD starts getting filled
> with data. As the new OSD gets filled, I start seeing PGs in degraded
> states. Here is an example:
>
>     pgmap v52068792: 42496 pgs, 6 pools, 1305 TB data, 390 Mobjects
>           3164 TB used, 781 TB / 3946 TB avail
>           *8017/994261437 objects degraded (0.001%)*
>           2220581/994261437 objects misplaced (0.223%)
>              42393 active+clean
>                 91 active+remapped+wait_backfill
>                  9 active+clean+scrubbing+deep
>                 *1 active+recovery_wait+degraded*
>                  1 active+clean+scrubbing
>                  1 active+remapped+backfilling
>
> Any ideas why there would be any persistent degradation in the cluster
> while the newly added drive is being filled? It takes perhaps a day or two
> to fill the drive - and during all this time the cluster seems to be
> running degraded. As data is written to the cluster, the number of
> degraded objects increases over time. Once the newly added OSD is filled,
> the cluster comes back to clean again.
>
> Here is the PG that is degraded in this picture:
>
>     7.87c   1   0   2   0   0   4194304   7   7
>             active+recovery_wait+degraded   2017-06-20 14:12:44.119921
>             344610'7   583572:2797   [402,521]   402   [402,521]   402
>             344610'7   2017-06-16 06:04:55.822503
>             344610'7   2017-06-16 06:04:55.822503
>
> The newly added osd here is 521. Before it got added, this PG had two
> replicas clean, but one got forgotten somehow?

This sounds a bit concerning at first glance. Can you provide some output
of exactly what commands you're invoking, and the "ceph -s" output as it
changes in response?

I really don't see how adding a new OSD can result in it "forgetting" about
existing valid copies — it's definitely not supposed to — so I wonder if
there's a collision in how it's deciding to remove old locations.

Are you running with only two copies of your data? It shouldn't matter, but
there could also be errors resulting in a behavioral difference between two
and three copies.
-Greg

> Other remapped PGs have 521 in their "up" set but still have the two
> existing copies in their "acting" set - and no degradation is shown.
> Examples:
>
>     2.f24    14282   0    16   28564   0   51014850801   3102   3102
>              active+remapped+wait_backfill   2017-06-20 14:12:42.650308
>              583553'2033479   583573:2033266   [467,521]   467   [467,499]   467
>              582430'2033337   2017-06-16 09:08:51.055131
>              582036'2030837   2017-05-31 20:37:54.831178
>     6.2b7d   10499   0   140   20998   0   37242874687   3673   3673
>              active+remapped+wait_backfill   2017-06-20 14:12:42.070019
>              583569'165163    583572:342128    [541,37,521]   541   [541,37,532]   541
>              582430'161890    2017-06-18 09:42:49.148402
>              582430'161890    2017-06-18 09:42:49.148402
>
> We are running the latest Jewel patch level everywhere (10.2.7). Any
> insights would be appreciated.
>
> Andras
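For reference, these are the sorts of standard Ceph CLI commands that would
capture the information asked for above (the PG id is taken from the output
quoted earlier; they are not necessarily the exact commands the original
poster ran):

    # overall cluster state while the backfill runs
    ceph -s
    ceph health detail

    # replica count ("size") for each pool, to confirm whether it is 2 or 3
    ceph osd pool ls detail

    # detailed state of the degraded PG, including its up/acting sets
    # and the current recovery state
    ceph pg 7.87c query

    # list every PG currently reporting a degraded state
    ceph pg dump pgs_brief | grep degraded

Comparing the up and acting sets in the "ceph pg query" output against the
pg dump lines above is the quickest way to see whether an existing replica
was dropped from the acting set when the new OSD was added.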