On Tue, Jul 4, 2017 at 10:47 PM Eino Tuominen <e...@utu.fi> wrote:

> Hello,
>
> I noticed the same behaviour in our cluster.
>
> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>
>     cluster 0a9f2d69-5905-4369-81ae-e36e4a791831
>      health HEALTH_WARN
>             1 pgs backfill_toofull
>             4366 pgs backfill_wait
>             11 pgs backfilling
>             45 pgs degraded
>             45 pgs recovery_wait
>             45 pgs stuck degraded
>             4423 pgs stuck unclean
>             recovery 181563/302722835 objects degraded (0.060%)
>             recovery 57192879/302722835 objects misplaced (18.893%)
>             1 near full osd(s)
>             noout,nodeep-scrub flag(s) set
>      monmap e3: 3 mons at {0=130.232.243.65:6789/0,1=130.232.243.66:6789/0,2=130.232.243.67:6789/0}
>             election epoch 356, quorum 0,1,2 0,1,2
>      osdmap e388588: 260 osds: 260 up, 242 in; 4378 remapped pgs
>             flags nearfull,noout,nodeep-scrub,require_jewel_osds
>       pgmap v80658624: 25728 pgs, 8 pools, 202 TB data, 89212 kobjects
>             612 TB used, 300 TB / 912 TB avail
>             181563/302722835 objects degraded (0.060%)
>             57192879/302722835 objects misplaced (18.893%)
>                21301 active+clean
>                 4366 active+remapped+wait_backfill
>                   45 active+recovery_wait+degraded
>                   11 active+remapped+backfilling
>                    4 active+clean+scrubbing
>                    1 active+remapped+backfill_toofull
>   recovery io 421 MB/s, 155 objects/s
>   client io 201 kB/s rd, 2034 B/s wr, 75 op/s rd, 0 op/s wr
>
> I'm currently doing a rolling migration from Puppet on Ubuntu to Ansible
> on RHEL. I started with a healthy cluster, evacuated some nodes by
> setting their weight to 0, removed them from the cluster, and re-added
> them with the ansible playbook.
>
> Basically I ran
>
>     ceph osd crush remove osd.$num
>     ceph osd rm $num
>     ceph auth del osd.$num
>
> in a loop for the osds I was replacing, and then let the ansible ceph-osd
> playbook bring the host back to the cluster. Crushmap is attached.
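For reference, a drain-first version of the replacement sequence described above, with an explicit wait for the reweighted OSDs to empty out before they are removed, might look roughly like the sketch below. The osd id list, the wait loop, and the assumed pgs_brief column order (pg, state, up, up_primary, acting, acting_primary) are illustrative additions, not part of the original message.

    #!/bin/bash
    # Rough sketch of a drain-first replacement loop.
    OSD_IDS="10 11 12"          # hypothetical ids of the OSDs being replaced

    # 1. Weight the OSDs out of CRUSH so their PGs backfill elsewhere.
    for num in $OSD_IDS; do
        ceph osd crush reweight osd.$num 0
    done

    # Count PGs whose up or acting set still references osd.$1.
    pgs_on() {
        ceph pg dump pgs_brief 2>/dev/null | awk -v id="$1" '
            $3 ~ ("[[,]" id "[],]") || $5 ~ ("[[,]" id "[],]") { n++ }
            END { print n + 0 }'
    }

    # 2. Wait until no PG maps to the drained OSDs any more; removing them
    #    earlier discards a replica and leaves objects degraded.
    for num in $OSD_IDS; do
        while [ "$(pgs_on $num)" -gt 0 ]; do
            sleep 60
        done
    done

    # 3. Only then remove them (the same commands as in the message above).
    for num in $OSD_IDS; do
        ceph osd crush remove osd.$num
        ceph osd rm $num
        ceph auth del osd.$num
    done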
This case is different. If you are removing OSDs before they've had the
chance to offload themselves, objects are going to be degraded since
you're removing a copy! :)
-Greg

> --
> Eino Tuominen
>
> ------------------------------
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
>       Gregory Farnum <gfar...@redhat.com>
> Sent: Friday, June 30, 2017 23:38
> To: Andras Pataki; ceph-users
> Subject: Re: [ceph-users] Degraded objects while OSD is being added/filled
>
> On Wed, Jun 21, 2017 at 6:57 AM Andras Pataki <
> apat...@flatironinstitute.org> wrote:
>
>> Hi cephers,
>>
>> I noticed something I don't understand about ceph's behavior when adding
>> an OSD. When I start with a clean cluster (all PGs active+clean) and add
>> an OSD (via ceph-deploy, for example), the crush map gets updated, PGs
>> get reassigned to different OSDs, and the new OSD starts getting filled
>> with data. As the new OSD gets filled, I start seeing PGs in degraded
>> states. Here is an example:
>>
>>     pgmap v52068792: 42496 pgs, 6 pools, 1305 TB data, 390 Mobjects
>>           3164 TB used, 781 TB / 3946 TB avail
>>           8017/994261437 objects degraded (0.001%)
>>           2220581/994261437 objects misplaced (0.223%)
>>              42393 active+clean
>>                 91 active+remapped+wait_backfill
>>                  9 active+clean+scrubbing+deep
>>                  1 active+recovery_wait+degraded
>>                  1 active+clean+scrubbing
>>                  1 active+remapped+backfilling
>>
>> Any ideas why there would be any persistent degradation in the cluster
>> while the newly added drive is being filled? It takes perhaps a day or
>> two to fill the drive, and during all this time the cluster seems to be
>> running degraded. As data is written to the cluster, the number of
>> degraded objects increases over time. Once the newly added OSD is
>> filled, the cluster comes back to clean again.
>>
>> Here is the PG that is degraded in this picture:
>>
>> 7.87c  1  0  2  0  0  4194304  7  7
>> active+recovery_wait+degraded  2017-06-20 14:12:44.119921  344610'7
>> 583572:2797  [402,521]  402  [402,521]  402  344610'7
>> 2017-06-16 06:04:55.822503  344610'7  2017-06-16 06:04:55.822503
>>
>> The newly added osd here is 521. Before it got added, this PG had two
>> replicas clean, but one got forgotten somehow?
>
> This sounds a bit concerning at first glance. Can you provide some output
> of exactly what commands you're invoking, and the "ceph -s" output as it
> changes in response?
>
> I really don't see how adding a new OSD can result in it "forgetting"
> about existing valid copies — it's definitely not supposed to — so I
> wonder if there's a collision in how it's deciding to remove old
> locations.
>
> Are you running with only two copies of your data? It shouldn't matter,
> but there could also be errors resulting in a behavioral difference
> between two and three copies.
> -Greg
>
>> Other remapped PGs have 521 in their "up" set but still have the two
>> existing copies in their "acting" set, and no degradation is shown.
>> Examples:
>>
>> 2.f24   14282  0  16   28564  0  51014850801  3102  3102
>> active+remapped+wait_backfill  2017-06-20 14:12:42.650308
>> 583553'2033479  583573:2033266  [467,521]  467  [467,499]  467
>> 582430'2033337  2017-06-16 09:08:51.055131  582036'2030837
>> 2017-05-31 20:37:54.831178
>> 6.2b7d  10499  0  140  20998  0  37242874687  3673  3673
>> active+remapped+wait_backfill  2017-06-20 14:12:42.070019
>> 583569'165163  583572:342128  [541,37,521]  541  [541,37,532]  541
>> 582430'161890  2017-06-18 09:42:49.148402  582430'161890
>> 2017-06-18 09:42:49.148402
>>
>> We are running the latest Jewel patch level everywhere (10.2.7). Any
>> insights would be appreciated.
>>
>> Andras
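For anyone hitting the same question, a quick way to pull out the PGs that are flagged degraded and compare their up and acting sets (along the lines of the pg dump extracts above) is a small filter over ceph pg dump. The awk field positions below assume the Jewel-era pgs_brief plain output, and the pg id is just the example from the thread; both are assumptions rather than anything the posters ran.

    # List degraded PGs with their up and acting sets, to check whether an
    # existing replica is still listed in acting (merely misplaced and
    # backfilling) or has been dropped (genuinely missing a copy).
    ceph pg dump pgs_brief 2>/dev/null |
        awk '$2 ~ /degraded/ { print $1, $2, "up=" $3, "acting=" $5 }'

    # For a single PG (7.87c is the example from the thread), the mapping
    # and the full peering/recovery detail are available via:
    ceph pg map 7.87c
    ceph pg 7.87c query

In the case above, 7.87c appears to be the odd one out because its acting set already lists the new osd.521, which does not yet hold the data, whereas the remapped-but-clean PGs keep the old replica in acting until backfill finishes.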
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com