Hi Greg,
I was not clear enough. First I marked the OSDs out (ceph osd out), which sets their weight to 0, and waited until the cluster was stable and healthy again (all pgs active+clean). Only then did I remove the now-empty OSDs, and that is when I saw the degraded objects.

I'm about to add some new disks to the cluster soon, so I can reproduce this on the cluster if you'd like to see what's happening. What would help to debug it? ceph osd dump and ceph pg dump before and after the modifications?
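Roughly, I had something like this in mind for each snapshot; TAG and the output file names are just placeholders, so adjust to taste:

    # Capture cluster state; run once with TAG=before, then again with
    # TAG=after once the OSDs have been removed and the cluster has settled.
    TAG=before
    ceph -s             > ceph-status.$TAG.txt
    ceph health detail  > health-detail.$TAG.txt
    ceph osd tree       > osd-tree.$TAG.txt
    ceph osd dump       > osd-dump.$TAG.txt
    ceph pg dump        > pg-dump.$TAG.txt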
--
  Eino Tuominen

________________________________
From: Gregory Farnum <gfar...@redhat.com>
Sent: Thursday, July 6, 2017 19:20
To: Eino Tuominen; Andras Pataki; ceph-users
Subject: Re: [ceph-users] Degraded objects while OSD is being added/filled

On Tue, Jul 4, 2017 at 10:47 PM Eino Tuominen <e...@utu.fi> wrote:

Hello,

I noticed the same behaviour in our cluster.

ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

     cluster 0a9f2d69-5905-4369-81ae-e36e4a791831
      health HEALTH_WARN
             1 pgs backfill_toofull
             4366 pgs backfill_wait
             11 pgs backfilling
             45 pgs degraded
             45 pgs recovery_wait
             45 pgs stuck degraded
             4423 pgs stuck unclean
             recovery 181563/302722835 objects degraded (0.060%)
             recovery 57192879/302722835 objects misplaced (18.893%)
             1 near full osd(s)
             noout,nodeep-scrub flag(s) set
      monmap e3: 3 mons at {0=130.232.243.65:6789/0,1=130.232.243.66:6789/0,2=130.232.243.67:6789/0}
             election epoch 356, quorum 0,1,2 0,1,2
      osdmap e388588: 260 osds: 260 up, 242 in; 4378 remapped pgs
             flags nearfull,noout,nodeep-scrub,require_jewel_osds
       pgmap v80658624: 25728 pgs, 8 pools, 202 TB data, 89212 kobjects
             612 TB used, 300 TB / 912 TB avail
             181563/302722835 objects degraded (0.060%)
             57192879/302722835 objects misplaced (18.893%)
                21301 active+clean
                 4366 active+remapped+wait_backfill
                   45 active+recovery_wait+degraded
                   11 active+remapped+backfilling
                    4 active+clean+scrubbing
                    1 active+remapped+backfill_toofull
   recovery io 421 MB/s, 155 objects/s
     client io 201 kB/s rd, 2034 B/s wr, 75 op/s rd, 0 op/s wr

I'm currently doing a rolling migration from Puppet on Ubuntu to Ansible on RHEL. I started with a healthy cluster, evacuated some nodes by setting their weight to 0, removed them from the cluster, and re-added them with the ansible playbook. Basically I ran

    ceph osd crush remove osd.$num
    ceph osd rm $num
    ceph auth del osd.$num

in a loop for the osds I was replacing, and then let the ansible ceph-osd playbook bring the host back into the cluster. Crushmap is attached.

This case is different. If you are removing OSDs before they've had the chance to offload themselves, objects are going to be degraded since you're removing a copy! :)
-Greg

--
  Eino Tuominen

________________________________
From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Gregory Farnum <gfar...@redhat.com>
Sent: Friday, June 30, 2017 23:38
To: Andras Pataki; ceph-users
Subject: Re: [ceph-users] Degraded objects while OSD is being added/filled

On Wed, Jun 21, 2017 at 6:57 AM Andras Pataki <apat...@flatironinstitute.org> wrote:

Hi cephers,

I noticed something I don't understand about ceph's behavior when adding an OSD. When I start with a clean cluster (all PGs active+clean) and add an OSD (via ceph-deploy, for example), the crush map gets updated, PGs get reassigned to different OSDs, and the new OSD starts getting filled with data. As the new OSD gets filled, I start seeing PGs in degraded states. Here is an example:

      pgmap v52068792: 42496 pgs, 6 pools, 1305 TB data, 390 Mobjects
            3164 TB used, 781 TB / 3946 TB avail
            8017/994261437 objects degraded (0.001%)
            2220581/994261437 objects misplaced (0.223%)
               42393 active+clean
                  91 active+remapped+wait_backfill
                   9 active+clean+scrubbing+deep
                   1 active+recovery_wait+degraded
                   1 active+clean+scrubbing
                   1 active+remapped+backfilling

Any ideas why there would be any persistent degradation in the cluster while the newly added drive is being filled? It takes perhaps a day or two to fill the drive, and during all this time the cluster seems to be running degraded. As data is written to the cluster, the number of degraded objects increases over time. Once the newly added OSD is filled, the cluster comes back to clean again.

Here is the PG that is degraded in this picture:

    7.87c  1  0  2  0  0  4194304  7  7  active+recovery_wait+degraded  2017-06-20 14:12:44.119921  344610'7  583572:2797  [402,521]  402  [402,521]  402  344610'7  2017-06-16 06:04:55.822503  344610'7  2017-06-16 06:04:55.822503

The newly added osd here is 521. Before it got added, this PG had two replicas clean, but one got forgotten somehow?

This sounds a bit concerning at first glance. Can you provide some output of exactly what commands you're invoking, and the "ceph -s" output as it changes in response? I really don't see how adding a new OSD can result in it "forgetting" about existing valid copies -- it's definitely not supposed to -- so I wonder if there's a collision in how it's deciding to remove old locations.

Are you running with only two copies of your data? It shouldn't matter, but there could also be errors resulting in a behavioral difference between two and three copies.
-Greg

Other remapped PGs have 521 in their "up" set but still have the two existing copies in their "acting" set, and no degradation is shown. Examples:

    2.f24   14282  0   16  28564  0  51014850801  3102  3102  active+remapped+wait_backfill  2017-06-20 14:12:42.650308  583553'2033479  583573:2033266  [467,521]     467  [467,499]     467  582430'2033337  2017-06-16 09:08:51.055131  582036'2030837  2017-05-31 20:37:54.831178
    6.2b7d  10499  0  140  20998  0  37242874687  3673  3673  active+remapped+wait_backfill  2017-06-20 14:12:42.070019  583569'165163   583572:342128   [541,37,521]  541  [541,37,532]  541  582430'161890   2017-06-18 09:42:49.148402  582430'161890   2017-06-18 09:42:49.148402

We are running the latest Jewel patch level everywhere (10.2.7). Any insights would be appreciated.

Andras
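PS. When I reproduce this, I can also grab the per-PG detail for anything that ends up degraded, roughly like this (7.87c is just borrowed from Andras's example above, and the output file name is a placeholder):

    ceph health detail | grep -i degraded
    ceph pg map 7.87c
    ceph pg 7.87c query > pg-7.87c-query.json

The pg query output should show the up and acting sets plus the recovery state for the PG in question.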
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com