Hello,

On Thu, 6 Jul 2017 17:57:06 +0000 george.vasilaka...@stfc.ac.uk wrote:

> Thanks for your response David.
> 
> What you've described matches what I've been thinking too. We have 1401 
> OSDs in the cluster currently, and this output is from the tail end of the 
> backfill for a +64 PG increase on the biggest pool.
> 
> The problem is that we see this cluster do at most 20 backfills at the same 
> time, and as the queue of PGs to backfill gets smaller, fewer and fewer are 
> actively backfilling, which I don't quite understand.
> 

Welcome to the club.
You're not the first one to wonder about this, and while David's comment
about osd_max_backfills is valid, it simply doesn't explain all of this.

See this thread and my thoughts on the matter; unfortunately no developer
ever followed up on it:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-May/009704.html

Christian

> All of the PGs currently backfilling have completely changed their sets 
> (the difference between the acting and up sets is 11), which makes some 
> sense since what moves around are the newly spawned PGs. That's 5 PGs 
> currently in backfilling states, which ties up 110 OSDs. What happened to 
> the other 1300? That's what's strange to me. There are another 7 waiting 
> to backfill.
> Out of all the OSDs in the up and acting sets of the PGs currently 
> backfilling or waiting to backfill, there are 13 OSDs in common, so I guess 
> that kind of answers it. I haven't checked yet, but I suspect each 
> backfilling PG has at least one OSD in one of its sets in common with either 
> set of one of the waiting PGs.
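> 
> Something like this rough sketch is what I have in mind for checking that 
> (assuming the pg_stats layout of "ceph pg dump --format json", i.e. 
> pgid/state/up/acting fields, which may vary between releases):
> 
>     import json
>     import subprocess
> 
>     # Dump all PG stats; newer releases nest pg_stats under pg_map.
>     dump = json.loads(subprocess.check_output(
>         ['ceph', 'pg', 'dump', '--format', 'json']))
>     pg_stats = dump.get('pg_stats') or dump.get('pg_map', {}).get('pg_stats', [])
> 
>     def osds_of(pg):
>         # Union of the acting and up sets of a PG.
>         return set(pg['up']) | set(pg['acting'])
> 
>     backfilling = [pg for pg in pg_stats if 'backfilling' in pg['state']]
>     waiting = [pg for pg in pg_stats if 'backfill_wait' in pg['state']]
> 
>     # OSDs already tied up by actively backfilling PGs.
>     busy = set()
>     for pg in backfilling:
>         busy |= osds_of(pg)
> 
>     # Which busy OSDs each waiting PG touches.
>     for pg in waiting:
>         blocked_by = sorted(osds_of(pg) & busy)
>         print('%s waits on %d busy OSD(s): %s'
>               % (pg['pgid'], len(blocked_by), blocked_by))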
> 
> So I guess we can't do much about the tail end taking so long: there's no way 
> for more of the PGs to actually be backfilling at the same time.
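> 
> Back-of-envelope with our numbers (and assuming, as above, that a moving PG 
> ties up every OSD in both of its sets):
> 
>     # 1401 OSDs, 11-wide EC PGs, sets changing completely, max_backfills 1.
>     osds = 1401
>     osds_per_move = 11 * 2      # acting + up, fully disjoint
>     max_backfills = 1
> 
>     print(osds * max_backfills // osds_per_move)   # ~63 PGs at the very best
>     # Overlaps between the remaining PGs' sets push the real number down to
>     # the handful we actually see; raising osd_max_backfills raises the ceiling.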
> 
> I think we'll have to try bumping osd_max_backfills. Has anyone tried bumping 
> the relative priorities of recovery vs others? What about noscrub?
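> 
> If we do go that route, something like this is how I'd picture injecting it 
> at runtime and flipping the scrub flags (a sketch; the value of 2 is just a 
> starting point, not a recommendation):
> 
>     import subprocess
> 
>     def ceph(*args):
>         # Thin wrapper so the commands being run are visible in the output.
>         print('+ ceph ' + ' '.join(args))
>         subprocess.check_call(['ceph'] + list(args))
> 
>     ceph('osd', 'set', 'noscrub')
>     ceph('osd', 'set', 'nodeep-scrub')
>     ceph('tell', 'osd.*', 'injectargs', '--osd-max-backfills 2')
> 
>     # And to revert once the backfill has finished:
>     # ceph('tell', 'osd.*', 'injectargs', '--osd-max-backfills 1')
>     # ceph('osd', 'unset', 'noscrub')
>     # ceph('osd', 'unset', 'nodeep-scrub')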
> 
> Best regards,
> 
> George
> 
> ________________________________
> From: David Turner [drakonst...@gmail.com]
> Sent: 06 July 2017 16:08
> To: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Speeding up backfill after increasing PGs and or 
> adding OSDs
> 
> A quick place to start is osd_max_backfills.  You have this set to 1.  Each 
> PG is on 11 OSDs.  When a PG is moving, it lives on the original 11 OSDs 
> plus the X new OSDs it is going to.  Each OSD can only work on 1 backfill at 
> a time (your osd_max_backfills), and each moving PG spans 11 + X OSDs.
> 
> So, to your cluster: I don't see how many OSDs you have, but you have 25 PGs 
> moving around and 8 of them are actively backfilling.  Assuming you were 
> only changing 1 OSD per backfill operation, that would mean you have at 
> least 96 OSDs ((11+1) * 8).  That would be a perfect distribution of OSDs 
> across the backfilling PGs.  Now let's say you're averaging closer to 3 OSDs 
> changing per PG and the remaining 17 PGs waiting to backfill are blocked by 
> a few OSDs each (because those OSDs are already included in the 8 actively 
> backfilling PGs).  That would indicate that you have closer to 200+ OSDs.
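> 
> Spelled out with assumed round numbers (illustrative, not measured from 
> your cluster):
> 
>     osds_per_pg = 11          # EC 8+3 pool
>     backfilling_pgs = 8
> 
>     # Best case: only 1 OSD changes per PG, so each moving PG touches 12 OSDs
>     # and, with osd_max_backfills=1, no two moving PGs may share an OSD.
>     print((osds_per_pg + 1) * backfilling_pgs)    # 96 OSDs minimum
> 
>     # With ~3 OSDs changing per PG the same reasoning needs at least:
>     print((osds_per_pg + 3) * backfilling_pgs)    # 112
>     # ...and the waiting PGs blocked on busy OSDs on top of that is what
>     # points at a 200+ OSD cluster.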
> 
> Every time I'm backfilling and want to speed things up, I watch iostat on 
> some of my OSDs and increase osd_max_backfills until I'm consistently using 
> about 70% of the disk, leaving headroom for customer I/O.  You can always 
> figure out what's best for your use case though.  Generally I've been OK 
> running with osd_max_backfills=5 without much problem, and bringing that up 
> some when I know that client IO will be minimal, but again it depends on 
> your use case and cluster.
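> 
> The "watch iostat" part, roughly (a sketch assuming sysstat's iostat is 
> installed; the device names are examples, substitute whatever backs your 
> OSDs):
> 
>     import subprocess
> 
>     DEVICES = ['sdb', 'sdc']    # example devices backing OSDs
> 
>     # One 1-second extended-stats sample, skipping the since-boot report.
>     out = subprocess.check_output(['iostat', '-dxy'] + DEVICES + ['1', '1'])
>     for line in out.decode().splitlines():
>         fields = line.split()
>         if fields and fields[0] in DEVICES:
>             # %util is the last column of iostat -x output.
>             print('%s %%util=%s' % (fields[0], fields[-1]))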
> 
> On Thu, Jul 6, 2017 at 10:08 AM 
> <george.vasilaka...@stfc.ac.uk> wrote:
> Hey folks,
> 
> We have a cluster that's currently backfilling after we increased PG counts. 
> We have tuned recovery and backfill way down as a "precaution" and would 
> like to start tuning it back up towards a good balance between recovery and 
> client I/O.
> 
> At the moment we're in the process of bumping up PG numbers for pools serving 
> production workloads. Said pools are EC 8+3.
> 
> It looks like we have a very low number of PGs backfilling, as in:
> 
>             2567 TB used, 5062 TB / 7630 TB avail
>             145588/849529410 objects degraded (0.017%)
>             5177689/849529410 objects misplaced (0.609%)
>                 7309 active+clean
>                   23 active+clean+scrubbing
>                   18 active+clean+scrubbing+deep
>                   13 active+remapped+backfill_wait
>                    5 active+undersized+degraded+remapped+backfilling
>                    4 active+undersized+degraded+remapped+backfill_wait
>                    3 active+remapped+backfilling
>                    1 active+clean+inconsistent
> recovery io 1966 MB/s, 96 objects/s
>   client io 726 MB/s rd, 147 MB/s wr, 89 op/s rd, 71 op/s wr
> 
> Also, the rate of recovery in terms of data and object throughput varies a 
> lot, even with the number of PGs backfilling remaining constant.
> 
> Here's the config in the OSDs:
> 
>     "osd_max_backfills": "1",
>     "osd_min_recovery_priority": "0",
>     "osd_backfill_full_ratio": "0.85",
>     "osd_backfill_retry_interval": "10",
>     "osd_allow_recovery_below_min_size": "true",
>     "osd_recovery_threads": "1",
>     "osd_backfill_scan_min": "16",
>     "osd_backfill_scan_max": "64",
>     "osd_recovery_thread_timeout": "30",
>     "osd_recovery_thread_suicide_timeout": "300",
>     "osd_recovery_sleep": "0",
>     "osd_recovery_delay_start": "0",
>     "osd_recovery_max_active": "5",
>     "osd_recovery_max_single_start": "1",
>     "osd_recovery_max_chunk": "8388608",
>     "osd_recovery_max_omap_entries_per_chunk": "64000",
>     "osd_recovery_forget_lost_objects": "false",
>     "osd_scrub_during_recovery": "false",
>     "osd_kill_backfill_at": "0",
>     "osd_debug_skip_full_check_in_backfill_reservation": "false",
>     "osd_debug_reject_backfill_probability": "0",
>     "osd_recovery_op_priority": "5",
>     "osd_recovery_priority": "5",
>     "osd_recovery_cost": "20971520",
>     "osd_recovery_op_warn_multiple": "16",
> 
> What I'm looking for, first of all, is a better understanding of the 
> mechanism that schedules the backfilling/recovery work; the end goal is to 
> understand how to tune this safely to get as close as possible to an optimal 
> balance between the rates at which recovery and client work are performed.
> 
> I'm thinking things like osd_max_backfills, 
> osd_backfill_scan_min/osd_backfill_scan_max might be prime candidates for 
> tuning.
> 
> Any thoughts/insights from the Ceph community will be greatly appreciated,
> 
> George
> 


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
