After doing this, I've found that I'm having problems with a few
specific PGs.  If I set nodeep-scrub, then manually deep-scrub one
specific PG, the responsible OSDs get kicked out.  I'm starting a new
discussion, subject: "I have PGs that I can't deep-scrub"
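
For reference, the test I'm running against one of those PGs is roughly
this (the PG id below is just a placeholder):

  # Stop the scheduler from starting new deep-scrubs
  ceph osd set nodeep-scrub
  # Deep-scrub one problem PG by hand, then watch for its OSDs dropping out
  ceph pg deep-scrub <pgid>
  ceph -w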

I'll re-test this correlation after I fix the broken PGs.

On Mon, Jun 9, 2014 at 10:20 PM, Gregory Farnum <g...@inktank.com> wrote:
> On Mon, Jun 9, 2014 at 6:42 PM, Mike Dawson <mike.daw...@cloudapt.com> wrote:
>> Craig,
>>
>> I've struggled with the same issue for quite a while. If your i/o is similar
>> to mine, I believe you are on the right track. For the past month or so, I
>> have been running this cronjob:
>>
>> * * * * *       for strPg in `ceph pg dump | egrep '^[0-9]\.[0-9a-f]{1,4}' |
>> sort -k20 | awk '{ print $1 }' | head -2`; do ceph pg deep-scrub $strPg;
>> done
>>
>> That roughly handles my 20672 PGs that are set to be deep-scrubbed every 7
>> days. Your script may be a bit better, but this quick and dirty method has
>> helped my cluster maintain more consistency.
>>
>> The real key for me is to avoid the "clumpiness" I observed without that
>> hack: concurrent deep-scrubs sit at zero for a long period of time (despite
>> some PGs being months overdue for a deep-scrub), then suddenly spike into
>> the teens and stay there for hours, killing client writes/second.
>>
>> The scrubbing behavior table[0] indicates that a periodic tick initiates
>> scrubs on a per-PG basis. Perhaps the timing of those ticks isn't
>> sufficiently randomized when you restart lots of OSDs concurrently (for
>> instance via pdsh).
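>>
>> The settings I'd look at for spreading scrubs out are roughly these (the
>> values shown are the defaults as I understand them, so treat them as
>> illustrative):
>>
>>   [osd]
>>   osd max scrubs = 1               ; concurrent scrub ops per OSD
>>   osd scrub min interval = 86400   ; don't scrub a PG more often than daily
>>   osd scrub max interval = 604800  ; force a scrub after a week regardless of load
>>   osd deep scrub interval = 604800 ; deep-scrub each PG roughly weekly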
>>
>> On my cluster I suffer a significant drag on client writes/second when I
>> exceed perhaps four or five concurrent PGs in deep-scrub. When concurrent
>> deep-scrubs get into the teens, I get a massive drop in client
>> writes/second.
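>>
>> A quick way to count how many PGs are deep-scrubbing at any moment is
>> something like this, since the PG state column shows "scrubbing+deep"
>> while one is running:
>>
>>   ceph pg dump | grep -c 'scrubbing+deep'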
>>
>> Greg, is there locking involved when a PG enters deep-scrub? If so, is the
>> entire PG locked for the duration or is each individual object inside the PG
>> locked as it is processed? Some of my PGs will be in deep-scrub for minutes
>> at a time.
>
> It locks very small regions of the key space, but the expensive part
> is that deep scrub actually has to read all the data off disk, which
> often means a lot more disk seeks than simply examining the metadata
> does.
> -Greg
>
>>
>> 0: http://ceph.com/docs/master/dev/osd_internals/scrub/
>>
>> Thanks,
>> Mike Dawson
>>
>>
>>
>> On 6/9/2014 6:22 PM, Craig Lewis wrote:
>>>
>>> I've correlated a large deep scrubbing operation with cluster stability
>>> problems.
>>>
>>> My primary cluster does a small amount of deep scrubs all the time,
>>> spread out over the whole week.  It has no stability problems.
>>>
>>> My secondary cluster doesn't spread them out.  It saves them up, and
>>> tries to do all of the deep scrubs over the weekend.  The secondary
>>> starts losing OSDs about an hour after these deep scrubs start.
>>>
>>> To avoid this, I'm thinking of writing a script that continuously
>>> deep-scrubs the oldest outstanding PG.  In pseudo-bash:
>>> # Sort by the deep-scrub timestamp (columns 20 and 21 of `ceph pg dump`),
>>> # take the single oldest PG, deep-scrub it, and repeat forever
>>> while true
>>>   do
>>>    read -r date time pg < <(ceph pg dump 2>/dev/null \
>>>      | awk '$1 ~ /^[0-9a-f]+\.[0-9a-f]+$/ {print $20, $21, $1}' \
>>>      | sort | head -1)
>>>    ceph pg deep-scrub "${pg}"
>>>    # Wait for that deep-scrub to finish before picking the next PG
>>>    while ceph status | grep -q 'scrubbing+deep'
>>>     do
>>>      sleep 5
>>>     done
>>>    sleep 30
>>> done
>>>
>>>
>>> Does anybody think this will solve my problem?
>>>
>>> I'm also considering disabling deep-scrubbing until the secondary
>>> finishes replicating from the primary.  Once it's caught up, the write
>>> load should drop enough that opportunistic deep scrubs should have a
>>> chance to run.  It should only take another week or two to catch up.
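>>>
>>> If I go that route, it would just be toggling the cluster-wide flag,
>>> something like:
>>>
>>>   # Pause scheduled deep-scrubs while the secondary catches up
>>>   ceph osd set nodeep-scrub
>>>   # Once replication has caught up, let them resume
>>>   ceph osd unset nodeep-scrub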
>>>
>>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
