On Mon, Feb 26, 2018 at 6:08 PM, David Turner <drakonst...@gmail.com> wrote:

> The slow requests are absolutely expected during filestore subfolder
> splitting.  You can, however, stop an OSD, split its subfolders offline,
> and start it back up.  I changed my settings to [1]these, but I only
> suggest doing something this drastic if you're committed to splitting your
> subfolders manually on a regular basis.  In my environment that needs to
> be once/month.
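>
> Roughly, the per-OSD cycle looks something like the sketch below.  Treat
> it as a sketch only ($ID and $POOL are placeholders, and the script in [2]
> below is what I actually run):
>
>      ceph osd set noout
>      systemctl stop ceph-osd@$ID
>      # split subfolders offline for each filestore pool on this OSD
>      # (you may also need --journal-path depending on your setup)
>      ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$ID \
>          --op apply-layout-settings --pool $POOL
>      systemctl start ceph-osd@$ID
>      ceph osd unset noout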
>

Hi David, to be honest I've still not completely got my head around the
filestore splitting, but one thing's for sure: it's causing major IO issues
on my small cluster. If I understand correctly, your settings in [1]
effectively disable "online" merging and splitting. Have I got that right?

Why is your filestore_merge_threshold -16 as opposed to -1?

You say you need to do your offline splitting on a monthly basis in your
environment, but how are you arriving at that conclusion? What would I need
to monitor to work out how frequently I would need to do a split?
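
For what it's worth, my (possibly wrong) understanding is that a subfolder
gets split once it holds more than
16 * filestore_split_multiple * abs(filestore_merge_threshold) objects. If
that's right, the "has 5121 objects, starting split" line in my log would
fit a split multiple of 8 with a merge threshold of 40 (16 * 8 * 40 =
5120). So presumably I'd want to keep an eye on average objects per PG as
it creeps towards that sort of figure, with something like (pool name is
just an example):

     ceph df                              # total objects per pool
     ceph osd pool get cephfs_data pg_num # PG count for the pool
     ceph pg ls-by-pool cephfs_data       # per-PG object counts

Is that the right way to think about it?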

Thanks for all your help on this


>
> Along with those settings, I use [2]this script to perform the subfolder
> splitting.  It will change your config file to [3]these settings, perform
> the subfolder splitting, change the settings back to what you currently
> have, and start your OSDs back up.  Using a negative merge threshold
> prevents subfolder merging, which is useful for some environments.
>
> The script automatically sets noout and unsets it for you afterwards, and
> it won't start unless the cluster is HEALTH_OK.  Feel free to use it as is
> or pick from it what's useful for you.  I highly suggest that anyone
> feeling the pains of subfolder splitting do some sort of offline splitting
> to get through it.  If you're using config management like Salt or Puppet,
> be sure to disable it so that the config won't be overwritten while the
> subfolders are being split.
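>
> (For Puppet that can be as simple as running "puppet agent --disable
> 'subfolder split'" on each OSD node before you start and "puppet agent
> --enable" afterwards; for Salt, stopping the salt-minion service for the
> duration does the same job.  Adjust for however your config management is
> driven.)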
>
>
> [1] filestore_merge_threshold = -16
>      filestore_split_multiple = 256
>
> [2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4
>
> [3] filestore_merge_threshold = -1
>      filestore_split_multiple = 1
> On Mon, Feb 26, 2018 at 12:18 PM David C <dcsysengin...@gmail.com> wrote:
>
>> Thanks, David. I think I've probably used the wrong terminology here: I'm
>> not splitting PGs to create more PGs. This is the PG subfolder splitting
>> that happens automatically; I believe it's controlled by the
>> "filestore_split_multiple" setting (which is 8 on my OSDs, I believe that's
>> the Luminous default...). Increasing heartbeat grace would probably still
>> be a good idea to prevent the flapping. I'm trying to understand whether
>> the slow requests are to be expected or whether I need to tune something
>> or look at hardware.
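>>
>> (I pulled that value with something like "ceph daemon osd.<id> config get
>> filestore_split_multiple" on one of the OSD nodes, in case that's not the
>> right place to be looking.)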
>>
>> On Mon, Feb 26, 2018 at 4:19 PM, David Turner <drakonst...@gmail.com>
>> wrote:
>>
>>> Splitting PGs is one of the most intensive and disruptive things you
>>> can, and should, do to a cluster.  Tweaking recovery sleep, max backfills,
>>> and heartbeat grace should help with this.  Heartbeat grace can be set
>>> high enough to mitigate the OSDs flapping, which slows things down with
>>> peering and additional recovery, while still being able to detect OSDs
>>> that genuinely fail and go down.  Recovery sleep and max backfills are
>>> the settings to look at for mitigating slow requests.  I generally tweak
>>> those while watching iostat on some OSDs and ceph -s to make sure I'm not
>>> giving too much priority to the recovery operations, so that client IO
>>> can still happen.
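>>>
>>> As a rough sketch (the values are starting points, not recommendations
>>> for your hardware), those can be injected on the fly with something like:
>>>
>>>      ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-sleep 0.1'
>>>      ceph tell osd.* injectargs '--osd-heartbeat-grace 60'
>>>
>>> On Luminous you may need the osd_recovery_sleep_hdd / _ssd variants
>>> instead, and heartbeat grace may also need to be set on the mons to fully
>>> take effect.  Put whatever you settle on into ceph.conf if you want it to
>>> survive an OSD restart.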
>>>
>>> On Mon, Feb 26, 2018 at 11:10 AM David C <dcsysengin...@gmail.com>
>>> wrote:
>>>
>>>> Hi All
>>>>
>>>> I have a 12.2.1 cluster, all filestore OSDs.  The OSDs are spinners
>>>> with journals on NVMe.  The cluster is primarily used for CephFS, ~20M
>>>> objects.
>>>>
>>>> I'm seeing some OSDs getting marked down, and it appears to be related
>>>> to PG splitting, e.g.:
>>>>
>>>> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
>>>>> objects, starting split.
>>>>>
>>>>
>>>> Followed by:
>>>>
>>>> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128
>>>>> secs
>>>>> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
>>>>> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
>>>>> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
>>>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>>>> commit_sent
>>>>> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 30.133441 seconds old, received at 2018-02-26
>>>>> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
>>>>> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
>>>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>>>> commit_sent
>>>>> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 30.083401 seconds old, received at 2018-02-26
>>>>> 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
>>>>> 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
>>>>> ondisk+read+rwordered+known_if_redirected+full_force e13994)
>>>>> currently waiting for rw locks
>>>>> 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 30.072310 seconds old, received at 2018-02-26
>>>>> 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
>>>>> 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
>>>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>>>> waiting for rw locks
>>>>> 2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 30.308128 seconds old, received at 2018-02-26
>>>>> 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c
>>>>> 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc
>>>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>>>> commit_sent
>>>>> 2018-02-26 10:27:59.242768 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : 47 slow requests, 5 included below; oldest blocked for > 31.308410
>>>>> secs
>>>>> 2018-02-26 10:27:59.242776 7f141cc3f700  0 log_channel(cluster) log
>>>>> [WRN] : slow request 30.349575 seconds old, received at 2018-02-26
>>>>> 10:27:28.893124:
>>>>
>>>>
>>>> I'm also experiencing some MDS crashes, which I think could be
>>>> related.
>>>>
>>>> Is there anything I can do to mitigate the slow requests problem? The
>>>> rest of the time the cluster is performing pretty well.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
