Hi all,

I have a 12.2.1 cluster with all FileStore OSDs; the OSDs are spinners with
journals on NVMe. The cluster is primarily used for CephFS, ~20M objects.

I'm seeing some OSDs getting marked down; it appears to be related to
FileStore PG directory splitting, e.g.:

    2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121 objects, starting split.

Followed by:

    2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128 secs
    2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log [WRN] : slow request 30.151105 seconds old, received at 2018-02-26 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently commit_sent
    2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log [WRN] : slow request 30.133441 seconds old, received at 2018-02-26 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently commit_sent
    2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log [WRN] : slow request 30.083401 seconds old, received at 2018-02-26 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[] ondisk+read+rwordered+known_if_redirected+full_force e13994) currently waiting for rw locks
    2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log [WRN] : slow request 30.072310 seconds old, received at 2018-02-26 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently waiting for rw locks
    2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log [WRN] : slow request 30.308128 seconds old, received at 2018-02-26 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc 0=[] ondisk+write+known_if_redirected+full_force e13994) currently commit_sent
    2018-02-26 10:27:59.242768 7f141cc3f700  0 log_channel(cluster) log [WRN] : 47 slow requests, 5 included below; oldest blocked for > 31.308410 secs
    2018-02-26 10:27:59.242776 7f141cc3f700  0 log_channel(cluster) log [WRN] : slow request 30.349575 seconds old, received at 2018-02-26 10:27:28.893124:

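For context, my understanding is that FileStore splits a PG subdirectory once
it holds more than filestore_split_multiple * abs(filestore_merge_threshold)
* 16 objects; with the defaults (2 and 10) that would be 320 objects, so a
split firing at 5121 objects implies the thresholds on this cluster are
already well above the defaults. To confirm what an OSD is actually running
with (osd.12 below is just a placeholder id):

    # Query the split/merge settings in effect via the OSD admin socket
    ceph daemon osd.12 config get filestore_split_multiple
    ceph daemon osd.12 config get filestore_merge_threshold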

I'm also experiencing some MDS crashes, which I think could be related.

Is there anything I can do to mitigate the slow request problem? The rest of
the time the cluster performs pretty well.
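For example, I've seen pre-splitting the directories offline with
ceph-objectstore-tool's apply-layout-settings op mentioned as a way to take
the split hit outside of client I/O. Would something along these lines (OSD
id, paths and pool name below are placeholders) be the recommended approach
on 12.2.1, or is raising filestore_split_multiple the better option here?

    # Pre-split the on-disk directories for one pool on a stopped OSD
    systemctl stop ceph-osd@12
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --op apply-layout-settings --pool cephfs_data
    systemctl start ceph-osd@12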

Thanks,
David