Hi Greg,

On 4/19/2014 2:20 PM, Greg Poirier wrote:
> We have a cluster in a sub-optimal configuration with data and journal
> colocated on OSDs (that coincidentally are spinning disks).
>
> During recovery/backfill, the entire cluster suffers degraded
> performance because of the IO storm that backfills cause. Client IO
> becomes extremely latent.

Graph '%util' or simply watch it with 'iostat -xt 2'. It will likely show that the bottleneck is the IOPS available from your spinning disks. Client IO can see significant latency (or, at worst, complete stalls) as the disks approach saturation.
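For a quick look, something like this works (sdb and sdc are placeholder device names; substitute your OSD data disks):

  iostat -x 2 | egrep 'Device|sdb|sdc'

Watch %util, await, and avgqu-sz during a backfill. If %util sits near 100 on the OSD disks while client latency climbs, you are IOPS-bound on the spindles rather than network-bound.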

> I've tried to decrease the impact that
> recovery/backfill has with the following:
>
> ceph tell osd.* injectargs '--osd-max-backfills 1'
> ceph tell osd.* injectargs '--osd-max-recovery-threads 1'
> ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
> ceph tell osd.* injectargs '--osd-client-op-priority 63'
> ceph tell osd.* injectargs '--osd-recovery-max-active 1'

On our cluster, those settings are an effective way to minimize disruption. I'd also recommend temporarily disabling deep scrub with:

ceph osd set nodeep-scrub

Re-enable it later with:

ceph osd unset nodeep-scrub
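You can confirm the flag took effect with 'ceph -s' or 'ceph osd dump | grep flags'; nodeep-scrub will show up in the osdmap flags while it is set.

If the injected values help, it's worth persisting them in ceph.conf on the OSD hosts so they survive daemon restarts. The equivalents for most of what you injected look roughly like this (adjust to taste):

  [osd]
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1
    osd client op priority = 63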

I have some clients that are much more susceptible than others to disruption from spindle contention during recovery/backfill; others operate without any disruption at all. I am working to quantify the difference, but I believe it comes down to the caching or syncing behavior of the individual application/OS.


> The only other option I have left would be to use linux traffic shaping
> to artificially reduce the bandwidth available to the interface tagged
> for cluster traffic (instead of separate physical networks, we use VLAN
> tagging). We are nowhere _near_ the point where network saturation would
> cause the latency we're seeing, so I am led to believe that it is
> simply disk IO saturation.
>
> I could be wrong about this assumption, though, as iostat doesn't
> terrify me. This could be a suboptimal network configuration on the
> cluster as well. I'm still looking into that possibility, but I wanted
> to get feedback on what I'd done already first, as well as the proposed
> traffic shaping idea.
>
> Thoughts?


I would exhaust all troubleshooting and tuning related to spindle contention before giving network sanity more than a cursory look.
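For that cursory look, a quick iperf run between two OSD hosts over the cluster VLAN is usually enough to rule the network in or out, e.g. (hostname is a placeholder):

  # on one OSD host
  iperf -s
  # on another
  iperf -c osd-host-1

If that shows something close to line rate and your traffic graphs are nowhere near it, I'd stop looking at the network.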

It sounds to me like your cluster, as configured, simply doesn't have enough IOPS available to serve your client IO workload while also absorbing the performance hit of recovery/backfill.

With a workload consisting of lots of small writes, I've seen client IO starved by as little as 5 Mbps of traffic per host due to spindle contention once deep-scrub and/or recovery/backfill kick in. Co-locating the OSD journals on the same spinners, as you have, means every write hits the disk twice, which makes that all the more likely.
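As a rough back-of-envelope (assuming ~100 IOPS per 7200 RPM spindle, which is typical): with the journal co-located, every write the OSD handles costs at least two IOs on that spindle, one for the journal and one for the filestore, so you're down to roughly 50 client-visible write IOPS per OSD before recovery, backfill, or scrub take their share. It doesn't take much extra work on top of that to saturate the disk.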

Possible solutions include moving the OSD journals to SSD (at a reasonable SSD-to-spinner ratio), expanding the cluster to spread the load across more spindles, or moving to faster underlying storage.
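A common ratio is on the order of 4-6 journals per SSD, a few GB each. From memory, moving a journal looks roughly like the following per OSD; double-check the procedure against the docs for your release, and note the OSD id and partition path are placeholders:

  ceph osd set noout
  service ceph stop osd.0
  ceph-osd -i 0 --flush-journal
  # point the journal (symlink or 'osd journal' in ceph.conf) at the SSD partition
  ln -sf /dev/disk/by-partuuid/<ssd-partition> /var/lib/ceph/osd/ceph-0/journal
  ceph-osd -i 0 --mkjournal
  service ceph start osd.0
  ceph osd unset noout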

Cheers,
Mike

