We actually disabled swap altogether on these machines...
On Thu, Jun 12, 2014 at 5:06 PM, Gregory Farnum wrote:
> To be clear, that's the solution to one of the causes of this issue.
> The log message is very general, and just means that a disk access
> thread has been gone for a long time (
To be clear, that's the solution to one of the causes of this issue.
The log message is very general, and just means that a disk access
thread has been gone for a long time (15 seconds, in this case)
without checking in (so usually, it's been inside of a read/write
syscall for >=15 seconds).
Other
Can you check and see if swap is being used on your OSD servers when
this happens, and even better, use something like collectl or another
tool to look for major page faults?
If you see anything like this, you may want to tweak swappiness to be
lower (say 10).
Mark
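The swap and major-page-fault check suggested above can be sketched in a few lines of Python, as an alternative to collectl. This is only a minimal sketch: the helper names (parse_kv, swap_used_kb, major_faults, report) are made up for illustration, and it assumes a Linux host where /proc/meminfo, /proc/vmstat and /proc/sys/vm/swappiness are readable.

```python
from pathlib import Path


def parse_kv(text):
    """Parse /proc-style 'key value' or 'key: value kB' lines into ints."""
    values = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def swap_used_kb(meminfo_text):
    """Swap currently in use, in kB, from the contents of /proc/meminfo."""
    info = parse_kv(meminfo_text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def major_faults(vmstat_text):
    """Cumulative major page faults since boot, from /proc/vmstat."""
    return parse_kv(vmstat_text).get("pgmajfault", 0)


def report():
    """Print the three values for the local host (Linux only)."""
    print("swap used (kB):", swap_used_kb(Path("/proc/meminfo").read_text()))
    print("major faults:", major_faults(Path("/proc/vmstat").read_text()))
    print("swappiness:", Path("/proc/sys/vm/swappiness").read_text().strip())
```

pgmajfault is a counter since boot, so the useful signal is how fast it grows: call report() on an OSD host a few seconds apart while the problem is happening and compare the two readings.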
On 06/12/2014 03:17 PM, X
I've done some more tracing. It looks like the high IO wait in the VMs is
somewhat correlated with some OSDs having a high number of in-flight ops
(ceph admin socket, dump_ops_in_flight).
When in_flight_ops is high, I see something like this in the OSD log:
2014-06-12 19:57:24.572338 7f4db6bdf700 1 heartbeat_map r
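For anyone wanting to repeat the in-flight ops check across OSDs, here is a rough Python sketch. It assumes the admin-socket dump is a JSON document with an "ops" list and, usually, a "num_ops" count; the function names are hypothetical, and the socket path has to be supplied by the caller (something like /var/run/ceph/ceph-osd.0.asok on a default install, but that is an assumption about your setup).

```python
import json
import subprocess


def count_in_flight(dump_json):
    """Number of ops reported in a dump_ops_in_flight JSON document."""
    data = json.loads(dump_json)
    # Assumption: the dump carries an "ops" list and usually a "num_ops" count.
    return data.get("num_ops", len(data.get("ops", [])))


def dump_ops_in_flight(asok_path):
    """Query one OSD's admin socket and return the raw JSON output."""
    cmd = ["ceph", "--admin-daemon", asok_path, "dump_ops_in_flight"]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling count_in_flight(dump_ops_in_flight(path)) for each OSD socket on a host, at the moment the VMs report high IO wait, gives a quick way to see which OSDs are backing up.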
On 06/12/2014 08:47 AM, Xu (Simon) Chen wrote:
1) I did check iostat on all OSDs, and iowait seems normal.
2) ceph -w shows no correlation between high io wait and high iops.
Sometimes the reverse is true: when io wait is high (since it's a
cluster-wide thing), the overall ceph iops drops too.
3) We have collectd running in VMs, and that's how
Hi Simon,
Did you check iostat on the OSDs to check their utilization? What does your
ceph -w say - perhaps you’re maxing out your cluster’s IOPS?
Also, are you running any monitoring of your VMs’ iostats? We’ve often found
some culprits overusing IO.
Kind Regards,
David Majchrzak
12 Jun 2014 at