Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread Xu (Simon) Chen
We actually disabled swap altogether on these machines... On Thu, Jun 12, 2014 at 5:06 PM, Gregory Farnum wrote: > To be clear, that's the solution to one of the causes of this issue. > The log message is very general, and just means that a disk access > thread has been gone for a long time (

Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread Gregory Farnum
To be clear, that's the solution to one of the causes of this issue. The log message is very general, and just means that a disk access thread has been gone for a long time (15 seconds, in this case) without checking in (so usually, it's been inside of a read/write syscall for >=15 seconds). Other

Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread Mark Nelson
Can you check and see if swap is being used on your OSD servers when this happens, and even better, use something like collectl or another tool to look for major page faults? If you see anything like this, you may want to tweak swappiness to be lower (say 10). Mark On 06/12/2014 03:17 PM, X
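
As a rough illustration of the check Mark describes, the sketch below samples the kernel counters in /proc/vmstat twice and reports how many major page faults and swap-ins/outs happened in between. It is only a sketch under assumptions: the helper names (`parse_vmstat`, `swap_activity`, `sample_live`) are made up for this example, and it assumes a Linux OSD host that exposes the `pgmajfault`, `pswpin` and `pswpout` counters.

```python
import time
from pathlib import Path

# Counters of interest in /proc/vmstat (assumed present on the OSD host).
WATCHED = ("pgmajfault", "pswpin", "pswpout")


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Return how much each watched counter grew between two samples."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in WATCHED}


def sample_live(interval=5.0, path="/proc/vmstat"):
    """Take two samples `interval` seconds apart and return the deltas."""
    before = parse_vmstat(Path(path).read_text())
    time.sleep(interval)
    after = parse_vmstat(Path(path).read_text())
    return swap_activity(before, after)
```

Calling `sample_live()` on an OSD server while the VMs are stalling would show whether the counters move during a spike. If they do, lowering swappiness as suggested (for example `sysctl vm.swappiness=10`) is the knob Mark refers to.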

Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread Xu (Simon) Chen
I've done some more tracing. It looks like the high IO wait in VMs is somewhat correlated with some OSDs having many in-flight ops (ceph admin socket, dump_ops_in_flight). When in_flight_ops is high, I see something like this in the OSD log: 2014-06-12 19:57:24.572338 7f4db6bdf700 1 heartbeat_map r
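
The correlation described above could be tracked with a small poller around the admin socket's dump_ops_in_flight command. The following is a minimal sketch, not the poster's actual tooling: the function names, the default socket path, and the 15-second age threshold (borrowed from the timeout mentioned elsewhere in the thread) are assumptions, and it assumes the command returns JSON with a `num_ops` count and an `ops` list whose entries carry `description` and `age` fields.

```python
import json
import subprocess


def summarize_ops(json_text, max_age=15.0):
    """Return (number of in-flight ops, descriptions of ops older than max_age)."""
    data = json.loads(json_text)
    ops = data.get("ops", [])
    # Fall back to counting the list if num_ops is missing (assumed field name).
    num_ops = data.get("num_ops", len(ops))
    old = [
        op.get("description", "")
        for op in ops
        if float(op.get("age", 0)) > max_age
    ]
    return num_ops, old


def dump_ops_in_flight(osd_id, sock_dir="/var/run/ceph"):
    """Query one OSD's admin socket; the socket path layout is an assumption."""
    sock = f"{sock_dir}/ceph-osd.{osd_id}.asok"
    return subprocess.check_output(
        ["ceph", "--admin-daemon", sock, "dump_ops_in_flight"],
        text=True,
    )
```

Running `summarize_ops(dump_ops_in_flight(0))` periodically on each OSD host and logging the result alongside the VM iowait graphs would make it easier to see which OSDs hold long-lived ops during a spike.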

Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread Mark Nelson
On 06/12/2014 08:47 AM, Xu (Simon) Chen wrote: 1) I did check iostat on all OSDs, and iowait seems normal. 2) ceph -w shows no correlation between high io wait and high iops. Sometimes the reverse is true: when io wait is high (since it's a cluster wide thing), the overall ceph iops drops too.

Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread Xu (Simon) Chen
1) I did check iostat on all OSDs, and iowait seems normal. 2) ceph -w shows no correlation between high io wait and high iops. Sometimes the reverse is true: when io wait is high (since it's a cluster wide thing), the overall ceph iops drops too. 3) We have collectd running in VMs, and that's how

Re: [ceph-users] spiky io wait within VMs running on rbd

2014-06-12 Thread David
Hi Simon, Did you check iostat on the OSDs to check their utilization? What does your ceph -w say - perhaps you’re maxing out your cluster’s IOPS? Also, are you running any monitoring of your VMs' iostats? We’ve often found some culprits overusing IOs. Kind Regards, David Majchrzak 12 jun 2014 kl.