Re: [Openstack] [Swift] block I/O all disks

2015-11-18 Thread Heiko Krämer
OK, that was a failure. Within one week we had 3 outages on one machine, and I have no idea what else to try. My current kernel settings:
# disable TIME_WAIT.. wait..
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
# disable syn cookies
net.ipv4.tcp_syncook

Re: [Openstack] [Swift] block I/O all disks

2015-11-10 Thread Mark Kirkwood
On 11/11/15 00:59, Heiko Krämer wrote: Hi Mark, thanks for your response. I changed the dirty page settings again, and I have a logging server which receives the syslog and swift logs. But these messages will not help to find the problem. They are stale processes and it seems the start partition is

Re: [Openstack] [Swift] block I/O all disks

2015-11-10 Thread Heiko Krämer
Hi Mark, thanks for your response. I changed the dirty page settings again, and I have a logging server which receives the syslog and swift logs.
Nov 10 08:28:17 storage3 kernel: [92476.204616] INFO: task xfsaild/sdd:1108 blocked for more than 120 seconds.
Nov 10 08:28:17 storage3 kernel: [92476.20463
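A minimal sketch of how hung-task warnings like the one quoted above could be pulled out of the collected syslog, so that the affected task and device can be compared across outages. The function name `parse_hung_tasks` and the regular expression are illustrative assumptions, not part of the thread; the sample line is copied from the message.

```python
import re

# Matches the kernel hung-task warning format seen in the quoted syslog line.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task warning found."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


sample = [
    "Nov 10 08:28:17 storage3 kernel: [92476.204616] INFO: task "
    "xfsaild/sdd:1108 blocked for more than 120 seconds.",
]
print(parse_hung_tasks(sample))  # [('xfsaild/sdd', 1108, 120)]
```

Grouping the results by task name (here `xfsaild/sdd`) would show whether the same disk is always the first to block.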

Re: [Openstack] [Swift] block I/O all disks

2015-11-09 Thread Mark Kirkwood
Right, I agree that the dirty size of the cache could be the issue (particularly with SATA drives in a JBOD array without the help of any writeback caches etc). You might even want to wind those settings down a bit more (3 and 6 perhaps, which means with 32G of RAM your dirty cache size is be
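As a rough worked example of the arithmetic behind that suggestion: assuming "3 and 6" refer to the `vm.dirty_background_ratio` and `vm.dirty_ratio` sysctls (the snippet does not name them), and approximating the kernel's calculation by taking the percentages of total RAM rather than of available memory, the thresholds for a 32G machine can be estimated as follows. The helper name `dirty_thresholds` is hypothetical.

```python
GIB = 1024 ** 3


def dirty_thresholds(ram_bytes, background_ratio, ratio):
    """Approximate dirty-page thresholds as integer percentages of total RAM.

    Simplification: the kernel bases these ratios on available memory,
    so the real thresholds are somewhat lower than what is returned here.
    """
    background = ram_bytes * background_ratio // 100
    hard_limit = ram_bytes * ratio // 100
    return background, hard_limit


# Assumed mapping: 3 -> vm.dirty_background_ratio, 6 -> vm.dirty_ratio.
background, hard_limit = dirty_thresholds(32 * GIB, 3, 6)
print(f"background writeback starts near {background / GIB:.2f} GiB")
print(f"writers are throttled near {hard_limit / GIB:.2f} GiB")
```

Under these assumptions the estimate comes out at roughly 1 GiB and 2 GiB of dirty pages, spread over the 12 SATA disks.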

Re: [Openstack] [Swift] block I/O all disks

2015-11-09 Thread Mark Kirkwood
On 10/11/15 00:41, Eren Türkay wrote: On 09-11-2015 12:39, Heiko Krämer wrote: You're right only a hard reboot can solve the problem because SSH login or other commands can't be executed because the whole system was frozen. Hello Heiko, I just want to give some tips about debugging those kind

Re: [Openstack] [Swift] block I/O all disks

2015-11-09 Thread Eren Türkay
On 09-11-2015 12:39, Heiko Krämer wrote:
> You're right only a hard reboot can solve the problem because SSH
> login or other commands can't be executed because the whole system was
> frozen.
Hello Heiko, I just want to give some tips about debugging these kinds of issues. I had completely differe

Re: [Openstack] [Swift] block I/O all disks

2015-11-09 Thread Heiko Krämer
Hi Mark, nothing can be logged because all disks are stale. The only output I see is on the IPMI:
blocked for more than 120 seconds
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I found there could be a problem with th

Re: [Openstack] [Swift] block I/O all disks

2015-11-06 Thread Mark Kirkwood
Do you reboot the machine? It might be interesting to just restart the swift storage services and see if that brings everything right again. Also check the swift logs for what Swift is doing when things start to hang, and what dmesg is saying at the time. A bit more info about your setup w

[Openstack] [Swift] block I/O all disks

2015-11-04 Thread Heiko Krämer
Hi guys, we have noticed some problems with the disks on our Swift storage nodes. After some time they block all I/O requests to the disks. The server then suddenly stops working and needs a reboot. Server setup:
* Kernel 3.19.x
* Ubuntu 14.04
* Swift (Kilo)
* 12 x 2TB SATA (JBOD)
* 2 x 480GB SSD