-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
OK,
that was a failure. Now, after one week, we have had 3 outages on one machine.
I honestly have no idea anymore.
My current kernel settings:
# recycle and reuse sockets in TIME_WAIT
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
# disable syn cookies
net.ipv4.tcp_syncook
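To confirm which values the running kernel is actually using (rather than what the config file says), they can be read back from /proc/sys. This is a minimal Python sketch, assuming the plain mapping of each dot in a key to a directory under /proc/sys; the helper names sysctl_path and read_sysctls are hypothetical. kernel.hung_task_timeout_secs is included because it controls the "blocked for more than 120 seconds" warning seen later in this thread. net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so on newer kernels the sketch reports it as missing.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

PROC_SYS = Path("/proc/sys")


def sysctl_path(key: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl key (e.g. net.ipv4.tcp_tw_reuse) to its file under /proc/sys.

    Simplification: every '.' becomes a directory separator, so keys whose
    components themselves contain dots (such as VLAN interface names) are not handled.
    """
    return root.joinpath(*key.split("."))


def read_sysctls(keys: Iterable[str], root: Path = PROC_SYS) -> Dict[str, Optional[str]]:
    """Return the live value of each key, or None if it is missing or unreadable."""
    values: Dict[str, Optional[str]] = {}
    for key in keys:
        try:
            values[key] = sysctl_path(key, root).read_text().strip()
        except OSError:  # missing file, permission denied, etc.
            values[key] = None
    return values


if __name__ == "__main__":
    keys = [
        "net.ipv4.tcp_tw_recycle",        # removed in Linux 4.12
        "net.ipv4.tcp_tw_reuse",
        "kernel.hung_task_timeout_secs",  # controls the hung-task warning interval
    ]
    for key, value in read_sysctls(keys).items():
        print(f"{key} = {value if value is not None else '(missing or unreadable)'}")
```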
On 11/11/15 00:59, Heiko Krämer wrote:
Hi Mark,
thanks for your response.
I changed the dirty page settings again, and I have a logging server
that receives the syslog and Swift logs.
But these messages will not help to find the problem. They only show stalled
processes, and it seems the start partition is
Nov 10 08:28:17 storage3 kernel: [92476.204616] INFO: task xfsaild/sdd:1108 blocked for more than 120 seconds.
Nov 10 08:28:17 storage3 kernel: [92476.20463
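Since the logging server collects these kernel lines, a small filter can show which disks keep hitting the hung-task timeout. This is a minimal Python sketch; the regular expression is written against the message format in the log above (task name, colon, PID, timeout) and has not been checked against every kernel version. It counts only xfsaild/<device> threads, the per-filesystem XFS threads named after their device (xfsaild/sdd above). The function names and the log file name in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Written against lines like:
# "... kernel: [92476.204616] INFO: task xfsaild/sdd:1108 blocked for more than 120 seconds."
# The non-greedy task-name group also copes with names containing ':' (e.g. kworker/u16:2).
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    comm: str     # kernel task name, e.g. "xfsaild/sdd"
    pid: int
    timeout: int  # seconds reported in the warning


def parse_hung_task(line: str) -> Optional[HungTask]:
    """Return the parsed warning, or None if the line is not a hung-task report."""
    m = HUNG_TASK_RE.search(line)
    if m is None:
        return None
    return HungTask(m.group("comm"), int(m.group("pid")), int(m.group("secs")))


def xfsaild_devices(lines: Iterable[str]) -> Counter:
    """Count hung-task reports per device, for xfsaild/<device> threads only."""
    counts: Counter = Counter()
    for line in lines:
        task = parse_hung_task(line)
        if task is not None and task.comm.startswith("xfsaild/"):
            counts[task.comm[len("xfsaild/"):]] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 hung_tasks.py < collected-syslog.log
    for device, n in xfsaild_devices(sys.stdin).most_common():
        print(f"{device}: {n} hung-task report(s)")
```

If one device dominates the counts, that points at a single disk; reports spread evenly across all twelve drives would point more towards a system-wide writeback problem.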
Right,
I agree that the dirty size of the cache could be the issue
(particularly with SATA drives in a JBOD array without the help of any
writeback caches, etc.).
You might even want to wind those settings down a bit more (3 and 6,
perhaps, which means that with 32 GB of RAM your dirty cache size would be
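The arithmetic behind that suggestion can be sketched in Python, assuming "3 and 6" refers to vm.dirty_background_ratio and vm.dirty_ratio (the thread does not name the settings). Both ratios are percentages, and the kernel applies them to available (free plus reclaimable) memory rather than to total RAM, so plugging in the full 32 GB gives an upper bound. dirty_thresholds is a hypothetical helper; the kernel defaults of 10 and 20 are shown for comparison.

```python
from typing import Tuple


def dirty_thresholds(ram: float, background_ratio: int, dirty_ratio: int) -> Tuple[float, float]:
    """Return (background, blocking) dirty-page limits in the same unit as `ram`.

    The kernel applies both ratios to available (free + reclaimable) memory,
    not total RAM, so passing total RAM yields an upper bound.
    """
    background = ram * background_ratio / 100  # flusher threads start background writeback here
    blocking = ram * dirty_ratio / 100         # writing processes must flush dirty data themselves here
    return background, blocking


if __name__ == "__main__":
    ram_gb = 32
    # Assumed mapping: "3 and 6" = vm.dirty_background_ratio=3, vm.dirty_ratio=6
    for label, bg_ratio, ratio in [("kernel defaults (10/20)", 10, 20), ("suggested (3/6)", 3, 6)]:
        bg, hard = dirty_thresholds(ram_gb, bg_ratio, ratio)
        print(f"{label}: background ~{bg:.2f} GB, blocking ~{hard:.2f} GB")
```

With 3/6 the limits come to roughly 0.96 GB and 1.92 GB, compared with 3.2 GB and 6.4 GB at the defaults, so much less dirty data can pile up in front of the SATA disks before writeback starts.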
On 10/11/15 00:41, Eren Türkay wrote:
On 09-11-2015 12:39, Heiko Krämer wrote:
> You're right, only a hard reboot can solve the problem, because SSH
> login and other commands can't be executed; the whole system was
> frozen.
Hello Heiko,
I just want to give some tips about debugging these kinds of issues. I had
completely differe
Hi Mark,
nothing can be logged because all disks are stalled.
The only output I see is on the IPMI console:
blocked for more than 120 seconds
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
I found there could be a problem with th
Did you reboot the machine? It might be interesting to just restart the
Swift storage services and see if that brings everything back to normal.
Also check the Swift logs for what it is doing when things start to
hang, and what dmesg says at the time.
A bit more info about your setup w
Hi guys,
we have noticed problems with the disks on our Swift storage nodes.
After some time, all I/O requests to the disks block, so the
server suddenly stops working and needs a reboot.
Server setup:
* Kernel 3.19.x
* Ubuntu 14.04
* Swift (Kilo)
* 12 x 2TB SATA (JBOD)
* 2 x 480GB SSD