Right,
I agree that the amount of dirty data in the page cache could be the
issue (particularly with SATA drives in a JBOD array, without the help
of any writeback caches etc.).
You might even want to wind those settings down a bit more (3 and 6
perhaps, which with 32G of RAM puts your dirty cache size between
roughly 1 and 2G). That should stop the massive IO stall.
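For reference, the arithmetic behind those numbers can be sketched as below. This is only a rough estimate: the function name is made up for illustration, and the kernel applies the ratios to available (free plus reclaimable) memory rather than to total RAM, so the real thresholds will be somewhat lower.

```python
def dirty_thresholds_gib(ram_gib, background_ratio, dirty_ratio):
    """Estimate dirty-page thresholds in GiB from the two ratios (percent).

    Hypothetical helper: it uses total RAM as an upper bound, whereas the
    kernel bases the ratios on available memory.
    """
    background = ram_gib * background_ratio / 100  # background writeback starts
    hard_limit = ram_gib * dirty_ratio / 100       # writers start being throttled
    return background, hard_limit


if __name__ == "__main__":
    # Compare the settings already applied (5/10) with the suggested 3/6.
    for bg, hard in [(5, 10), (3, 6)]:
        low, high = dirty_thresholds_gib(32, bg, hard)
        print(f"{bg}/{hard}: background at ~{low:.2f} GiB, limit at ~{high:.2f} GiB")
```

With 32G of RAM, 3 and 6 give about 0.96 and 1.92, which is where the "between 1 and 2G" figure comes from.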
regards
Mark
On 09/11/15 23:39, Heiko Krämer wrote:
Hi Mark,
nothing can be logged because all disks are stalled.
The only output I see is on the IPMI console:
blocked for more than 120 seconds
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
I found that there could be a problem with the write caching of the
disks, so I have reduced the available dirty cache:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
I hope this will solve the issue.
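To confirm that the running kernel has actually picked up these values, something like the following could be used. It is a minimal sketch: the helper names are hypothetical, and it assumes the usual Linux layout where each vm.* sysctl is readable as a file under /proc/sys/vm.

```python
from pathlib import Path


def read_vm_setting(name, proc_root="/proc/sys/vm"):
    """Return the integer value of a vm.* sysctl, e.g. 'dirty_ratio'."""
    return int((Path(proc_root) / name).read_text().strip())


def check_dirty_settings(expected, proc_root="/proc/sys/vm"):
    """Return {name: (expected, actual)} for every setting that differs."""
    mismatches = {}
    for name, want in expected.items():
        got = read_vm_setting(name, proc_root)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    wanted = {"dirty_background_ratio": 5, "dirty_ratio": 10}
    try:
        print(check_dirty_settings(wanted) or "settings are active")
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print(f"could not read sysctl values: {exc}")
```

An empty result means both values are in effect. Note that the ratio files read as 0 if the byte-based variants (vm.dirty_bytes, vm.dirty_background_bytes) have been set instead.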
You're right, only a hard reboot can solve the problem: SSH login and
other commands can't be executed because the whole system is frozen.
That's the problem: I can't get any deeper information about what
happened.
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack