tags 944247 + moreinfo
severity 944247 normal
thanks

Hi Mario,

On 11/6/19 4:46 PM, mario wrote:
> Source: xen
> Severity: important
>
> Dear Maintainer,
>
> we have updated our server from debian oldstable (which unfortunately wasn't
> running stable after the last update, bug reported) to debian buster.
>
> unfortunately xen doesn't work reliably there either:
>
> the virtual server crashes every 1-2 week with i/o problems and sometimes
> also takes other domU instances with it.
> we use qcow2 images.
>
> the harddisk of the domU is simply no longer accessible for the linux kernel,
> no logfiles are available. in the xl console the following last lines can be
> read, login not possible:
>
> [ 1450.976415] INFO: task nginx:376 blocked for more than 120 seconds.
> [ 1450.976423] Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u5
> [ 1450.976428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 1450.976469] INFO: task nginx:377 blocked for more than 120 seconds.
> [ 1450.976474] Not tainted 4.9.0-9-amd64 #1 Debian 4.9.168-1+deb9u5
> [ 1450.976479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 1450.976624] INFO: task nginx:378 blocked for more than 120 seconds.
>
> the process varies:
> [1523692.508073] INFO: task jbd2/xvda2-8:159 blocked for more than 120 seconds
> [1523692.508084] Not tainted [...]
>
> all hard disk accesses fail as if the i/o system is completely dead.
> only "xl destroy <domid>" and recreate will help

This report is now a year old. Unfortunately, it has not received any
reply. There may be several reasons for that; one is probably that
nobody else reading it uses the same storage configuration and runs
into the same problem.

> you can easily reproduce this with the tool stress "stress -c 8 -i 8 -d 8".
> it takes a maximum of 10 minutes until the vm crashes.
>
> in our experience, as a workaround you can convert all images to raw. after
> our tests, the error will no longer occur.
> but since we need the snapshot functions of qcow2 images, this is not a
> permanent solution.
>
> does anyone else have problems with qcow2 images and xen under buster?
> maybe this also concerns qemu?
>
> [...]

To be honest, I do not know.

Have you been able to find out more about the problem in the last year?
Have you taken steps to try to narrow it down by investigating other
combinations of software, with and without Xen? For example, you could
boot into plain Linux, mount the qcow2 image somewhere, and run the same
load test to see whether the problem also occurs with Xen eliminated
from the equation.

The bug report as it stands is not really actionable for anyone other
than yourself. As the Debian Xen team, we unfortunately do not have the
bandwidth to set up a test server with the same configuration as yours
and hammer on it until the same problem occurs.

Thanks,
Hans