OK, the bug happened again with strace attached to nova-compute. Once again, there's little to no IO/network activity while it happens. Memory is stable. CPU is at least 50% idle (and the rest of it is largely in user mode). Nothing in dmesg.

nova-compute logs are as follows:

2018-04-25 14:48:04.587 54255 INFO nova.virt.libvirt.driver [req-85551d96-713d-499d-b7ff-9f911fb0842d bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: bd17aeef-240b-489c-8bb6-b37167155174] Deleting instance files /srv/nova/instances/bd17aeef-240b-489c-8bb6-b37167155174_del
2018-04-25 14:52:06.350 54255 INFO nova.virt.libvirt.driver [req-85551d96-713d-499d-b7ff-9f911fb0842d bc0ab055427645aca4ed09266e85b1db 1cb457a8302543fea067e5f14b5241e7 - - -] [instance: bd17aeef-240b-489c-8bb6-b37167155174] Deletion of /srv/nova/instances/bd17aeef-240b-489c-8bb6-b37167155174_del complete

So it took about 4 minutes (14:48:04 to 14:52:06) to remove the files. Looking at strace, it's mostly a single thread doing (unrelated) stuff in the meantime:
https://pastebin.canonical.com/p/w63Z62r4zN/ (sorry, Canonical-only link).

The first line of the paste is the unlink, and you can see the syscall finishing at the bottom. Looking at this, I'm fairly convinced that it's the unlink() itself that took 4 minutes. So perhaps a kernel bug?

I'm now running "perf record sleep 60" in a loop to try and diagnose what the hell the system is doing during that time.

-- 
https://bugs.launchpad.net/bugs/1766543

Title:
  instance deletion takes a while and blocks nova-compute
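
To check whether a slow unlink() reproduces outside nova-compute, one option is to time the call directly. What follows is only a minimal sketch, assuming Python is available on the compute host. The helper name `timed_unlink` and the 64 MiB scratch file are hypothetical and not taken from nova. For the result to mean anything, the scratch file would need to live on the same filesystem as /srv/nova/instances and be comparable in size to a real instance disk.

```python
import os
import tempfile
import time


def timed_unlink(path):
    """Remove `path` and return how long os.unlink() took, in seconds."""
    start = time.monotonic()
    os.unlink(path)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical stand-in for an instance disk: a 64 MiB scratch file.
    # Pass dir=... to mkstemp() to place it on the filesystem under test.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * (64 * 1024 * 1024))
        f.flush()
        os.fsync(f.fileno())  # make sure the data reached the disk first
    print(f"unlink took {timed_unlink(path):.3f}s")
```

If this also stalls for minutes on the affected host, that would point at the filesystem or kernel rather than at nova-compute.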