OK, the bug happened again with strace attached to nova-compute. Once
again, there's little to no IO/network while it happens. memory is
stable. CPU is at least 50% idle (and the rest of it largely user mode).
Nothing in dmesg.

nova-compute logs are as follow :
2018-04-25 14:48:04.587 54255 INFO nova.virt.libvirt.driver 
[req-85551d96-713d-499d-b7ff-9f911fb0842d bc0ab055427645aca4ed09266e85b1db 
1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 
bd17aeef-240b-489c-8bb6-b37167155174] Deleting instance files 
/srv/nova/instances/bd17aeef-240b-489c-8bb6-b37167155174_del

2018-04-25 14:52:06.350 54255 INFO nova.virt.libvirt.driver [req-
85551d96-713d-499d-b7ff-9f911fb0842d bc0ab055427645aca4ed09266e85b1db
1cb457a8302543fea067e5f14b5241e7 - - -] [instance: bd17aeef-240b-489c-
8bb6-b37167155174] Deletion of /srv/nova/instances/bd17aeef-240b-489c-
8bb6-b37167155174_del complete

So it took 4 minutes to remove the files. Looking at strace, it's mostly
a single thread doing (unrelated) stuff in the meantime :
https://pastebin.canonical.com/p/w63Z62r4zN/ (sorry, Canonical-only
link).

The first line is the unlink, and you can see the syscall finishing at
the bottom. Looking at this, I'm fairly convinced that it's the unlink()
itself that took 4 minutes. So perhaps a kernel bug ?

I'm now running "perf record sleep 60" in a loop to try and diagnose
what the hell the system is doing during that time.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1766543

Title:
  instance deletion takes a while and blocks nova-compute

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1766543/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to