Most of my virt nodes run the standard Trusty kernel, 3.13.0-52-generic or similar. Recently I had cause to shut down one of them, so I started by running a scripted 'nova suspend' of all instances. A couple of instances into the script, the kernel locked up and the whole system died. Further investigation on a test node confirmed it: any time I suspended more than a couple of instances, the entire system was a goner and required a reboot.

So... today I've been investigating alternative kernels. My first attempt was 3.19. With 3.19 I can suspend and resume instances as much as I want, and the server stays up. But once an instance is resumed, its clock is garbled: a simple 'sleep 1' on a resumed instance hangs forever.

The sweet spot is in the middle. 3.16 doesn't hang with suspend/resume, and the instances actually work once resumed. So, I have my solution.

What gives? Is suspend/resume just generally considered harmful? Or am I hitting a nasty hardware interaction, and these kernels work fine for everyone else?

Issues like this make me think that I'm the only person in the world who is actually using this stuff :(

-A


_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
