John —

I’ve submitted our patch to work around the issue to Review Board and tied it 
to the original ticket we found.  I submitted it against 4.3, but I know 
you’ve been testing the patch on 4.2.  If someone could give it a sanity 
check, that would be appreciated.  It looks like it would be an issue in all 
versions of CloudStack running on CentOS/KVM as the hypervisor.
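
Roughly speaking, the patch wraps the existing read loop in get_boot_params so 
it keeps retrying until it actually receives a command line over the 
virtio-serial port. Sketching the idea rather than quoting the diff exactly 
(the retry/sleep details below are just illustrative):

    # keep re-reading /dev/vport0p1 until a non-empty cmdline arrives,
    # instead of giving up after a single pass that can race the device
    cmd=
    while [ -z "$cmd" ]
    do
        while read line
        do
            if [ ! -z "$line" ]
            then
                cmd=$line
                echo $cmd > /var/cache/cloud/cmdline
            fi
        done < /dev/vport0p1
        [ -z "$cmd" ] && sleep 1
    done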

Review Request 25585: [CLOUDSTACK-2823] SystemVMs start fail on CentOS 6.4

Thanks,
David Bierce
Senior System Administrator  | Appcore

Office +1.800.735.7104
Direct +1.515.612.7801 
www.appcore.com

On Sep 12, 2014, at 1:23 PM, John Skinner <john.skin...@appcore.com> wrote:

> Actually, I believe the kernel is the problem. The hosts are running CentOS 
> 6, the systemvm is the stock template (Debian 7). This does not seem to be 
> an issue on Ubuntu KVM hypervisors.
> 
> The fact that you are rebuilding systemvms on reboot is exactly why you are 
> not seeing this issue. New system VMs are usually successful; it’s when you 
> reboot them or start a stopped one that this issue shows up.
> 
> The serial port is loading, but I think the behavior is different after the 
> initial boot: if you access the system VM after rebooting it, there is 
> nothing on /proc/cmdline, and the file in /var/cache/cloud/cmdline is stale 
> and does not contain the new control network IP address. I am still able to 
> netcat the serial port between the hypervisor and the systemvm after it 
> comes up, but CloudStack will eventually force-stop the VM; since it never 
> receives the new control network IP address, it assumes the VM never started.
> 
> That is why, when we wrap that while loop in a check for an empty string on 
> $cmd, it works every time after that.
> 
> Change that global setting from true to false, and try to reboot a few 
> routers. I guarantee you will see this issue.
> 
> John Skinner
> Appcore
> 
> On Sep 12, 2014, at 10:48 AM, Marcus <shadow...@gmail.com> wrote:
> 
>> You may also want to investigate on whether you are seeing a race condition
>> with /dev/vport0p1 coming on line and cloud-early-config running. It will
>> be indicated by a log line in the systemvm /var/log/cloud.log:
>> 
>> log_it "/dev/vport0p1 not loaded, perhaps guest kernel is too old."
>> 
>> Actually, if it has anything to do with the virtio-serial socket, that would
>> probably be logged. Can you open a bug in Jira and provide the logs?
>> 
>> On Fri, Sep 12, 2014 at 9:36 AM, Marcus <shadow...@gmail.com> wrote:
>> 
>>> Can you provide more info? Is the host running CentOS 6.x, or is your
>>> systemvm? What is rebooted, the host or the router, and how is it rebooted?
>>> We have what sounds like the same config (CentOS 6.x hosts, stock
>>> community provided systemvm), and are running thousands of virtual routers,
>>> rebooted regularly with no issue (both hosts and virtual routers).  One
>>> setting we may have that you may not is that our system VMs are rebuilt
>>> from scratch on every reboot (recreate.systemvm.enabled=true in global
>>> settings). Not that I expect this to be the problem, but it might be
>>> something to look at.
>>> 
>>> On Fri, Sep 12, 2014 at 8:49 AM, John Skinner <john.skin...@appcore.com>
>>> wrote:
>>> 
>>>> I have found that on CloudStack 4.2+ (when we changed to using the
>>>> virtio socket to send data to the systemvm), cloud-early-config fails
>>>> when running on CentOS 6.x. On new systemvm creation there is a high
>>>> chance of success, but still a chance of failure. After the systemvm has
>>>> been created, a simple reboot will cause the start to fail every time.
>>>> This has been confirmed on 2 separate CloudStack 4.2 environments; 1
>>>> running CentOS 6.3 KVM, and another running CentOS 6.2 KVM. This can be
>>>> fixed with a simple modification to the get_boot_params function in the
>>>> cloud-early-config script. If you wrap the while read line loop inside
>>>> another while that checks whether $cmd returns an empty string, it fixes
>>>> the issue.
>>>> 
>>>> This is a pretty nasty issue for anyone running CloudStack 4.2+ on
>>>> CentOS 6.x.
>>>> 
>>>> John Skinner
>>>> Appcore
>>> 
>>> 
>>> 
> 
