Actually, I believe the kernel is the problem. The hosts are running CentOS 6, 
and the systemvm is the stock Debian 7 template. This does not seem to be an 
issue on Ubuntu KVM hypervisors.

The fact that you are rebuilding systemvms on reboot is exactly why you are not 
seeing this issue. New system VMs usually come up successfully; it is when you 
reboot them or start a stopped one that the problem shows up.

The serial port is loading, but the behavior seems to be different after the 
initial boot. If you access the system VM after rebooting it, /proc/cmdline is 
empty and /var/cache/cloud/cmdline is stale - it still holds the old boot 
parameters and does not contain the new control network IP address. I can 
still netcat the serial port between the hypervisor and the systemvm after it 
comes up, but because CloudStack never receives the new control network IP 
address it assumes the VM never started and eventually force-stops it.
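
For what it's worth, this is roughly how I check the channel from both ends. 
The hypervisor-side socket path below is an assumption (and r-1234-VM is just 
a placeholder name) - confirm the real path with virsh dumpxml on the router 
domain and look at the <channel> element:

    # Guest side (inside the system VM): the virtio-serial device itself
    cat /dev/vport0p1

    # Hypervisor side: connect to the unix socket backing that channel
    socat - UNIX-CONNECT:/var/lib/libvirt/qemu/r-1234-VM.agent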

That is why wrapping that while loop in a check for an empty string on $cmd 
makes it work every time after that.
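
For anyone who wants to try it, here is a rough sketch of that change - it is 
not the exact cloud-early-config source, just an approximation of what 
get_boot_params does on KVM with the retry added:

    # Sketch only: keep re-reading the virtio-serial port until we get a
    # non-empty boot parameter string in $cmd, instead of giving up after
    # the first (possibly empty) read.
    cmd=
    while [ -z "$cmd" ]; do
        while read line; do
            [ -n "$line" ] && cmd=$line
        done < /dev/vport0p1
        [ -z "$cmd" ] && sleep 1
    done
    echo "$cmd" > /var/cache/cloud/cmdline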

Change that global setting (recreate.systemvm.enabled) from true to false, and 
try to reboot a few routers. I guarantee you will see this issue.
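
If it helps, the setting can be flipped from CloudMonkey (syntax from memory, 
so double-check against your version); depending on the setting you may need 
to restart the management server before it takes effect:

    update configuration name=recreate.systemvm.enabled value=false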

John Skinner
Appcore

On Sep 12, 2014, at 10:48 AM, Marcus <shadow...@gmail.com> wrote:

> You may also want to investigate whether you are seeing a race condition
> with /dev/vport0p1 coming online and cloud-early-config running. It will
> be indicated by a log line in the systemvm /var/log/cloud.log:
> 
> log_it "/dev/vport0p1 not loaded, perhaps guest kernel is too old."
> 
> Actually, if it has anything to do with the virtio-serial socket that would
> probably be logged. Can you open a bug in Jira and provide the logs?
> 
> On Fri, Sep 12, 2014 at 9:36 AM, Marcus <shadow...@gmail.com> wrote:
> 
>> Can you provide more info? Is the host running CentOS 6.x, or is your
>> systemvm? What is rebooted, the host or the router, and how is it rebooted?
>> We have what sounds like the same config (CentOS 6.x hosts, stock
>> community provided systemvm), and are running thousands of virtual routers,
>> rebooted regularly with no issue (both hosts and virtual routers). One
>> setting we may have that you may not is that our system VMs are rebuilt
>> from scratch on every reboot (recreate.systemvm.enabled=true in global
>> settings). I don't expect this to be the problem, but it might be something
>> to look at.
>> 
>> On Fri, Sep 12, 2014 at 8:49 AM, John Skinner <john.skin...@appcore.com>
>> wrote:
>> 
>>> I have found that on CloudStack 4.2+ (when we changed to using the
>>> virtio-socket to send data to the systemvm), cloud-early-config fails
>>> when running on CentOS 6.X. On new systemvm creation there is a high
>>> chance of success, but still a chance of failure. After the systemvm has
>>> been created, a simple reboot will cause the start to fail every time.
>>> This has been confirmed on 2 separate CloudStack 4.2 environments: 1
>>> running CentOS 6.3 KVM, and another running CentOS 6.2 KVM. It can be
>>> fixed with a simple modification to the get_boot_params function in the
>>> cloud-early-config script: wrap the 'while read line' loop inside
>>> another while that checks whether $cmd is an empty string.
>>> 
>>> This is a pretty nasty issue for anyone running CloudStack 4.2+ on
>>> CentOS 6.X.
>>> 
>>> John Skinner
>>> Appcore
>> 
>> 
>> 
