Good idea. I've submitted an update with a little wait in the while loop so it doesn't stress the CPU while waiting. It looks like it missed the 4.3.1 release; should I also submit it for 4.4.1?
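For anyone following along, the shape of the change is roughly this (a simplified sketch only, not the actual diff on Review Board; the real get_boot_params also parses the other keys sent over the socket):

    # Sketch only (not the exact patch): keep re-reading the virtio-serial
    # port until a non-empty cmdline shows up, and sleep between attempts
    # so the loop does not spin the CPU if the data never arrives.
    get_boot_params() {
      local vport=/dev/vport0p1     # virtio-serial channel from the KVM host
      local cmd=""

      while [ -z "$cmd" ]; do
        while read -r line; do
          case "$line" in
            cmdline:*) cmd="${line#cmdline:}" ;;
          esac
        done < "$vport"

        if [ -z "$cmd" ]; then
          sleep 1                   # the "little wait" mentioned above
        fi
      done

      echo "$cmd" > /var/cache/cloud/cmdline
    }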
Thanks,
David Bierce
Senior System Administrator | Appcore

Office +1.800.735.7104
Direct +1.515.612.7801
www.appcore.com

On Sep 12, 2014, at 8:01 PM, David Bierce <david.bie...@appcore.com> wrote:

> Looks good, and I was going to suggest waiting like that if it turned out to be a race condition. The only thing I would suggest is maybe a sleep so it doesn't kill the CPU if for some reason the system VM never gets the cmdline.
>
> On Sep 12, 2014 1:28 PM, "David Bierce" <david.bie...@appcore.com> wrote:
>
>> John —
>>
>> I've submitted our patch to work around the issue to Review Board and tied it to the original ticket we found. I submitted it against 4.3, but I know you've been testing the patch on 4.2. If someone could take a look at it for a sanity check, please do. It looks like it would be an issue in all versions of CloudStack using CentOS/KVM as a hypervisor.
>>
>> Review Request 25585: [CLOUDSTACK-2823] SystemVMs start fail on CentOS 6.4
>>
>> Thanks,
>> David Bierce
>> Senior System Administrator | Appcore
>>
>> Office +1.800.735.7104
>> Direct +1.515.612.7801
>> www.appcore.com
>>
>> On Sep 12, 2014, at 1:23 PM, John Skinner <john.skin...@appcore.com> wrote:
>>
>>> Actually, I believe the kernel is the problem. The hosts are running CentOS 6; the systemvm is the stock template, Debian 7. This does not seem to be an issue on Ubuntu KVM hypervisors.
>>>
>>> The fact that you are rebuilding systemvms on reboot is exactly why you are not seeing this issue. New system VMs are usually successful; it's when you reboot them or start a stopped one that this issue shows up.
>>>
>>> The serial port is loading, but I think the behavior is different after the initial boot: if you access the system VM after you reboot it, there is nothing in /proc/cmdline, and /var/cache/cloud/cmdline is old and does not contain the new control network IP address. However, I am able to netcat the serial port between the hypervisor and the systemvm after it comes up, but CloudStack will eventually force stop the VM: since it doesn't get the new control network IP address, it assumes the VM never started.
>>>
>>> Which is why, when we wrap that while loop to check for an empty string in $cmd, it works every time after that.
>>>
>>> Change that global setting from true to false and try to reboot a few routers. I guarantee you will see this issue.
>>>
>>> John Skinner
>>> Appcore
>>>
>>> On Sep 12, 2014, at 10:48 AM, Marcus <shadow...@gmail.com> wrote:
>>>
>>>> You may also want to investigate whether you are seeing a race condition between /dev/vport0p1 coming online and cloud-early-config running. It would be indicated by a log line in the systemvm's /var/log/cloud.log:
>>>>
>>>> log_it "/dev/vport0p1 not loaded, perhaps guest kernel is too old."
>>>>
>>>> Actually, if it has anything to do with the virtio-serial socket, that would probably be logged. Can you open a bug in Jira and provide the logs?
>>>>
>>>> On Fri, Sep 12, 2014 at 9:36 AM, Marcus <shadow...@gmail.com> wrote:
>>>>
>>>>> Can you provide more info? Is the host running CentOS 6.x, or is your systemvm? What is rebooted, the host or the router, and how is it rebooted?
>>>>>
>>>>> We have what sounds like the same config (CentOS 6.x hosts, stock community-provided systemvm) and are running thousands of virtual routers, rebooted regularly with no issue (both hosts and virtual routers). One setting we may have that you may not is that our system VMs are rebuilt from scratch on every reboot (recreate.systemvm.enabled=true in global settings). Not that I expect this to be the problem, but it might be something to look at.
>>>>>
>>>>> On Fri, Sep 12, 2014 at 8:49 AM, John Skinner <john.skin...@appcore.com> wrote:
>>>>>
>>>>>> I have found that on CloudStack 4.2+ (when we changed to using the virtio socket to send data to the systemvm), cloud-early-config fails when running CentOS 6.x. On new systemvm creation there is a high chance of success, but still a chance of failure. After the systemvm has been created, a simple reboot will cause the start to fail every time. This has been confirmed on 2 separate CloudStack 4.2 environments: one running CentOS 6.3 KVM, and another running CentOS 6.2 KVM. It can be fixed with a simple modification to the get_boot_params function in the cloud-early-config script: if you wrap the "while read line" inside another while that checks whether $cmd is an empty string, it fixes the issue.
>>>>>>
>>>>>> This is a pretty nasty issue for anyone running CloudStack 4.2+ on CentOS 6.x.
>>>>>>
>>>>>> John Skinner
>>>>>> Appcore
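On John's point above about netcatting the serial port between the hypervisor and the systemvm: from the KVM host you can poke at the channel roughly like this (the VM name and socket path below are examples only; check virsh dumpxml on your host for the real ones):

    # Names/paths below are examples only -- confirm the real channel with
    # `virsh dumpxml <vm-name>` on your hypervisor.

    # 1. Find the virtio-serial channel libvirt attached to the system VM:
    virsh dumpxml r-1234-VM | grep -A 3 "<channel"

    # 2. Connect to the host-side Unix socket for that channel (socat shown;
    #    `nc -U` also works if your netcat build supports Unix sockets):
    socat - UNIX-CONNECT:/var/lib/libvirt/qemu/r-1234-VM.agent

    # 3. Inside the guest, the other end should appear as /dev/vport0p1; if
    #    it is missing, cloud-early-config logs the "perhaps guest kernel is
    #    too old" line Marcus quoted.
    ls -l /dev/vport0p1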