Looks good, and I was going to suggest waiting like that if it turned out
to be a race condition. The only thing I would add is a short sleep inside
the loop so it doesn't peg the CPU if for some reason the system VM never
gets a cmdline.
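
For anyone following along, the combined change would look roughly like this
(a sketch only, based on the description in this thread rather than the
exact patch on review board; variable names in get_boot_params may differ):

    # Keep re-reading the virtio serial port until we actually receive a
    # cmdline, sleeping between attempts so the loop doesn't spin the CPU
    # if the data never arrives.
    cmd=
    while [ -z "$cmd" ]
    do
        while read line
        do
            if [ ! -z "$line" ]; then
                cmd=$line
            fi
        done < /dev/vport0p1
        [ -z "$cmd" ] && sleep 1
    done
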
On Sep 12, 2014 1:28 PM, "David Bierce" <david.bie...@appcore.com> wrote:

> John —
>
> I’ve submitted our patch to work around the issue to review board and tied
> it to the original ticket we found.  I submitted it against 4.3, but I
> know you’ve been testing the patch on 4.2.  If someone could give it a
> sanity check, please do.  It looks like it would be an issue in all
> versions of CloudStack with CentOS/KVM as the hypervisor.
>
> Review Request 25585: [CLOUDSTACK-2823] SystemVMs start fail on CentOS 6.4
>
>
> Thanks,
> David Bierce
> Senior System Administrator  | Appcore
>
> Office +1.800.735.7104
> Direct +1.515.612.7801
> www.appcore.com
>
> On Sep 12, 2014, at 1:23 PM, John Skinner <john.skin...@appcore.com>
> wrote:
>
> > Actually, I believe the kernel is the problem. The hosts are running
> > CentOS 6, and the systemvm is the stock Debian 7 template. This does not
> > seem to be an issue on Ubuntu KVM hypervisors.
> >
> > The fact that you are rebuilding systemvms on reboot is exactly why you
> > are not seeing this issue. New system VMs are usually successful; it’s
> > when you reboot them or start a stopped one that this issue shows up.
> >
> > The serial port is loading, but I think the behavior is different after
> > the initial boot: if you access the system VM after you reboot it, there
> > is nothing on /proc/cmdline, and the file in /var/cache/cloud/cmdline is
> > old and does not contain the new control network IP address. I am still
> > able to netcat the serial port between the hypervisor and the systemvm
> > after it comes up, but CloudStack will eventually force stop the VM;
> > since it doesn’t get the new control network IP address, it assumes the
> > VM never started.
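> >
> > (For anyone trying to reproduce this: on the hypervisor, something like
> > nc -U /var/lib/libvirt/qemu/<vm name>.agent connects to that
> > virtio-serial socket; the exact socket path is in the domain XML under
> > the <channel> element, so check there if that path is wrong.)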
> >
> > Which is why, when we wrap that while loop in a check for an empty
> > string on $cmd, it works every time after that.
> >
> > Change that global setting from true to false, and try to reboot a few
> > routers. I guarantee you will see this issue.
> >
> > John Skinner
> > Appcore
> >
> > On Sep 12, 2014, at 10:48 AM, Marcus <shadow...@gmail.com> wrote:
> >
> >> You may also want to investigate whether you are seeing a race
> >> condition between /dev/vport0p1 coming online and cloud-early-config
> >> running. It will be indicated by a log line in the systemvm's
> >> /var/log/cloud.log:
> >>
> >> log_it "/dev/vport0p1 not loaded, perhaps guest kernel is too old."
> >>
> >> Actually, if it has anything to do with the virtio-serial socket, that
> >> would probably be logged. Can you open a bug in Jira and provide the
> >> logs?
> >>
> >> On Fri, Sep 12, 2014 at 9:36 AM, Marcus <shadow...@gmail.com> wrote:
> >>
> >>> Can you provide more info? Is the host running CentOS 6.x, or is your
> >>> systemvm? What is rebooted, the host or the router, and how is it
> >>> rebooted? We have what sounds like the same config (CentOS 6.x hosts,
> >>> stock community-provided systemvm), and are running thousands of
> >>> virtual routers, rebooted regularly with no issue (both hosts and
> >>> virtual routers).  One setting we have that you may not is that our
> >>> system VMs are rebuilt from scratch on every reboot
> >>> (recreate.systemvm.enabled=true in global settings); not that I expect
> >>> this to be the problem, but it might be something to look at.
> >>>
> >>> On Fri, Sep 12, 2014 at 8:49 AM, John Skinner <john.skin...@appcore.com>
> >>> wrote:
> >>>
> >>>> I have found that on CloudStack 4.2+ (when we changed to using the
> >>>> virtio socket to send data to the systemvm), cloud-early-config fails
> >>>> when running on CentOS 6.x. On new systemvm creation there is a high
> >>>> chance of success, but still a chance of failure. After the systemvm
> >>>> has been created, a simple reboot will cause the start to fail every
> >>>> time. This has been confirmed on 2 separate CloudStack 4.2
> >>>> environments: 1 running CentOS 6.3 KVM, and another running CentOS
> >>>> 6.2 KVM. This can be fixed with a simple modification to the
> >>>> get_boot_params function in the cloud-early-config script: if you
> >>>> wrap the while read line loop inside another while that checks
> >>>> whether $cmd is an empty string, it fixes the issue.
> >>>>
> >>>> This is a pretty nasty issue for anyone running CloudStack 4.2+ on
> >>>> CentOS 6.x.
> >>>>
> >>>> John Skinner
> >>>> Appcore
> >>>
> >>>
> >>>
> >
>
>
