Good idea. I've submitted an update with a little wait in the while loop so it doesn't stress the CPU while waiting. It looks like it missed the 4.3.1 release; should I also submit it for 4.4.1?
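For anyone following along, the shape of the change is roughly this (a simplified sketch only, not the actual diff on Review Board; the real get_boot_params also parses the other keys sent over the socket):

    # Sketch only (not the exact patch): keep re-reading the virtio-serial
    # port until a non-empty cmdline shows up, and sleep between attempts
    # so the loop does not spin the CPU if the data never arrives.
    get_boot_params() {
      local vport=/dev/vport0p1     # virtio-serial channel from the KVM host
      local cmd=""

      while [ -z "$cmd" ]; do
        while read -r line; do
          case "$line" in
            cmdline:*) cmd="${line#cmdline:}" ;;
          esac
        done < "$vport"

        if [ -z "$cmd" ]; then
          sleep 1                   # the "little wait" mentioned above
        fi
      done

      echo "$cmd" > /var/cache/cloud/cmdline
    }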
Thanks,
David Bierce
Senior System Administrator | Appcore

Office +1.800.735.7104
Direct +1.515.612.7801
www.appcore.com

On Sep 12, 2014, at 8:01 PM, David Bierce <david.bie...@appcore.com> wrote:

> Looks good, and I was going to suggest waiting like that if it turned out to be a race condition. The only thing I would suggest is maybe a sleep so it doesn't kill the CPU if for some reason the system VM never gets the cmdline.
>
> On Sep 12, 2014 1:28 PM, "David Bierce" <david.bie...@appcore.com> wrote:
>
>> John —
>>
>> I've submitted our patch to work around the issue to Review Board and tied it to the original ticket we found. I submitted it against 4.3, but I know you've been testing the patch on 4.2. If someone could take a look at it for a sanity check, please do. It looks like it would be an issue in all versions of CloudStack using CentOS/KVM as a hypervisor.
>>
>> Review Request 25585: [CLOUDSTACK-2823] SystemVMs start fail on CentOS 6.4
>>
>> Thanks,
>> David Bierce
>> Senior System Administrator | Appcore
>>
>> Office +1.800.735.7104
>> Direct +1.515.612.7801
>> www.appcore.com
>>
>> On Sep 12, 2014, at 1:23 PM, John Skinner <john.skin...@appcore.com> wrote:
>>
>>> Actually, I believe the kernel is the problem. The hosts are running CentOS 6; the systemvm is the stock template, Debian 7. This does not seem to be an issue on Ubuntu KVM hypervisors.
>>>
>>> The fact that you are rebuilding systemvms on reboot is exactly why you are not seeing this issue. New system VMs are usually successful; it's when you reboot them or start a stopped one that this issue shows up.
>>>
>>> The serial port is loading, but I think the behavior is different after the initial boot: if you access the system VM after you reboot it, there is nothing in /proc/cmdline, and /var/cache/cloud/cmdline is old and does not contain the new control network IP address. However, I am able to netcat the serial port between the hypervisor and the systemvm after it comes up, but CloudStack will eventually force stop the VM: since it doesn't get the new control network IP address, it assumes the VM never started.
>>>
>>> Which is why, when we wrap that while loop to check for an empty string in $cmd, it works every time after that.
>>>
>>> Change that global setting from true to false and try to reboot a few routers. I guarantee you will see this issue.
>>>
>>> John Skinner
>>> Appcore
>>>
>>> On Sep 12, 2014, at 10:48 AM, Marcus <shadow...@gmail.com> wrote:
>>>
>>>> You may also want to investigate whether you are seeing a race condition between /dev/vport0p1 coming online and cloud-early-config running. It would be indicated by a log line in the systemvm's /var/log/cloud.log:
>>>>
>>>> log_it "/dev/vport0p1 not loaded, perhaps guest kernel is too old."
>>>>
>>>> Actually, if it has anything to do with the virtio-serial socket, that would probably be logged. Can you open a bug in Jira and provide the logs?
>>>>
>>>> On Fri, Sep 12, 2014 at 9:36 AM, Marcus <shadow...@gmail.com> wrote:
>>>>
>>>>> Can you provide more info? Is the host running CentOS 6.x, or is your systemvm? What is rebooted, the host or the router, and how is it rebooted?
>>>>>
>>>>> We have what sounds like the same config (CentOS 6.x hosts, stock community-provided systemvm) and are running thousands of virtual routers, rebooted regularly with no issue (both hosts and virtual routers). One setting we may have that you may not is that our system VMs are rebuilt from scratch on every reboot (recreate.systemvm.enabled=true in global settings). Not that I expect this to be the problem, but it might be something to look at.
>>>>>
>>>>> On Fri, Sep 12, 2014 at 8:49 AM, John Skinner <john.skin...@appcore.com> wrote:
>>>>>
>>>>>> I have found that on CloudStack 4.2+ (when we changed to using the virtio socket to send data to the systemvm), cloud-early-config fails when running CentOS 6.x. On new systemvm creation there is a high chance of success, but still a chance of failure. After the systemvm has been created, a simple reboot will cause the start to fail every time. This has been confirmed on 2 separate CloudStack 4.2 environments: one running CentOS 6.3 KVM, and another running CentOS 6.2 KVM. It can be fixed with a simple modification to the get_boot_params function in the cloud-early-config script: if you wrap the "while read line" inside another while that checks whether $cmd is an empty string, it fixes the issue.
>>>>>>
>>>>>> This is a pretty nasty issue for anyone running CloudStack 4.2+ on CentOS 6.x.
>>>>>>
>>>>>> John Skinner
>>>>>> Appcore
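On John's point above about netcatting the serial port between the hypervisor and the systemvm: from the KVM host you can poke at the channel roughly like this (the VM name and socket path below are examples only; check virsh dumpxml on your host for the real ones):

    # Names/paths below are examples only -- confirm the real channel with
    # `virsh dumpxml <vm-name>` on your hypervisor.

    # 1. Find the virtio-serial channel libvirt attached to the system VM:
    virsh dumpxml r-1234-VM | grep -A 3 "<channel"

    # 2. Connect to the host-side Unix socket for that channel (socat shown;
    #    `nc -U` also works if your netcat build supports Unix sockets):
    socat - UNIX-CONNECT:/var/lib/libvirt/qemu/r-1234-VM.agent

    # 3. Inside the guest, the other end should appear as /dev/vport0p1; if
    #    it is missing, cloud-early-config logs the "perhaps guest kernel is
    #    too old" line Marcus quoted.
    ls -l /dev/vport0p1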