Hi,

This mail is to share a fight we had at INRIA while upgrading our CloudStack/KVM farm from 4.2 to 4.4.2, following this documentation:
http://cloudstack-release-notes.readthedocs.org/en/latest/upgrade/upgrade-4.2.html

It's now solved, but I would like to share it, because I think:

- it could help other people like us who have already migrated from CloudStack 3.x to 4.x
- one bug is marked as fixed and should not be: https://issues.apache.org/jira/browse/CLOUDSTACK-7399
- a little documentation is missing (how to test whether we have the right qemu-kvm version for the systemVM templates)

Here are the (long) details.

Technical information:
----------------------

- Upgrade from CloudStack 4.2.1 to 4.4.2
- CentOS 6/KVM for agents
- official CloudStack RPMs
- 1 zone with Basic Networking

We are using CloudStack here in two environments:

- qualification, with MS and agents created on 4.2.1
- production, with MS and agents originally created on a 3.x version, a long time ago, before Apache :D

Qualification troubles and solution:
------------------------------------

- systemVMs do not start after cloudstack-sysvmadm is launched
- the solution was to upgrade the KVM agents from CentOS 6.3 to 6.6
- we think (not sure) that the trouble came from an old qemu-kvm version; a good test to document may be to check which CentOS versions (machine types) qemu-kvm supports, by launching this command:
---
/usr/libexec/qemu-kvm -M ?
---

Production troubles and solution:
---------------------------------

- cloudstack-sysvmadm takes hours (2 or 3) to shut down, upgrade and restart the systemVMs
- starting/stopping existing instances works
- but we are unable to create new instances; the error on the MS is:
---
com.cloud.exception.AgentUnavailableException: Resource [Host:xx] is unreachable: Host xx: Unable to start instance due to Unable to get answer that is of class com.cloud.agent.api.StartAnswer
---
- when destroyed manually, systemVMs won't restart
- debug on the agents shows the same message as this bug: https://issues.apache.org/jira/browse/CLOUDSTACK-7399 which is officially resolved in 4.4.1 (our version is 4.4.2 !!!)
---
WARN [cloud.agent.Agent] (agentRequest-Handler-2:null) Caught: java.lang.NullPointerException
    at com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:159)
...
DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Seq 25-6233544834234187813: { Ans: , MgmtId: 345044038925, via: 25, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NullPointerException\n\tat com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:159)\n\tat com.cloud.network.Networks$BroadcastDomainType.getValue(Networks.java:213)\n\tat com.cloud.hypervisor.
...
---
- we had to find our basic network in the MySQL table networks, whose broadcast_uri was NULL
- and change it to the "new" style vlan://untagged:
---
update networks set broadcast_uri="vlan://untagged" where id="our basic network id";
---

Hope it helps,

--
Laurent Steff
DSI/SESI INRIA
http://www.inria.fr/
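
For the qemu-kvm check mentioned in the qualification section, here is a minimal sketch of how the output of "/usr/libexec/qemu-kvm -M ?" could be tested from a script. The function names are hypothetical helpers, not part of CloudStack or qemu. The sketch assumes the listing starts with a "Supported machines" header line and puts the machine type name in the first column. Which machine type the systemVM templates actually need is not something we verified, so it is left as a parameter.

```shell
#!/bin/sh
# Sketch only: list_machine_types and has_machine_type are hypothetical helpers.

# Read the output of "/usr/libexec/qemu-kvm -M ?" on stdin and print one
# machine type name per line (first column; the header line is skipped).
list_machine_types() {
    awk '!/^Supported machines/ && NF > 0 { print $1 }'
}

# Succeed (exit status 0) if the machine type given as $1 appears in the
# listing read from stdin; the match is on the whole name, not a substring.
has_machine_type() {
    list_machine_types | grep -Fqx -- "$1"
}

# Example use on a KVM agent. "rhel6.5.0" is only an illustrative value,
# not a documented requirement of the systemVM templates:
#   /usr/libexec/qemu-kvm -M ? | has_machine_type rhel6.5.0 && echo supported
```

Running the listing on an agent before and after the CentOS upgrade and comparing the two outputs would show which machine types the upgrade added.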