Hi ilya,

I can confirm that issue; please check https://issues.apache.org/jira/browse/CLOUDSTACK-9144

When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a basic zone, the VR never left the "starting" state. Falling back to 4.5 works fine. Maybe you can test it yourself.
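For anyone retracing the debugging quoted below: the clearest anomaly is the router's control NIC (eth1) coming up as 0.0.0.0 instead of an address from the pod's management IP range. The following is a minimal POSIX sh sketch for checking an address against such a range. ip_to_int and ip_in_cidr are hypothetical helper names, not part of CloudStack, and the sketch assumes dotted-quad IPv4 input and 64-bit shell arithmetic (worth confirming on the system VM's /bin/sh).

```shell
#!/bin/sh
# Hypothetical helpers (not part of CloudStack): check whether an IPv4
# address, e.g. the VR's eth1 (control) IP, lies inside a CIDR range,
# e.g. the pod's management IP range. Assumes 64-bit shell arithmetic.

# Print a dotted-quad IPv4 address as an integer; return 1 if malformed.
ip_to_int() {
    case "$1" in
        ''|*[!0-9.]*) return 1 ;;      # only digits and dots are allowed
    esac
    saved_ifs=$IFS
    IFS=.
    set -- $1                          # split into octets on '.'
    IFS=$saved_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        case "$octet" in
            ''|0?*) return 1 ;;        # empty, or leading zero (arithmetic would read it as octal)
        esac
        [ "$octet" -le 255 ] || return 1
    done
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 is inside network $2 (e.g. 10.70.110.0/24),
# 1 if it is outside, and 2 if either argument is malformed.
ip_in_cidr() {
    iic_ip=$(ip_to_int "$1") || return 2
    iic_net=$(ip_to_int "${2%/*}") || return 2
    iic_bits=${2#*/}
    case "$iic_bits" in
        ''|*[!0-9]*|0?*) return 2 ;;   # prefix length must be a plain decimal number
    esac
    [ "$iic_bits" -le 32 ] || return 2
    if [ "$iic_bits" -eq 0 ]; then
        iic_mask=0
    else
        iic_mask=$(( (0xFFFFFFFF << (32 - iic_bits)) & 0xFFFFFFFF ))
    fi
    [ $(( iic_ip & iic_mask )) -eq $(( iic_net & iic_mask )) ]
}
```

As an illustration only: ip_in_cidr 0.0.0.0 10.70.110.0/24 returns 1 (outside the range), while an address such as 10.70.110.50 returns 0. Here 10.70.110.0/24 is borrowed from the mgmtcidr entry in the quoted cmdline; the range to compare against is the pod IP range shown in the UI, which may differ.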
2016-07-29 3:24 GMT+08:00 ilya <[email protected]>:
> I guess it would help to know what type of zone you use.
>
> Is it advanced, isolated VPC or shared network? What type of isolation? Or perhaps a basic zone?
>
> Lastly, try stopping iptables and restarting the cloud agent (via stop and start).
>
> Please see my responses inline.
>
> On 7/28/16 6:58 AM, Jacob Seeley wrote:
> > Hi ilya,
> >
> > Funny you brought up debugging the router VM. After I responded yesterday, I did just that, and I found some odd things.
> >
> > Just to be clear (I think we're on the same page), since I'm not the OP of this thread: the virtual router always gets deployed and starts up just fine; however, CloudStack reports that it is always stuck in "starting". VMs that get deployed ultimately fail. CloudStack reports the router version as UNKNOWN.
> >
> > Before I provide what I found debugging the router VM, I'll address some of your points.
> >
> > ### FOLLOW-UP QUESTIONS ###
> >
> > "Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage."
> >
> > I don't believe this is an issue. The hypervisor (VMware) does mount the secondary storage via NFS just fine. If this were an issue, I would think the Secondary Storage and Console VMs would not deploy.
> >
> > "Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
> >
> > It looks to me like /var/log/cloud.out is only written to when $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script. As such, there isn't even a /var/log/cloud.out file. Even when I set that variable, nothing gets logged to /var/log/cloud.out. However, there is a /var/log/cloud.log. Here are its contents: http://pastebin.com/aaTsRKZE
> >
> > "you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs.."
> >
> > The service is in a failed state. It's worth noting that this service is in a started state on the Console and Secondary Storage VMs.
>
> This is concerning - I see you did "sh -x", read on..
>
> >
> > "also, confirm that management server can talk to VR on POD IP (management) on port 3922.."
> >
> > It appears this is not an issue; see below:
>
> 3922 from MS to VR - this is the SSH daemon on the VR with the private key
> 8250 from VR to MS - the CloudStack Java agent on the VR talking to the MS
>
> >
> > root@r-4-VM:~# telnet 10.70.110.101 8250
> > Trying 10.70.110.101...
> > Connected to 10.70.110.101.
> > Escape character is '^]'.
> >
> > ### ROUTER VM DEBUG ###
> >
> > Here is what I found when the router VM gets deployed (please tell me if anything seems off):
> >
> > 2 NICs; only one NIC gets an IP address. In CloudStack, NIC1 shows an IP address coming from the defaultGuestNetwork. NIC2 is traffic type Control but has an IP address of 0.0.0.0.
>
> It is a cause for concern to see 0.0.0.0 assigned to eth1.
>
> Let's assume NIC1 is eth0 and NIC2 is eth1.
>
> 1) We should not be getting 0.0.0.0 for eth1, aka the control network. This IP should be coming from the POD network range -> when you added a pod - I assume you did it as part of the Add Zone wizard...
>
> To see the POD IP range, go to UI > Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1 (assuming you did not create anything special), Management, IP Ranges -> you should see a range defined there, and it should not be 0.0.0.0...
>
> >
> > From the CloudStack management server, I cannot SSH into the router VM on NIC1. I've found this is because of iptables rules on the router VM. If I issue /etc/init.d/iptables-persistent flush on the router VM, I can SSH into the router VM using the SSH key on port 3922.
> >
> > The service "cloud" is in a failed state.
> > Looking at the cloud init script, I see the following:
> >
> > CMDLINE=$(cat /var/cache/cloud/cmdline)
> >
> > TYPE="router"
> > for i in $CMDLINE
> > do
> >     # search for foo=bar pattern and cut out foo
> >     FIRSTPATTERN=$(echo $i | cut -d= -f1)
> >     case $FIRSTPATTERN in
> >         type)
> >             TYPE=$(echo $i | cut -d= -f2)
> >             ;;
> >     esac
> > done
> >
> > The file /var/cache/cloud/cmdline exists; here are its contents:
> >
> > template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> >
>
> You can also try updating your /var/cache/cloud/cmdline with the proper values for eth1ip=0.0.0.0 eth1mask=0.0.0.0; you can look them up under Infrastructure, Routers, r-4, NICs, and look for the control NIC..
>
> Then try starting the cloud service..
>
> Also, did you enable baremetal support? Can you deploy a zone without baremetal support? Perhaps there is a bug in how IPs are assigned to eth1 (control NIC)...
>
> >
> > The previous code suggests that the value of TYPE starts as router but will get set to dhcpsrvr, as indicated by the contents of /var/cache/cloud/cmdline. Is this normal?
> >
> > Further down the script, I see:
> >
> > CLOUDSTACK_HOME="/usr/local/cloud"    <----------------------------------------Exists
> > if [ -f $CLOUDSTACK_HOME/systemvm/utils.sh ];    <----------------------------------------Does not exist. Seems odd!
> > then
> >     . $CLOUDSTACK_HOME/systemvm/utils.sh
> > else
> >     _failure
> > fi
> >
> > # mkdir -p /var/log/vmops
> >
> > start() {
> >     local pid=$(get_pids)
> >     if [ "$pid" != "" ]; then
> >         echo "CloudStack cloud sevice is already running, PID = $pid"
> >         return 0
> >     fi
> >
> >     echo -n "Starting CloudStack cloud service (type=$TYPE) "
> >     if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];    <------------------------------------------------------Does not exist. Seems odd!
> >     then
> >         if [ "$pid" == "" ]
> >         then
> >             (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
> >             pid=$(get_pids)
> >             echo $pid > /var/run/cloud.pid
> >         fi
> >         _success
> >     else
> >         _failure
> >     fi
> >     echo
> >     echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
> > }
> >
> > I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder exists; however, the script then looks for the file /usr/local/cloud/systemvm/utils.sh, which doesn't exist. The script is also supposed to start run.sh, but that doesn't exist either. This seems like a problem to me.
> >
> > Here is a step-through of what happens when I try to start the cloud service:
> >
> > sh -x /etc/init.d/cloud start
> > + ENABLED=0
> > + [ -e /etc/default/cloud ]
> > + . /etc/default/cloud
> > + ENABLED=0
> > + cat /var/cache/cloud/cmdline
> > + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> > + [ ! -z ]
> > + LOG_FILE=/dev/null
> > + TYPE=router
> > + cut -d= -f1
> > + echo template=domP
> > + FIRSTPATTERN=template
> > + cut -d= -f1
> > + echo name=r-4-VM
> > + FIRSTPATTERN=name
> > + cut -d= -f1
> > + echo eth0ip=10.70.116.75
> > + FIRSTPATTERN=eth0ip
> > + cut -d= -f1
> > + echo eth0mask=255.255.255.0
> > + FIRSTPATTERN=eth0mask
> > + cut -d= -f1
> > + echo gateway=10.70.116.1
> > + FIRSTPATTERN=gateway
> > + cut -d= -f1
> > + echo domain=vit.vertitechit.com
> > + FIRSTPATTERN=domain
> > + cut -d= -f1
> > + echo cidrsize=24
> > + FIRSTPATTERN=cidrsize
> > + cut -d= -f1
> > + echo dhcprange=10.70.116.1
> > + FIRSTPATTERN=dhcprange
> > + cut -d= -f1
> > + echo eth1ip=0.0.0.0
> > + FIRSTPATTERN=eth1ip
> > + cut -d= -f1
> > + echo eth1mask=0.0.0.0
> > + FIRSTPATTERN=eth1mask
> > + cut -d= -f1
> > + echo mgmtcidr=10.70.110.0/24
> > + FIRSTPATTERN=mgmtcidr
> > + cut -d= -f1
> > + echo localgw=10.70.116.1
> > + FIRSTPATTERN=localgw
> > + cut -d= -f1
> > + echo sshonguest=true
> > + FIRSTPATTERN=sshonguest
> > + cut -d= -f1
> > + echo type=dhcpsrvr
> > + FIRSTPATTERN=type
> > + cut -d= -f2
> > + echo type=dhcpsrvr
> > + TYPE=dhcpsrvr
> > + cut -d= -f1
> > + echo disable_rp_filter=true
> > + FIRSTPATTERN=disable_rp_filter
> > + cut -d= -f1
> > + echo extra_pubnics=2
> > + FIRSTPATTERN=extra_pubnics
> > + cut -d= -f1
> > + echo dns1=10.70.10.21
> > + FIRSTPATTERN=dns1
> > + cut -d= -f1
> > + echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> > + FIRSTPATTERN=baremetalnotificationsecuritykey
> > + cut -d= -f1
> > + echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> > + FIRSTPATTERN=baremetalnotificationapikey
> > + cut -d= -f1
> > + echo host=10.70.110.101
> > + FIRSTPATTERN=host
> > + cut -d= -f1
> > + echo port=8080
> > + FIRSTPATTERN=port
> > + cut -d= -f1
> > + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> > + FIRSTPATTERN=nic_macs
> > + [ -f /etc/init.d/functions ]
> > + [ -f ./lib/lsb/init-functions ]
> > + RETVAL=0
> > + CLOUDSTACK_HOME=/usr/local/cloud
> > + [ -f /usr/local/cloud/systemvm/utils.sh ]
> > + _failure
> > + [ -f /etc/init.d/functions ]
> > + echo Failed
> > Failed
> > + [ 0 != 0 ]
> > + exit 0
> >
> > Thoughts?
> >
> > Jacob Seeley
> > Sr. Infrastructure Engineer
> > VertitechIT
> > 413-268-1631
> >
> > www.vertitechit.com
> >
> > -----Original Message-----
> > From: ilya [mailto:[email protected]]
> > Sent: Wednesday, July 27, 2016 8:43 PM
> > To: [email protected]
> > Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >
> > Hi Jacob
> >
> > I gave this a second read - if your issue is Router VM in starting mode - but not started - it means cloudstack agent on routerVM cannot talk to management server on 8250 over POD network.
> >
> > Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage.
> >
> > Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you...
> >
> > you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs..
> >
> > also, confirm that management server can talk to VR on POD IP (management) on port 3922..
> >
> > Regards
> > ilya
> >
> > On 7/27/16 9:34 AM, Jacob Seeley wrote:
> >> ilya,
> >>
> >> Here are the contents of the secondary storage:
> >>
> >> .
> >> ./template
> >> ./template/tmpl
> >> ./template/tmpl/1
> >> ./template/tmpl/1/8
> >> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
> >> ./template/tmpl/1/8/template.properties
> >> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
> >> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
> >> ./template/tmpl/1/7
> >> ./template/tmpl/1/7/template.properties
> >> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
> >> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
> >> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
> >> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
> >> ./systemvm
> >> ./systemvm/systemvm-4.8.0.1.iso
> >> ./systemvm/.lck-bf162a0100000000
> >> ./snapshots
> >> ./volumes
> >>
> >> I've noticed that both the Secondary Storage VM and the Console Proxy VM mount this ISO and, as stated before, they come up just fine.
> >>
> >> Regards,
> >>
> >> Jacob Seeley
> >> Sr. Infrastructure Engineer
> >> VertitechIT
> >> 413-268-1631
> >>
> >> www.vertitechit.com
> >>
> >> -----Original Message-----
> >> From: ilya [mailto:[email protected]]
> >> Sent: Wednesday, July 27, 2016 3:22 AM
> >> To: [email protected]
> >> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >>
> >> Jacob
> >>
> >> The upgrade usually occurs through systemvm.iso - that is generated by CloudStack on the first start.
> >>
> >> Please show the content of your secondary store, specifically
> >>
> >> /mnt/[secondary-storage]/systemvm
> >>
> >> Regards
> >> ilya
> >>
> >> On 7/25/16 11:19 AM, Jacob Seeley wrote:
> >>> Here is a pastebin snippet of the management-server.log - http://pastebin.com/GCLm53Gz
> >>>
> >>> Hopefully the relevant data is in there.
> >>>
> >>> I made sure to start from scratch for this example. Everything from the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is fresh.
> >>> I deployed a new instance in CloudStack, a VM internally named i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to deploy a VR. The VR is called r-4-VM and has an IP address of 192.168.0.79.
> >>>
> >>> Thank you,
> >>>
> >>> Jacob Seeley
> >>> Sr. Infrastructure Engineer
> >>> VertitechIT
> >>> 413-268-1631
> >>>
> >>> www.vertitechit.com
> >>>
> >>> -----Original Message-----
> >>> From: Suresh Sadhu [mailto:[email protected]]
> >>> Sent: Monday, July 25, 2016 1:37 AM
> >>> To: [email protected]
> >>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >>>
> >>> Please upload the logs to the issue.
> >>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <[email protected]> wrote:
> >>>>
> >>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
> >>>>
> >>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <[email protected]>:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> What template are you using to start your first VM? The default VMware template?
> >>>>> If you look in vCenter, what does the console show you?
> >>>>>
> >>>>> Glenn
> >>>>>
> >>>>> [email protected]
> >>>>> www.shapeblue.com
> >>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
> >>>>> 7130 South Africa @shapeblue
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Pascal R. [mailto:[email protected]]
> >>>>> Sent: Monday, 04 July 2016 1:26 PM
> >>>>> To: [email protected]
> >>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> We have a CS 4.8 deployment with VMware 5.5.
> >>>>>
> >>>>> When trying to launch the first VM, the VS is created. The VS starts up, but in CS it is stuck in the "starting" state.
> >>>>>
> >>>>> I can't find any useful information in the logs.
> >>>>>
> >>>>> Any hints?
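The init script quoted in this thread reads /var/cache/cloud/cmdline word by word and then gives up when $CLOUDSTACK_HOME/systemvm/utils.sh is missing. A minimal POSIX sh sketch of the same checks, for poking at a router VM by hand, follows. cmdline_get, cmdline_set and missing_systemvm_files are hypothetical names, not CloudStack tools; the sketch assumes the cmdline is a single line of space-separated key=value words, as in the copy quoted above, and it only prints results rather than editing any file.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a system VM by hand:
#   cmdline_get KEY CMDLINE        print the value of KEY (exit 1 if absent)
#   cmdline_set CMDLINE KEY VALUE  print CMDLINE with KEY's value replaced
#   missing_systemvm_files HOME    list files the init script expects but can't find
# CMDLINE is assumed to be space-separated key=value words.

set -f   # disable pathname expansion so cmdline words are split but never globbed

cmdline_get() {
    for cg_word in $2; do
        case "$cg_word" in
            "$1="*)
                # Strip only up to the first '=', so values containing '='
                # survive (the init script's "cut -d= -f2" would truncate them).
                printf '%s\n' "${cg_word#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

cmdline_set() {
    cs_out=
    for cs_word in $1; do
        case "$cs_word" in
            "$2="*) cs_word="$2=$3" ;;   # replace an existing key only; absent keys are not added
        esac
        cs_out="${cs_out:+$cs_out }$cs_word"
    done
    printf '%s\n' "$cs_out"
}

missing_systemvm_files() {
    msf_rc=0
    # utils.sh and run.sh are the two files the quoted init script tests for.
    for msf_name in utils.sh run.sh; do
        if [ ! -f "$1/systemvm/$msf_name" ]; then
            printf 'missing: %s\n' "$1/systemvm/$msf_name"
            msf_rc=1
        fi
    done
    return "$msf_rc"
}
```

For example, cmdline_get eth1ip "$(cat /var/cache/cloud/cmdline)" would print 0.0.0.0 for the cmdline quoted above, and missing_systemvm_files /usr/local/cloud would list utils.sh and run.sh on a router in the state Jacob describes. If you follow ilya's suggestion of writing corrected eth1ip/eth1mask values back, keep a copy of the original file first; the correct values have to come from the control NIC shown in the UI, not from this sketch.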
