----- Original Message ----- > From: "Bob Doolittle" <[email protected]> > To: "Simone Tiraboschi" <[email protected]> > Cc: "users-ovirt" <[email protected]> > Sent: Tuesday, March 10, 2015 7:29:44 PM > Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 > (The VDSM host was found in a failed > state) > > > On 03/10/2015 10:20 AM, Simone Tiraboschi wrote: > > > > ----- Original Message ----- > >> From: "Bob Doolittle" <[email protected]> > >> To: "Simone Tiraboschi" <[email protected]> > >> Cc: "users-ovirt" <[email protected]> > >> Sent: Tuesday, March 10, 2015 2:40:13 PM > >> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on > >> F20 (The VDSM host was found in a failed > >> state) > >> > >> > >> On 03/10/2015 04:58 AM, Simone Tiraboschi wrote: > >>> ----- Original Message ----- > >>>> From: "Bob Doolittle" <[email protected]> > >>>> To: "Simone Tiraboschi" <[email protected]> > >>>> Cc: "users-ovirt" <[email protected]> > >>>> Sent: Monday, March 9, 2015 11:48:03 PM > >>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on > >>>> F20 (The VDSM host was found in a failed > >>>> state) > >>>> > >>>> > >>>> On 03/09/2015 02:47 PM, Bob Doolittle wrote: > >>>>> Resending with CC to list (and an update). > >>>>> > >>>>> On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: > >>>>>> ----- Original Message ----- > >>>>>>> From: "Bob Doolittle" <[email protected]> > >>>>>>> To: "Simone Tiraboschi" <[email protected]> > >>>>>>> Cc: "users-ovirt" <[email protected]> > >>>>>>> Sent: Monday, March 9, 2015 6:26:30 PM > >>>>>>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 > >>>>>>> on > >>>>>>> F20 (Cannot add the host to cluster ... SSH > >>>>>>> has failed) > >>>>>>> > >> ... > >>>>>>> OK, I've started over. Simply removing the storage domain was > >>>>>>> insufficient, > >>>>>>> the hosted-engine deploy failed when it found the HA and Broker > >>>>>>> services > >>>>>>> already configured. I decided to just start over fresh starting with > >>>>>>> re-installing the OS on my host. > >>>>>>> > >>>>>>> I can't deploy DNS at the moment, so I have to simply replicate > >>>>>>> /etc/hosts > >>>>>>> files on my host/engine. I did that this time, but have run into a > >>>>>>> new > >>>>>>> problem: > >>>>>>> > >>>>>>> [ INFO ] Engine replied: DB Up!Welcome to Health Status! > >>>>>>> Enter the name of the cluster to which you want to add the > >>>>>>> host > >>>>>>> (Default) [Default]: > >>>>>>> [ INFO ] Waiting for the host to become operational in the engine. > >>>>>>> This > >>>>>>> may > >>>>>>> take several minutes... > >>>>>>> [ ERROR ] The VDSM host was found in a failed state. Please check > >>>>>>> engine > >>>>>>> and > >>>>>>> bootstrap installation logs. > >>>>>>> [ ERROR ] Unable to add ovirt-vm to the manager > >>>>>>> Please shutdown the VM allowing the system to launch it as > >>>>>>> a > >>>>>>> monitored service. > >>>>>>> The system will wait until the VM is down. > >>>>>>> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] > >>>>>>> Connection > >>>>>>> refused > >>>>>>> [ INFO ] Stage: Clean up > >>>>>>> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection > >>>>>>> refused > >>>>>>> > >>>>>>> > >>>>>>> I've attached my engine log and the ovirt-hosted-engine-setup log. I > >>>>>>> think I > >>>>>>> had an issue with resolving external hostnames, or else a > >>>>>>> connectivity > >>>>>>> issue > >>>>>>> during the install. 
> >>>>>> For some reason your engine wasn't able to deploy your hosts but the > >>>>>> SSH > >>>>>> session this time was established. > >>>>>> 2015-03-09 13:05:58,514 ERROR > >>>>>> [org.ovirt.engine.core.bll.InstallVdsInternalCommand] > >>>>>> (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed > >>>>>> for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: > >>>>>> java.io.IOException: Command returned failure code 1 during SSH > >>>>>> session > >>>>>> '[email protected]' > >>>>>> > >>>>>> Can you please attach host-deploy logs from the engine VM? > >>>>> OK, attached. > >>>>> > >>>>> Like I said, it looks to me like a name-resolution issue during the yum > >>>>> update on the engine. I think I've fixed that, but do you have a better > >>>>> suggestion for cleaning up and re-deploying other than installing the > >>>>> OS > >>>>> on my host and starting all over again? > >>>> I just finished starting over from scratch, starting with OS > >>>> installation > >>>> on > >>>> my host/node, and wound up with a very similar problem - the engine > >>>> couldn't > >>>> reach the hosts during the yum operation. But this time the error was > >>>> "Network is unreachable". Which is weird, because I can ssh into the > >>>> engine > >>>> and ping many of those hosts, after the operation has failed. > >>>> > >>>> Here's my latest host-deploy log from the engine. I'd appreciate any > >>>> clues. > >>> It seams that now your host is able to resolve that addresses but it's > >>> not > >>> able to connect over http. > >>> On your hosts some of them resolves as IPv6 addresses; can you please try > >>> to use curl to get one of the file that it wasn't able to fetch? > >>> Can you please check your network configuration before and after > >>> host-deploy? > >> I can give you the network configuration after host-deploy, at least for > >> the > >> host/Node. The engine won't start for me this morning, after I shut down > >> the > >> host for the night. > >> > >> In order to give you the config before host-deploy (or, apparently for the > >> engine), I'll have to re-install the OS on the host and start again from > >> scratch. Obviously I'd rather not do that unless absolutely necessary. > >> > >> Here's the host config after the failed host-deploy: > >> > >> Host/Node: > >> > >> # ip route > >> 169.254.0.0/16 dev ovirtmgmt scope link metric 1007 > >> 172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58 > > You are missing a default gateway and so the issue. > > Are you sure that it was properly configured before trying to deploy that > > host? > > It should have been, it was a fresh OS install. So I'm starting again, and > keeping careful records of my network config. 
> > Here is my initial network config of my host/node, immediately following a > new OS install: > > % ip route > default via 172.16.0.1 dev p3p1 proto static metric 1024 > 172.16.0.0/16 dev p3p1 proto kernel scope link src 172.16.0.58 > > % ip addr > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group > default > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > inet 127.0.0.1/8 scope host lo > valid_lft forever preferred_lft forever > inet6 ::1/128 scope host > valid_lft forever preferred_lft forever > 2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP > group default qlen 1000 > link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff > inet 172.16.0.58/16 brd 172.16.255.255 scope global p3p1 > valid_lft forever preferred_lft forever > inet6 fe80::baca:3aff:fe79:2212/64 scope link > valid_lft forever preferred_lft forever > 3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN > group default qlen 1000 > link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff > > > After the VM is first created, the host/node config is: > > # ip route > default via 172.16.0.1 dev ovirtmgmt > 169.254.0.0/16 dev ovirtmgmt scope link metric 1006 > 172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58 > > # ip addr > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group > default > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > inet 127.0.0.1/8 scope host lo > valid_lft forever preferred_lft forever > inet6 ::1/128 scope host > valid_lft forever preferred_lft forever > 2: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master > ovirtmgmt state UP group default qlen 1000 > link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff > inet6 fe80::baca:3aff:fe79:2212/64 scope link > valid_lft forever preferred_lft forever > 3: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN > group default qlen 1000 > link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff > 4: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue > state DOWN group default > link/ether 92:cb:9d:97:18:36 brd ff:ff:ff:ff:ff:ff > 5: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group > default > link/ether 9a:bc:29:52:82:38 brd ff:ff:ff:ff:ff:ff > 6: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state > UP group default > link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff > inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt > valid_lft forever preferred_lft forever > inet6 fe80::baca:3aff:fe79:2212/64 scope link > valid_lft forever preferred_lft forever > 7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master > ovirtmgmt state UNKNOWN group default qlen 500 > link/ether fe:16:3e:16:a4:37 brd ff:ff:ff:ff:ff:ff > inet6 fe80::fc16:3eff:fe16:a437/64 scope link > valid_lft forever preferred_lft forever > > > At this point, I was already seeing a problem on the host/node. I remembered > that a newer version of sos package is delivered from the ovirt > repositories. 
So I tried to do a "yum update" on my host, and got a similar > problem: > > % sudo yum update > [sudo] password for rad: > Loaded plugins: langpacks, refresh-packagekit > Resolving Dependencies > --> Running transaction check > ---> Package sos.noarch 0:3.1-1.fc20 will be updated > ---> Package sos.noarch 0:3.2-0.2.fc20.ovirt will be an update > --> Finished Dependency Resolution > > Dependencies Resolved > > ================================================================================================================ > Package Arch Version > Repository Size > ================================================================================================================ > Updating: > sos noarch 3.2-0.2.fc20.ovirt > ovirt-3.5 292 k > > Transaction Summary > ================================================================================================================ > Upgrade 1 Package > > Total download size: 292 k > Is this ok [y/d/N]: y > Downloading packages: > No Presto metadata available for ovirt-3.5 > sos-3.2-0.2.fc20.ovirt.noarch. FAILED > http://www.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: > [Errno 14] curl#6 - "Could not resolve host: www.gtlib.gatech.edu" > Trying other mirror. > sos-3.2-0.2.fc20.ovirt.noarch. FAILED > ftp://ftp.gtlib.gatech.edu/pub/oVirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: > [Errno 14] curl#6 - "Could not resolve host: ftp.gtlib.gatech.edu" > Trying other mirror. > sos-3.2-0.2.fc20.ovirt.noarch. FAILED > http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: > [Errno 14] curl#6 - "Could not resolve host: resources.ovirt.org" > Trying other mirror. > sos-3.2-0.2.fc20.ovirt.noarch. FAILED > http://ftp.snt.utwente.nl/pub/software/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: > [Errno 14] curl#6 - "Could not resolve host: ftp.snt.utwente.nl" > Trying other mirror. > sos-3.2-0.2.fc20.ovirt.noarch. FAILED > http://ftp.nluug.nl/os/Linux/virtual/ovirt/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: > [Errno 14] curl#6 - "Could not resolve host: ftp.nluug.nl" > Trying other mirror. > sos-3.2-0.2.fc20.ovirt.noarch. FAILED > http://mirror.linux.duke.edu/ovirt/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm: > [Errno 14] curl#6 - "Could not resolve host: mirror.linux.duke.edu" > Trying other mirror. > > > Error downloading packages: > sos-3.2-0.2.fc20.ovirt.noarch: [Errno 256] No more mirrors to try. > > > This was similar to my previous failures. I took a look, and the problem was > that /etc/resolv.conf had no nameservers, and the > /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt file contained no entries for > DNS1 or DOMAIN. > > So, it appears that when hosted-engine set up my bridged network, it > neglected to carry over the DNS configuration necessary to the bridge.
Unfortunately you've found a known bug: VDSM doesn't report static DNS settings (DNS1 from /etc/sysconfig/network-scripts/ifcfg-ethX), so we lose them simply by deploying the host: https://bugzilla.redhat.com/show_bug.cgi?id=1160667 https://bugzilla.redhat.com/show_bug.cgi?id=1160423 We are going to fix it for 3.6; thanks for reporting. > Note that I am using *static* network configuration, rather than DHCP. During > installation of the OS I am setting up the network configuration as Manual. > Perhaps the hosted-engine script is not properly prepared to deal with that? > > I went ahead and modified the ifcfg-ovirtmgmt network script (for the next > service restart/boot) and resolv.conf (I was afraid to restart the network > in the middle of hosted-engine execution since I don't know what might > already be connected to the engine). This time it got further, but > ultimately it still failed at the very end: Manually fixing /etc/resolv.conf is a valid workaround. > [ INFO ] Waiting for the host to become operational in the engine. This may > take several minutes... > [ INFO ] Still waiting for VDSM host to become operational... > [ INFO ] The VDSM Host is now operational > Please shutdown the VM allowing the system to launch it as a > monitored service. > The system will wait until the VM is down. > [ ERROR ] Failed to execute stage 'Closing up': Error acquiring VM status > [ INFO ] Stage: Clean up > [ INFO ] Generating answer file > '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150310140028.conf' > [ INFO ] Stage: Pre-termination > [ INFO ] Stage: Termination > > > At that point, neither the ovirt-ha-broker or ovirt-ha-agent services were > running. > > Note there was no significant pause after it said "The system will wait until > the VM is down". > > After the script completed, I shut down the VM, and manually started the ha > services, and the VM came up. I could login to the Administration Portal, > and finally see my HostedEngine VM. :-) > > I seem to be in a bad state however: The Data Center has *no* storage domains > attached. I'm not sure what else might need cleaning up. Any assistance > appreciated. No, that's expected: the hosted-engine storage domain is a special one and is currently not reported by the engine because you cannot use it for other VMs. Simply add another storage domain and you are done. 
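
For anyone else who hits the DNS issue before the 3.6 fix, a minimal sketch of the manual workaround described above, run as root on the host after deploy; the 172.16.0.1 nameserver and example.com search domain are only placeholders (172.16.0.1 happens to be the default gateway in Bob's setup), so substitute the DNS1/DOMAIN values from your original ifcfg-ethX file:

# Re-add the static DNS settings that were not carried over to the bridge,
# so they survive the next network restart/boot:
echo 'DNS1=172.16.0.1'    >> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
echo 'DOMAIN=example.com' >> /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt

# Restore name resolution immediately, without restarting the network:
echo 'search example.com'    >> /etc/resolv.conf
echo 'nameserver 172.16.0.1' >> /etc/resolv.conf

# Verify the host can resolve and fetch from one of the mirrors again
# (same idea as fetching one of the failed URLs with curl):
curl -sI http://resources.ovirt.org/pub/ovirt-3.5/rpm/fc20/noarch/sos-3.2-0.2.fc20.ovirt.noarch.rpm | head -1
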
> -Bob > > > >> # ip addr > >> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group > >> default > >> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > >> inet 127.0.0.1/8 scope host lo > >> valid_lft forever preferred_lft forever > >> inet6 ::1/128 scope host > >> valid_lft forever preferred_lft forever > >> 2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast > >> master > >> ovirtmgmt state UP group default qlen 1000 > >> link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff > >> inet6 fe80::baca:3aff:fe79:2212/64 scope link > >> valid_lft forever preferred_lft forever > >> 3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc > >> noqueue > >> state DOWN group default > >> link/ether 56:56:f7:cf:73:27 brd ff:ff:ff:ff:ff:ff > >> 4: wlp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state > >> DOWN > >> group default qlen 1000 > >> link/ether 1c:3e:84:50:8d:c3 brd ff:ff:ff:ff:ff:ff > >> 6: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group > >> default > >> link/ether 22:a1:01:9e:30:71 brd ff:ff:ff:ff:ff:ff > >> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue > >> state > >> UP group default > >> link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff > >> inet 172.16.0.58/16 brd 172.16.255.255 scope global ovirtmgmt > >> valid_lft forever preferred_lft forever > >> inet6 fe80::baca:3aff:fe79:2212/64 scope link > >> valid_lft forever preferred_lft forever > >> > >> > >> The only unusual thing about my setup that I can think of, from the > >> network > >> perspective, is that my physical host has a wireless interface, which I've > >> not configured. Could it be confusing hosted-engine --deploy? > >> > >> -Bob > >> > >> > > _______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

