> Date: Tue, 11 Mar 2014 15:16:36 +0100
> From: [email protected]
> To: [email protected]; [email protected]; [email protected]
> CC: [email protected]; [email protected]; [email protected]
> Subject: Re: [Users] hosted engine help
>
> On 10/03/2014 22:32, Giuseppe Ragusa wrote:
> > Hi all,
> >
> >> Date: Mon, 10 Mar 2014 12:56:19 -0400
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]
> >> Subject: Re: [Users] hosted engine help
> >>
> >> ----- Original Message -----
> >> > From: "Martin Sivak" <[email protected]>
> >> > To: "Dan Kenigsberg" <[email protected]>
> >> > Cc: [email protected]
> >> > Sent: Saturday, March 8, 2014 11:52:59 PM
> >> > Subject: Re: [Users] hosted engine help
> >> >
> >> > Hi Jason,
> >> >
> >> > can you please attach the full logs? We had a very similar issue
> >> > before; we need to see whether it is the same one or not.
> >>
> >> I may have to recreate it -- I switched back to an all-in-one engine
> >> after my setup started refusing to run the engine at all. It's no fun
> >> losing your engine!
> >>
> >> This was a migrated-from-standalone setup; maybe that caused additional
> >> wrinkles...
> >>
> >> Jason
> >>
> >> > Thanks
> >
> > I experienced the exact same symptoms as Jason on a from-scratch
> > installation on two physical nodes with CentOS 6.5 (fully up-to-date),
> > using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3
> > (with Gluster-provided NFS as storage for the self-hosted engine VM
> > only).
>
> Using GlusterFS as hosted-engine storage is not supported and not
> recommended.
> The HA daemon may not work properly there.
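
Just to clarify what "interposed NFS" means below: the engine storage
domain is not mounted through the GlusterFS native client but through
Gluster's built-in NFSv3 server. This is roughly how I verified the export
on the nodes (a minimal sketch; the volume name "engine" is only a
placeholder, not necessarily the real one):

  # nfs.disable is listed only if it was explicitly set; in GlusterFS 3.5
  # the default is off, i.e. the built-in NFSv3 export is enabled
  gluster volume info engine | grep nfs.disable

  # check that the Gluster NFS server process is online for this volume
  gluster volume status engine nfs

  # the volume should show up in the NFS export list
  showmount -e localhost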

If it is unsupported (and particularly "not recommended") even with the
interposed NFS (the native Gluster-provided NFSv3 export of a volume), then
what is the recommended way to set up a fault-tolerant, load-balanced
2-node oVirt cluster (without an external dedicated SAN/NAS)?

> > I roughly followed the guide from Andrew Lau:
> >
> > http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/
> >
> > with some variations due to newer packages (resolved bugs) and a
> > different hardware setup (no VLANs in my setup: physically separated
> > networks; a custom second NIC added to the Engine VM template before
> > deploying, etc.)
> >
> > The self-hosted installation on the first node + Engine VM (configured
> > for managing both oVirt and the storage; Datacenter default set to NFS
> > because no GlusterFS was offered) apparently went smoothly, but the HA
> > agent failed to start at the very end (same errors in the logs as Jason:
> > the storage domain seems "missing") and I was only able to start it all
> > manually with:
> >
> > hosted-engine --connect-storage
> > hosted-engine --start-pool
>
> The above commands are used for development and shouldn't be used for
> starting the engine.

Directly starting the engine (with the command below) failed because of
storage unavailability, so I used the above "trick" as a "last resort" to
at least prove that the engine was able to start and had not been somehow
"destroyed" or "lost" in the process (but I do understand that it is an
extreme debug-only action).

> > hosted-engine --vm-start
> >
> > then the Engine came up and I could use it; I even registered the second
> > node (same final error in the HA agent) and tried to add GlusterFS
> > storage domains for further VMs and ISOs (by the way: the original
> > NFS-GlusterFS domain for the Engine VM only is not present inside the
> > Engine web UI), but activating the domains always failed (they remain
> > "Inactive").
> >
> > Furthermore, the engine gets killed some time after starting (from 3 up
> > to 11 hours later) and the only way to get it back is repeating the
> > above commands.
>
> Need logs for this.

I will try to reproduce it all, but I can recall that in the libvirt logs
(HostedEngine.log) there was always a clear indication of the PID that
killed the Engine VM, and each time it belonged to an instance of sanlock.

> > I always managed GlusterFS "natively" (not through oVirt) from the
> > command line and verified that the NFS-exported Engine-VM-only volume
> > gets replicated, but I was obviously unable to try migration because the
> > HA part stays inactive and oVirt refuses to migrate the Engine.
> >
> > Since I tried many times, with variations and further manual actions in
> > between (like trying to manually mount the NFS Engine domain, restarting
> > only the HA agent, etc.), my logs are "cluttered", so I should start
> > from scratch again and pack up all the logs in one sweep.
>
> +1 ;>
>
> > Tell me what I should capture and at which points in the whole process
> > and I will try to follow up as soon as possible.
>
> What:
> hosted-engine-setup, hosted-engine-ha, vdsm, libvirt and sanlock logs from
> the physical hosts, and engine and server logs from the hosted engine VM.
>
> When:
> As soon as you see an error.

If the setup design (wholly GlusterFS-based) is somewhat flawed, please
point me to some hint/docs/guide for the right way of setting it up on 2
standalone physical nodes, so as not to waste your time chasing "defects"
in something that is not supposed to be working anyway. I will follow your
advice and try accordingly.
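
In the meantime, this is the one-shot collection I plan to run on each
physical host as soon as the error shows up, to gather everything listed
above (a minimal sketch, assuming the default log locations on CentOS 6.5;
the archive name is just my own choice):

  # the qemu domain log names the PID that signalled the Engine VM, e.g.
  # "terminating on signal 15 from pid ..."
  grep "terminating on signal" /var/log/libvirt/qemu/HostedEngine.log

  # pack the host-side logs in one sweep
  tar czf /tmp/$(hostname)-hosted-engine-logs.tar.gz \
      /var/log/ovirt-hosted-engine-setup \
      /var/log/ovirt-hosted-engine-ha \
      /var/log/vdsm \
      /var/log/libvirt \
      /var/log/sanlock.log

plus engine.log and server.log from /var/log/ovirt-engine inside the
Engine VM.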

Many thanks again,
Giuseppe

> > Many thanks,
> > Giuseppe
> >
> >> > --
> >> > Martin Sivák
> >> > [email protected]
> >> > Red Hat Czech
> >> > RHEV-M SLA / Brno, CZ
> >> >
> >> > ----- Original Message -----
> >> > > On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
> >> > > > On 07/03/2014 01:10, Jason Brooks wrote:
> >> > > > > Hey everyone, I've been testing out oVirt 3.4 w/ hosted engine,
> >> > > > > and while I've managed to bring the engine up, I've only been
> >> > > > > able to do it manually, using "hosted-engine --vm-start".
> >> > > > >
> >> > > > > The ovirt-ha-agent service fails reliably for me, erroring out
> >> > > > > with "RequestError: Request failed: success."
> >> > > > >
> >> > > > > I've pasted error passages from the ha agent and vdsm logs
> >> > > > > below.
> >> > > > >
> >> > > > > Any pointers?
> >> > > >
> >> > > > Looks like a VDSM bug, Dan?
> >> > >
> >> > > Why? The exception is raised from deep inside the
> >> > > ovirt_hosted_engine_ha code.
>
> --
> Sandro Bonazzola
> Better technology. Faster innovation. Powered by community collaboration.
> See how it works at redhat.com
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

