Thanks for starting this thread, Remi. From my perspective, the pros of simply enabling XenServer HA are:
- automatic election of a new pool master in the event of hardware failure
- automatic fencing of a host in the event of dom0 corruption
- automatic fencing of a host in the event of heartbeat failure

The risks of simply enabling XenServer HA are:

- additional code to detect a newly elected pool master (see the sketch at the end of this mail)
- acceptance of the fact that an admin can force a new pool master from the XenServer CLI
- requirement for the pool size to be greater than 2 (a pool size of 2 results in semi-deterministic fencing, which isn't obvious to the user)
- understanding that the storage heartbeat can be shorter than the storage timeout (aggressive fencing)
- understanding that HA plans are computed even when no VMs are protected (performance decrease)

One question we'll want to decide on is who the primary actor is when it comes to creating the pool, since that will define the first pool master. During my demo build using 4.4 at CCCEU I expected to add pool members through the CS UI, but found that adding them in XenServer was required. This left me in an indeterminate state wrt pool members.

I vote that if a host is added to CS and it *is* already a member of a pool, the pool be imported as a cluster and any future membership changes happen through the CS APIs. If a host is added which isn't a member of a pool, the user should be asked whether they wish to add it to an existing cluster (and behind the scenes add it to a pool), or create a new cluster and add it to that. This would be a change to the "add host" semantics. Once the host is added, we can enable XenServer HA on the pool if it satisfies the requirements for XenServer HA (shared storage and three or more pool members). There are some details we'd want to take care of, but this flow makes sense to me, and we could use it even with upgrades.

-tim

On Mon, May 4, 2015 at 6:04 AM, Remi Bergsma <r...@remi.nl> wrote:
> Hi all,
>
> Since CloudStack 4.4, the implementation of HA in CloudStack was changed to
> use the XenHA feature of XenServer. As of 4.4, it is expected that XenHA is
> enabled for the pool (not for the VMs!), so XenServer will be the one to
> elect a new pool master, whereas CloudStack did this before. XenHA also
> takes care of fencing the box, instead of CloudStack, should storage become
> unavailable. To be exact, they both try to fence, but XenHA is usually
> faster.
>
> To be 100% clear: HA on VMs is in all cases done by CloudStack. It's just
> that without a pool master, no VMs will be recovered anyway. This brought
> me some headaches, first of all because I didn't know. We probably need to
> document this somewhere. This is important, because without XenHA turned on
> you'll not get a new pool master (a behaviour change).
>
> Personally, I don't like the fact that we have "two captains" in case
> something goes wrong. But some say they like this behaviour. I'm OK with
> both, as long as one can choose whatever suits their needs best.
>
> In Austin I talked to several people about this. We came up with the idea
> to have CloudStack check whether XenHA is on or not. If it is, CloudStack
> keeps the current 4.4+ behaviour (XenHA selects the new pool master). If it
> is not, we fall back to the CloudStack 4.3 behaviour, where CloudStack is
> fully in control.
>
> I also talked to Tim Mackey and he wants to help implement this, but he
> doesn't have much time. The idea is to have someone else join in to code
> the change, and then Tim will be able to help out on a regular basis
> should we need in-depth knowledge of XenServer or its implementation in
> CloudStack.
>
> Before we kick this off, I'd like to discuss and agree that this is the way
> forward. Also, if you're interested in joining this effort, let me know and
> I'll kick it off.
>
> Regards,
> Remi
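
PS: a minimal sketch of what the XenHA check could look like, assuming the XenServer Java SDK (com.xensource.xenapi) that the CloudStack XenServer plugin already uses. The class and method names XenHaProbe/isXenHaEnabled are made up for illustration, and the exact binding signatures should be verified against the SDK version CloudStack ships:

    // Rough sketch only: read the pool's ha_enabled flag and current master.
    // ha_enabled == true  -> keep the 4.4+ behaviour (XenHA elects the master)
    // ha_enabled == false -> fall back to the 4.3 behaviour (CloudStack elects)
    import com.xensource.xenapi.Connection;
    import com.xensource.xenapi.Host;
    import com.xensource.xenapi.Pool;
    import com.xensource.xenapi.Session;

    import java.net.URL;

    public class XenHaProbe {

        public static boolean isXenHaEnabled(String poolHostAddress, String user, String password) throws Exception {
            Connection conn = new Connection(new URL("https://" + poolHostAddress));
            try {
                Session.loginWithPassword(conn, user, password);

                // A XenServer host belongs to exactly one pool, so there is a single record.
                Pool pool = Pool.getAll(conn).iterator().next();

                // Knowing the master also covers the "detect a newly elected pool master" item.
                Host master = pool.getMaster(conn);
                System.out.println("Current pool master: " + master.getAddress(conn));

                return pool.getHaEnabled(conn);
            } finally {
                Session.logout(conn);
            }
        }
    }

CloudStack could run something like this whenever it (re)connects to a XenServer cluster and then pick the 4.3-style or 4.4+-style code path accordingly.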