How hard is it to configure ZooKeeper and get everything up and running? BTW, what would ZooKeeper be managing: the CloudStack management servers or the MySQL nodes?

On Mon, Dec 18, 2017 at 7:13 AM, Ivan Kudryavtsev <kudryavtsev...@bw-sw.com> wrote:

> Hello, Marc-Aurele, I strongly believe that all MySQL locks should be
> removed in favour of a true DLM solution like ZooKeeper. The performance
> of a 3-node ZK ensemble should be enough to hold up to 1000-2000 locks per
> second, and it helps to move to a truly clustered MySQL such as Galera
> without a single master server.
>
> 2017-12-18 15:33 GMT+07:00 Marc-Aurèle Brothier <ma...@exoscale.ch>:
>
> > Hi everyone,
> >
> > I was wondering how many of you are running CloudStack with a cluster of
> > management servers. I would think most of you, but it would be nice to
> > hear everyone's voice. And do you get hosts going over their capacity
> > limits?
> >
> > We discovered that during VM allocation, if you get a lot of parallel
> > requests to create new VMs, most notably with large profiles, the
> > capacity increase is done too long after the host capacity checks, which
> > results in hosts going over their capacity limits. To detail the steps:
> > the deployment planner checks for cluster/host capacity and picks one
> > deployment plan (zone, cluster, host). The plan is stored in the database
> > under a VMwork job, and another thread picks up that entry and starts the
> > deployment, increasing the host capacity and sending the commands. Here
> > there is a time gap of a couple of seconds between the host being picked
> > and the capacity increase for that host, which is more than enough to go
> > over the capacity on one or more hosts. A few VMwork jobs can be added to
> > the DB queue targeting the same host before one gets picked up.
> >
> > To fix this issue, we're using ZooKeeper as the multi-JVM lock manager,
> > through the Curator library
> > (https://curator.apache.org/curator-recipes/shared-lock.html). We also
> > changed the time at which the capacity is increased: it now occurs right
> > after the deployment plan is found, inside the ZooKeeper lock. This
> > ensures we don't go over the capacity of any host, and it has proven
> > effective for a month in our management server cluster.
> >
> > This adds another potential requirement, which should be discussed before
> > proposing a PR. Today the code also works seamlessly without ZK, to
> > ensure it's not a hard requirement, for example in a lab.
> >
> > Comments?
> >
> > Kind regards,
> > Marc-Aurèle
>
> --
> With best regards, Ivan Kudryavtsev
> Bitworks Software, Ltd.
> Cell: +7-923-414-1515
> WWW: http://bitworks.software/ <http://bw-sw.com/>

--
Rafael Weingärtner
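
For illustration only, the idea in the quoted message (check the host capacity and increase it as one step, inside a lock) can be sketched as below. This is a minimal sketch under stated assumptions, not the CloudStack or Exoscale code: `Host`, `CapacityReservation`, `plan_and_reserve`, and `run_parallel_requests` are hypothetical names, capacity is reduced to a single CPU counter, and a local `threading.Lock` stands in for the ZooKeeper-backed lock that would be shared between management server JVMs.

```python
import threading


class Host:
    """Hypothetical host with a single capacity dimension (CPU cores)."""

    def __init__(self, name, total_cpu):
        self.name = name
        self.total_cpu = total_cpu
        self.used_cpu = 0


class CapacityReservation:
    """Checks and reserves host capacity as one atomic step."""

    def __init__(self, hosts, lock=None):
        self.hosts = hosts
        # Assumption: a local lock stands in for the distributed lock.
        # Across several management servers this would have to be a lock
        # shared by all of them (ZooKeeper in the quoted message).
        self.lock = lock if lock is not None else threading.Lock()

    def plan_and_reserve(self, requested_cpu):
        """Return the name of the chosen host, or None if nothing fits."""
        with self.lock:
            for host in self.hosts:
                if host.used_cpu + requested_cpu <= host.total_cpu:
                    # Reserve immediately, while still holding the lock,
                    # so a concurrent request cannot pick the same slot.
                    host.used_cpu += requested_cpu
                    return host.name
        return None


def run_parallel_requests(reservation, count, requested_cpu):
    """Fire `count` concurrent reservation requests and collect results."""
    results = []
    results_lock = threading.Lock()

    def worker():
        chosen = reservation.plan_and_reserve(requested_cpu)
        with results_lock:
            results.append(chosen)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With two hosts of 8 cores each and ten parallel requests for 4 cores, four requests get a host and six are refused, and no host ends up over its limit. If the reservation happened later, outside the lock, several requests could pass the check for the same host before any of them increased its used capacity.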