[Consultation] Remove DB HA feature (db.ha.enabled)
Hello everyone,

A few weeks ago I asked whether you use, or have managed to use, the CloudStack DB HA feature (db.ha.enabled) [1]. After reading the replies and doing intensive testing myself, I have found that the feature is indeed non-functional; it's broken.

In my testing I discovered that DB HA can easily be done outside of CloudStack by employing load balancers and other techniques. Personally, I achieved it by using HAProxy in front of a Galera cluster, and I also introduced Keepalived (VRRP) in my setup to "balance" multiple HAProxy instances, which also worked well.

Since the feature is basically broken, will not be trivial to fix, and there are better ways of doing HA, I propose to remove it altogether.

Thoughts? Anyone against it?

Cheers

[1] - https://docs.cloudstack.apache.org/en/latest/adminguide/reliability.html#database-high-availability
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
+1, it's a broken feature that at least doesn't work with MySQL 8.x; I'm not sure if it worked with prior versions of MySQL. However, we need to document some sort of suggested MySQL HA setup in our docs.

Regards.

In reply to: Nux, Tuesday, August 22, 2023 18:54
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
Happy to contribute a doc on how to achieve HA if we decide to remove this.

Thanks

In reply to: Rohit Yadav, 2023-08-22 15:01
Re: [PROPOSE] ACS 4.18.1.0 release
Thanks for the update, Wei. Good job so far!

On 2023-08-21 12:48, Wei ZHOU wrote:
> Hi all,
>
> In the last weeks, we have merged a few bug fixes into the 4.18 branch. We are still working on remaining bug fixes and reviewing pull requests.
>
> 22 pull requests are open for review:
> https://github.com/apache/cloudstack/pulls?q=is%3Aopen+is%3Apr+milestone%3A4.18.1.0
>
> 51 issues are open (including 2 critical, 15 major, 33 minor issues):
> https://github.com/apache/cloudstack/issues?q=is%3Aopen+is%3Aissue+milestone%3A4.18.1.0
>
> *The code freeze time of 4.18.1.0 will be 12:00pm UTC (1pm BST, 2pm CEST), 28th August*. Pull requests and issues still open after the code freeze will be moved to the 4.18.2.0 milestone.
>
> -Wei
>
> On Wed, 2 Aug 2023 at 03:22, Wei ZHOU wrote:
>> Hi all,
>>
>> Here is an update on the Apache CloudStack 4.18.1.0 release. There are some open PRs and issues on GitHub:
>>
>> 37 pull requests are open for review:
>> https://github.com/apache/cloudstack/pulls?q=is%3Aopen+is%3Apr+milestone%3A4.18.1.0
>>
>> 66 issues are open (including 1 BLOCKER, 1 critical, 18 major, 39 minor issues):
>> https://github.com/apache/cloudstack/issues?q=is%3Aopen+is%3Aissue+milestone%3A4.18.1.0
>>
>> We are busy with them. The process needs to be postponed by 2-4 weeks.
>>
>> -Wei
>>
>> On Thu, 4 May 2023 at 10:34, Wei ZHOU wrote:
>>> Hi all,
>>>
>>> Currently CloudStack 4.18.0.0 is the latest LTS release. There are some bugs and pull requests with 4.18.0.0 [1], including the fix for the upgrade issue if users use MySQL 5.6 and 5.7.
>>>
>>> I would like to propose the release of 4.18.1.0 and the timeline:
>>> - from now till the end of July (3 months): accept bug fixes and minor improvements [2]
>>> - first week in Aug: stabilisation efforts, accept only blocker and critical bug fixes.
>>> - Aug: start cutting RCs, vote and finish release work.
>>>
>>> I will put myself forward as the release manager (RM) of 4.18.1.0, if nobody objects.
>>>
>>> In case anyone wants to include a bug fix or a pull request in the 4.18.1.0 milestone, please mention me (weizhouapache) on GitHub.
>>>
>>> [1] https://github.com/apache/cloudstack/milestone/27
>>> [2] https://cwiki.apache.org/confluence/display/CLOUDSTACK/LTS
>>>
>>> Any suggestions?
>>>
>>> Kind regards,
>>> Wei
Re: Register Now for CloudStack Collaboration Conference - First 50 get a CloudStack T-shirt
Do you have a picture of them?

Thanks

On 2023-08-21 13:14, Ivet Petrova wrote:
> Hi all,
>
> I am happy to announce that we have a special surprise for the early birds! As you know, the CloudStack Collaboration Conference will happen on November 23-24th in Paris, France. The first 50 people registered for the event get an Apache CloudStack branded T-shirt, a special edition for the conference in Paris.
>
> Hurry up, we have a limited number of these cool T-shirts:
> https://events.hubilo.com/cloudstack-collaboration-conference-2023/register
>
> Kind regards,
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
We faced some issues when running Galera. We went back to master slave.

Anyone using Galera in production for a long time?

Regards,
Shiv

In reply to: Nux, 22-Aug-2023 19:34

--
This message is intended only for the use of the individual or entity to which it is addressed and may contain confidential and/or privileged information. If you are not the intended recipient, please delete the original message and any copy of it from your computer system. You are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited unless proper authorization has been obtained for such action. If you have received this communication in error, please notify the sender immediately. Although IndiQus attempts to sweep e-mail and attachments for viruses, it does not guarantee that both are virus-free and accepts no liability for any damage sustained as a result of viruses.
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
But what do you think of the removal of the DB HA code?

When using Galera you need to query against a single node; don't spread the load among all 3, as this will break certain locking functionality in CloudStack and lead to problems.

In an HAProxy configuration you should keep just one server active, e.g.:

    server galera1 10.0.3.2:3306 check
    server galera2 10.0.3.3:3306 check backup
    server galera3 10.0.3.4:3306 check backup

Regards

In reply to: K B Shiv Kumar, 2023-08-22 15:36
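To make the "just one active" idea concrete, here is a minimal Python sketch of how the active/backup arrangement in the HAProxy lines behaves with a single primary. It is a toy model for illustration, not HAProxy code: the `Server` class and `pick_active` function are hypothetical names, and health is set by hand rather than by real checks.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """One 'server' line: a name, an address, and whether it is marked 'backup'."""
    name: str
    address: str
    backup: bool = False
    healthy: bool = True  # in reality this would come from the 'check' health probe


def pick_active(servers):
    """Return the single server that should receive all traffic, or None.

    Simplified model for a setup with exactly one non-backup server:
    a healthy non-backup server wins; otherwise the first healthy
    backup in declaration order takes over.
    """
    for server in servers:
        if server.healthy and not server.backup:
            return server
    for server in servers:
        if server.healthy and server.backup:
            return server
    return None


# Same layout as the three 'server' lines in the message above.
servers = [
    Server("galera1", "10.0.3.2:3306"),
    Server("galera2", "10.0.3.3:3306", backup=True),
    Server("galera3", "10.0.3.4:3306", backup=True),
]
```

With all nodes healthy, every connection goes to galera1; galera2 only takes over when galera1 fails its check, so all management servers keep talking to the same Galera node at any given time.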
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
Well, if it is broken and that is not prominently mentioned anywhere, new adopters may go ahead with it in production. So I guess it is best to remove it, or at least mention that it is not production grade.

Thanks
Shiv

In reply to: Nux, 22-Aug-2023 20:12
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
Shiv, Lucian, all,

It's a known limitation of all available MySQL clustering solutions, such as Galera, Percona XtraDB and InnoDB Cluster, that GET_LOCK [1] isn't supported [2][3]. CloudStack uses GET_LOCK for global locking of critical code when more than one management server is running against the same database/server.

(MySQL NDB or InnoDB Cluster could be something to experiment with, as could coming up with a locking service framework that helps get around the MySQL/native GET_LOCK limitations.)

[1] https://dev.mysql.com/doc/refman/8.0/en/locking-functions.html#:~:text=MySQL%20enforces%20a%20maximum%20length,lock%20with%20the%20same%20name.
[2] https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/
[3] https://docs.percona.com/percona-xtradb-cluster/8.0/limitation.html

Regards.

In reply to: Nux, Tuesday, August 22, 2023 20:12
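The locking problem can be illustrated with a small Python simulation. It assumes that a named lock is only tracked on the node that receives the call and is not replicated to the other cluster members; that is an assumption of this sketch, and depending on the product and its settings the call may instead be rejected outright. The `Node` class, the lock name and the owner names are all hypothetical, and the return values are simplified (1 for success, 0 otherwise).

```python
class Node:
    """One database node holding its own, non-replicated table of named locks."""

    def __init__(self, name):
        self.name = name
        self.locks = {}  # lock name -> owner currently holding it

    def get_lock(self, lock_name, owner):
        """Mimic a zero-timeout named-lock request: 1 if acquired, 0 if held by someone else."""
        holder = self.locks.get(lock_name)
        if holder is None or holder == owner:
            self.locks[lock_name] = owner
            return 1
        return 0

    def release_lock(self, lock_name, owner):
        """Release the lock if this owner holds it: 1 if released, 0 otherwise."""
        if self.locks.get(lock_name) == owner:
            del self.locks[lock_name]
            return 1
        return 0


def demo_spread_across_nodes():
    """Two management servers hit different nodes: both acquire the 'global' lock."""
    node1, node2 = Node("galera1"), Node("galera2")
    first = node1.get_lock("example.lock", "mgmt-1")
    second = node2.get_lock("example.lock", "mgmt-2")
    return first, second


def demo_single_node():
    """Both management servers hit the same node: the second request is refused."""
    node1 = Node("galera1")
    first = node1.get_lock("example.lock", "mgmt-1")
    second = node1.get_lock("example.lock", "mgmt-2")
    return first, second
```

In the first demo both callers believe they hold the lock, so mutual exclusion is lost; in the second only one does. This is the reason for sending all management-server traffic to a single node.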
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
Thanks for elaborating, Rohit.

In reply to: Rohit Yadav, 2023-08-22 16:25
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
New adopters may not go ahead with it in production because they won't get it working unless they fix a lot of code; that would be a nice pull request. :)

In reply to: K B Shiv Kumar, 2023-08-22 16:25
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
Hello Lucian and all,

I am -1 on removing the whole DB HA feature from CloudStack.

As we discussed in July [1], the current properties we have in "db.properties" regarding DB HA are hardcoded and only address some MySQL properties, which are not fully compatible with the properties for configuring DB HA on MariaDB. It indeed has some problems; however, I think we should keep the functionality and improve it, to enrich CloudStack and avoid using other layers to accomplish the goals. It is good to have a workaround, though.

João Jandre and I are already working on a solution to make the DB parameters more flexible, in order to allow one to configure DB HA properly when using MariaDB (and also do several other configurations). João, could you point to the PR that addresses the changes and share the configurations and tests we have done so far?

Best regards,
Daniel Salvador (gutoveronezi)

[1] - https://lists.apache.org/thread/j0mmwy9dfr9k2kbnnjxcr2m7y8zwd34c

In reply to: Nux, Tue, Aug 22, 2023 at 12:42 PM
Re: [Consultation] Remove DB HA feature (db.ha.enabled)
Sure, Daniel.

PR #7895 is currently in draft, as we need to do some more tests. However, the intention is to enable users to configure the DB connection URI directly through the `db.properties` file.

These are the tests that have been done so far with ACS without this PR's changeset. We used the current version in a setup with MariaDB and Galera, with a cluster size of 3 and the following configuration in the db.properties file:

```
# High Availability And Cluster Properties
db.ha.enabled=true
db.ha.loadBalanceStrategy=com.cloud.utils.db.StaticStrategy

# cloud stack Database
db.cloud.replicas=192.168.201.161,192.168.201.162
db.cloud.autoReconnect=false
db.cloud.failOverReadOnly=false
db.cloud.reconnectAtTxEnd=false
db.cloud.autoReconnectForPools=true
db.cloud.secondsBeforeRetrySource=1800
db.cloud.queriesBeforeRetrySource=5000
db.cloud.initialTimeout=3600
```

When the MariaDB service stops on the main node, ACS switches to one of the other two nodes. However, if the host is shut down, the switch never occurs.

Then we also did tests using the changes proposed in the PR, by configuring db.cloud.uri:

```
db.cloud.uri=jdbc:mariadb:sequential://192.168.201.160:3306,192.168.201.161:3306,192.168.201.162:3306/cloud?autoReconnect=true&prepStmtCacheSize=517&cachePrepStmts=true&sessionVariables=sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'&serverTimezone=UTC

# These properties are ignored when setting the URI manually, so no need to set them.
# High Availability And Cluster Properties
# db.ha.enabled=true
# db.ha.loadBalanceStrategy=com.cloud.utils.db.StaticStrategy

# cloud stack Database
# db.cloud.replicas=192.168.201.161,192.168.201.162
# db.cloud.autoReconnect=false
# db.cloud.failOverReadOnly=false
# db.cloud.reconnectAtTxEnd=false
# db.cloud.autoReconnectForPools=true
# db.cloud.secondsBeforeRetrySource=1800
# db.cloud.queriesBeforeRetrySource=5000
# db.cloud.initialTimeout=3600
```

I was able to configure and use the sequential failover mode. This way, when the MariaDB service stops on the main node, and even if the host is shut down, ACS is able to switch to the other DBs.

There are two differences between defining the URI manually (as proposed in PR #7895) and the URI generated by ACS. The first is the `jdbc:mariadb` prefix, which selects the driver that makes the connection with the DBMS; this enables the use of MariaDB URL configurations, and the driver is being introduced into ACS with PR #7895. The second is the use of the `sequential` [1] failover mode, which tries to connect to hosts in the order in which they were declared in the connection URL, so the first available host is used for all queries, and if one of the hosts is shut down, it tries to reconnect to the others on the list. As this mode only connects to a single DB, the problems referenced by Rohit are avoided, but the failover mechanism is still in place.

Best regards,
João Jandre

[1] - https://mariadb.com/kb/en/about-mariadb-connector-j/

In reply to: Daniel Salvador, 22/08/2023 16:03
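The sequential failover behaviour described in the message above (use the first available host, in the order declared in the URL) can be sketched in a few lines of Python. This is only an illustration of the selection rule, not the MariaDB Connector/J implementation: `parse_hosts`, `choose_host` and the `is_up` callback are hypothetical names, and the URI is a shortened form of the one quoted in the thread.

```python
PREFIX = "jdbc:mariadb:sequential://"

# Shortened version of the db.cloud.uri value from the thread.
URI = (
    "jdbc:mariadb:sequential://"
    "192.168.201.160:3306,192.168.201.161:3306,192.168.201.162:3306"
    "/cloud?autoReconnect=true"
)


def parse_hosts(uri):
    """Return the host:port entries of a sequential URI, in declaration order."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a jdbc:mariadb:sequential:// URI")
    # Everything between the prefix and the first '/' is the host list.
    host_part = uri[len(PREFIX):].split("/", 1)[0]
    return host_part.split(",")


def choose_host(hosts, is_up):
    """Return the first host for which is_up(host) is true, or None if all are down.

    is_up stands in for a real connection attempt.
    """
    for host in hosts:
        if is_up(host):
            return host
    return None
```

For example, `choose_host(parse_hosts(URI), lambda h: h != "192.168.201.160:3306")` returns the second host, which mirrors the case where the first node's host is shut down. Because only one host is in use at a time, all queries, including lock requests, land on the same node.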