So, this fixed the problem? Can you keep it running for a while longer, just to make sure? Then I can open a PR to fix it in master.

On Mon, Dec 18, 2017 at 9:02 AM, Alireza Eskandari <astro.alir...@gmail.com> wrote:

> Thank you Rafael,
> I tested your fix and it seems that I got the expected result. You
> can see the exception raised for database failover.
> I should note that I replaced the file for both cloudstack-management
> and cloudstack-usage:
> /usr/share/cloudstack-usage/lib/cloud-framework-cluster-4.9.3.0.jar
> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-framework-cluster-4.9.3.0.jar
>
> Logs:
>
> WARN [c.c.c.d.ManagementServerHostDaoImpl] (Cluster-Heartbeat-1:ctx-073cca55) (logid:e652d00b) Unexpected exception,
> com.cloud.utils.exception.CloudRuntimeException: Unable to commit or close the connection.
>     at com.cloud.utils.db.TransactionLegacy.commit(TransactionLegacy.java:740)
>     at com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(ManagementServerHostDaoImpl.java:140)
>     at sun.reflect.GeneratedMethodAccessor103.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>     at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>     at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
>     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
>     at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>     at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>     at com.sun.proxy.$Proxy203.update(Unknown Source)
>     at com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl.java:555)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     ... 46 more
> INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-5ef0f4d1) (logid:4bfa48b2) Begin cleanup expired async-jobs
> INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-5ef0f4d1) (logid:4bfa48b2) End cleanup expired async-jobs
> ERROR [c.c.u.d.ConnectionConcierge] (ConnectionConcierge-1:ctx-d3460aeb) (logid:b8c62262) Unable to keep the db connection for LockMaster1
> java.sql.SQLException: Connection was killed
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
>     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
>     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
>     at com.mysql.jdbc.LoadBalancedMySQLConnection.execSQL(LoadBalancedMySQLConnection.java:155)
>     at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
>     at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2283)
>     at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.mysql.jdbc.LoadBalancingConnectionProxy$ConnectionErrorFiringInvocationHandler.invoke(LoadBalancingConnectionProxy.java:103)
>     at com.mysql.jdbc.FailoverConnectionProxy$FailoverInvocationHandler.invoke(FailoverConnectionProxy.java:51)
>     at com.sun.proxy.$Proxy257.executeQuery(Unknown Source)
>     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>     at com.cloud.utils.db.ConnectionConcierge$ConnectionConciergeManager.testValidity(ConnectionConcierge.java:148)
>     at com.cloud.utils.db.ConnectionConcierge$ConnectionConciergeManager$1.runInContext(ConnectionConcierge.java:203)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>     at java.lang.Thread.run(Thread.java:748)
> INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-22399f19) (logid:0deff5fe) Begin cleanup expired async-jobs
> INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-22399f19) (logid:0deff5fe) End cleanup expired async-jobs
> INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-6004a86b) (logid:64e4a3b3) Begin cleanup expired async-jobs
> INFO [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-6004a86b) (logid:64e4a3b3) End cleanup expired async-jobs
> INFO [c.c.h.v.m.HostMO] (DirectAgent-19:ctx-351cbbac host01.cloud.local, cmd: GetVmStatsCommand) (logid:56c8e250) VM i-2-124-VM not found in host cache
> INFO [c.c.h.v.m.HostMO] (DirectAgent-1:ctx-9c0f5042 host02.cloud.local, cmd: GetVmStatsCommand) (logid:56c8e250) VM i-2-125-VM not found in host cache
> ERROR [c.c.u.d.ConnectionConcierge] (ConnectionConcierge-1:ctx-544154d7) (logid:bd0f585a) Unable to keep the db connection for LockMaster1
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
>
> The last packet successfully received from the server was 19,479 milliseconds ago. The last packet sent successfully to the server was 19,479 milliseconds ago.
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>     at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1116)
>     at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3352)
>     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1971)
>     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
>     at com.mysql.jdbc.LoadBalancedMySQLConnection.execSQL(LoadBalancedMySQLConnection.java:155)
>     at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
>     at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2283)
>     at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.mysql.jdbc.LoadBalancingConnectionProxy$ConnectionErrorFiringInvocationHandler.invoke(LoadBalancingConnectionProxy.java:103)
>     at com.mysql.jdbc.FailoverConnectionProxy$FailoverInvocationHandler.invoke(FailoverConnectionProxy.java:51)
>     at com.sun.proxy.$Proxy257.executeQuery(Unknown Source)
>     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>     at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>     at com.cloud.utils.db.ConnectionConcierge$ConnectionConciergeManager.testValidity(ConnectionConcierge.java:148)
>     at com.cloud.utils.db.ConnectionConcierge$ConnectionConciergeManager$1.runInContext(ConnectionConcierge.java:203)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:161)
>     ... 36 more
>
> On Mon, Dec 18, 2017 at 2:03 PM, Rafael Weingärtner
> <rafaelweingart...@gmail.com> wrote:
> > Here is a fix:
> > https://www.dropbox.com/s/kgakhs3v05uz88x/cloud-framework-cluster-4.9.3.0.jar?dl=1
> > You need to replace this jar file in your CloudStack installation. You should
> > also back up the original jar and restore it as soon as you finish testing.
> > To replace the JARs, you need to stop ACS, replace the files, and only then
> > start it again.
> >
> > If everything works fine, I will open a PR against master, and with a bit
> > of luck we can push it into 4.11
> >
> > On Sat, Dec 16, 2017 at 8:03 AM, Alireza Eskandari <astro.alir...@gmail.com>
> > wrote:
> >
> >> I'm using CS 4.9.3.0-shapeblue0
> >>
> >> On Sat, Dec 16, 2017 at 12:49 PM, Rafael Weingärtner
> >> <rafaelweingart...@gmail.com> wrote:
> >> > Awesome!
> >> > I found one method that might be the cause of the problem.
> >> > What is the version of ACS that you are using?
> >> >
> >> > On Sat, Dec 16, 2017 at 4:10 AM, Alireza Eskandari <astro.alir...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> Gabriel,
> >> >> My configuration is the same as your suggestion, but I still get the errors.
> >> >>
> >> >> Rafael,
> >> >> You are right. I confirm that CS works normally but I get those warnings.
> >> >> It would make me happy to help you with this fix :)
> >> >>
> >> >>
> >> >> On Tue, Dec 12, 2017 at 3:30 PM, Rafael Weingärtner
> >> >> <rafaelweingart...@gmail.com> wrote:
> >> >> > Alireza,
> >> >> > This is a warning and should not cause you much trouble. I have been
> >> >> > trying to pinpoint this problem for quite some time now.
> >> >> > If I generate a fix, would you be willing to test it?
> >> >> >
> >> >> > On Tue, Dec 12, 2017 at 8:56 AM, Gabriel Beims Bräscher <gabrasc...@gmail.com> wrote:
> >> >> >
> >> >> >> Hi Alireza,
> >> >> >>
> >> >> >> I have production environments with Master to Master replication and
> >> >> >> we have no problems. We may need more details of your configuration.
> >> >> >> Have you configured the slave database? Are you sure that you
> >> >> >> correctly configured the HA heuristic?
> >> >> >>
> >> >> >> Considering that you already configured replication and "my.cnf", I will
> >> >> >> focus on the CloudStack db.properties file.
> >> >> >>
> >> >> >> When configuring Master-Master replication, you should have at
> >> >> >> /etc/cloudstack/management/db.properties something like:
> >> >> >> -----------------------------
> >> >> >> db.cloud.autoReconnectForPools=true
> >> >> >>
> >> >> >> #High Availability And Cluster Properties
> >> >> >> db.ha.enabled=true
> >> >> >>
> >> >> >> db.cloud.queriesBeforeRetryMaster=5000
> >> >> >> db.usage.failOverReadOnly=false
> >> >> >> db.cloud.slaves=acs-db-02
> >> >> >>
> >> >> >> cluster.node.IP=<cluster node IP>
> >> >> >>
> >> >> >> db.usage.autoReconnect=true
> >> >> >>
> >> >> >> db.cloud.host=acs-db-01
> >> >> >> db.usage.host=acs-db-01
> >> >> >>
> >> >> >> #db.ha.loadBalanceStrategy=com.mysql.jdbc.SequentialBalanceStrategy
> >> >> >> db.ha.loadBalanceStrategy=com.cloud.utils.db.StaticStrategy
> >> >> >>
> >> >> >> db.cloud.failOverReadOnly=false
> >> >> >> db.usage.slaves=acs-db-02
> >> >> >> -----------------------------
> >> >> >>
> >> >> >> "db.ha.loadBalanceStrategy" is configured with the heuristic
> >> >> >> "com.cloud.utils.db.StaticStrategy"
> >> >> >>
> >> >> >> "db.ha.enabled" needs to be “true”
> >> >> >>
> >> >> >> The primary database is configured with the variable “db.cloud.host”.
> >> >> >> The secondary database(s) is(are) configured with the variable
> >> >> >> “db.usage.slaves”. One variable that differs between the Apache
> >> >> >> CloudStack servers is “cluster.node.IP”, which is the ACS mgt IP of
> >> >> >> each server.
> >> >> >> Additionally, you will need to create a folder
> >> >> >> “/usr/share/cloudstack-mysql-ha/lib/” and copy the jar file
> >> >> >> “cloud-plugin-database-mysqlha-4.9.3.0.jar” into the new folder.
> >> >> >>
> >> >> >> -----------------------------
> >> >> >> mkdir -p /usr/share/cloudstack-mysql-ha/lib/
> >> >> >> cp /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-plugin-database-mysqlha-4.9.3.0.jar /usr/share/cloudstack-mysql-ha/lib/
> >> >> >> -----------------------------
> >> >> >>
> >> >> >> Cheers,
> >> >> >> Gabriel.
> >> >> >>
> >> >> >> 2017-12-12 6:30 GMT-02:00 Alireza Eskandari <astro.alir...@gmail.com>:
> >> >> >>
> >> >> >> > I have opened a new jira ticket about this problem:
> >> >> >> > https://issues.apache.org/jira/browse/CLOUDSTACK-10186
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> > --
> >> >> > Rafael Weingärtner
> >> >>
> >> >
> >> > --
> >> > Rafael Weingärtner
> >>
> >
> > --
> > Rafael Weingärtner
>

--
Rafael Weingärtner
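
For anyone comparing their own setup against the db.properties settings Gabriel lists in the quoted message, here is a minimal sketch of a sanity check in Python. It is not part of CloudStack: the function names, the SAMPLE text, and the choice of which keys to treat as required are assumptions made for illustration, based only on the values shown in this thread.

```python
# Hypothetical helper, not shipped with CloudStack: checks db.properties-style
# text for the HA settings mentioned in this thread.

# Keys that the thread says must carry a specific value.
REQUIRED_VALUES = {
    "db.ha.enabled": "true",
    "db.ha.loadBalanceStrategy": "com.cloud.utils.db.StaticStrategy",
}

# Keys that only need to be present and non-empty (an assumption: the thread
# shows them set, but does not say every one is mandatory).
REQUIRED_NONEMPTY = [
    "db.cloud.host",
    "db.usage.host",
    "db.cloud.slaves",
    "db.usage.slaves",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_ha_settings(props):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, expected in REQUIRED_VALUES.items():
        actual = props.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    for key in REQUIRED_NONEMPTY:
        if not props.get(key):
            problems.append(f"{key}: missing or empty")
    return problems


# Sample values copied from the configuration quoted in the thread.
SAMPLE = """\
#High Availability And Cluster Properties
db.ha.enabled=true
db.cloud.slaves=acs-db-02
db.usage.slaves=acs-db-02
db.cloud.host=acs-db-01
db.usage.host=acs-db-01
db.ha.loadBalanceStrategy=com.cloud.utils.db.StaticStrategy
"""

if __name__ == "__main__":
    for problem in check_ha_settings(parse_properties(SAMPLE)):
        print(problem)
```

To check a real installation, read /etc/cloudstack/management/db.properties into a string and pass it to parse_properties. The parser only handles plain key=value lines, so continuation lines or escaped characters in the file would need a fuller properties reader.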