Hi,

On 07/07/2013 08:45 AM, Marcus Sorensen wrote:
I see that my db.properties has db.cloud.autoReconnect=true, which
translates to setting autoReconnect in the jdbc driver connection in
utils/src/com/cloud/utils/db/Transaction.java. I also see that if I
manually trigger the issue I get:


Just to confirm, I see the same issues. I haven't looked into this yet, but this is also one of the things I want to have fixed.

Maybe create an issue for it?

Wido

2013-07-07 00:42:50,502 ERROR [cloud.cluster.ClusterManagerImpl]
(Cluster-Heartbeat-1:null) Runtime DB exception
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure

The last packet successfully received from the server was 1,503
milliseconds ago.  The last packet sent successfully to the server was
0 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor159.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:409)
at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:350)
at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:907)
at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:912)
at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
at com.cloud.cluster.dao.ManagementServerHostDaoImpl.getActiveList(ManagementServerHostDaoImpl.java:158)
at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
at com.cloud.cluster.ClusterManagerImpl.peerScan(ClusterManagerImpl.java:1057)
at com.cloud.cluster.ClusterManagerImpl.access$1200(ClusterManagerImpl.java:95)
at com.cloud.cluster.ClusterManagerImpl$4.run(ClusterManagerImpl.java:789)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.EOFException: Can not read response from server.
Expected to read 4 bytes, read 0 bytes before connection was
unexpectedly lost.
... 55 more
2013-07-07 00:42:50,505 ERROR [cloud.cluster.ClusterManagerImpl]
(Cluster-Heartbeat-1:null) DB communication problem detected, fence it

And I only have to restart cloudstack-management, so that it connects
to another member of the load-balanced multi-master database, to get
things running again.
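
For illustration only: MySQL Connector/J accepts a comma-separated host
list in the JDBC URL and tries the hosts in order when a connection
fails, which is one way a client could reach another member of the
database without a restart. The sketch below only builds such a URL.
The class name FailoverJdbcUrl, the host names, and the database name
are hypothetical, and whether CloudStack's db.properties can be made to
produce a URL like this is an assumption, not something confirmed in
this thread.

```java
import java.util.List;

// Hypothetical helper, not part of CloudStack: builds a Connector/J URL
// that lists several database hosts so the driver can fail over.
final class FailoverJdbcUrl {

    private FailoverJdbcUrl() {
    }

    static String build(List<String> hosts, String database, boolean autoReconnect) {
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        // Connector/J reads "host1:port,host2:port" as a failover list.
        String hostList = String.join(",", hosts);
        return "jdbc:mysql://" + hostList + "/" + database
                + "?autoReconnect=" + autoReconnect
                // Keep the connection writable after failing over to another master.
                + "&failOverReadOnly=false";
    }
}
```

Calling FailoverJdbcUrl.build with two hosts such as
db1.example.com:3306 and db2.example.com:3306 yields a single URL that
names both, so the driver has a second host to try.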


On Sun, Jul 7, 2013 at 12:35 AM, Marcus Sorensen <shadow...@gmail.com> wrote:
I've noticed that the CloudStack management server creates persistent
connections to the database and crashes if the database connection is
lost. I haven't looked at the code yet, but I was wondering if anyone
knows what is going on here: is it simply not set up to handle
reconnects gracefully, or is it something else? We have a
multi-master database setup, but CloudStack doesn't take advantage of
it because it doesn't attempt a graceful reconnect. If the particular
node it connected to on startup goes down, it simply crashes.
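
As a rough sketch of what a "graceful reconnect" could look like at the
call site, the wrapper below retries a database action a bounded number
of times before giving up. Everything here (DbRetry, DbAction,
withRetry, the backoff policy) is hypothetical and is not how CloudStack
handles this; retrying on every SQLException is a simplification, and a
real fix would need to separate transient link failures from genuine
query errors and obtain a fresh connection on each attempt.

```java
import java.sql.SQLException;

// Hypothetical retry wrapper, not CloudStack code.
final class DbRetry {

    // A database operation that may fail with SQLException.
    interface DbAction<T> {
        T run() throws SQLException;
    }

    private DbRetry() {
    }

    static <T> T withRetry(DbAction<T> action, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (SQLException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff: wait longer after each failed attempt.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        // Every attempt failed; surface the most recent failure.
        throw last;
    }
}
```

A heartbeat-style query wrapped this way would get a few chances to
reach a surviving database node before the failure is treated as fatal.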
