Hi Likitha, You are absolutely spot on! This is the error message I found: INFO [c.c.c.ClusterManagerImpl] (main:null) Management server 90520734207775 is being started WARN [c.c.u.d.T.Transaction] (main:null) txn: Commit called when it is not a transaction: -Transaction.execute:38-Transaction.execute:46-ClusterManagerImpl.start:940-CloudStackExtendedLifeCycle$1.with:75-CloudStackExtendedLifeCycle.with:153-CloudStackExtendedLifeCycle.startBeans:72-CloudStackExtendedLifeCycleStart.run:46-DefaultModuleDefinitionSet$1.with:105-DefaultModuleDefinitionSet.withModule:245-DefaultModuleDefinitionSet.withModule:250-DefaultModuleDefinitionSet.withModule:250-DefaultModuleDefinitionSet.withModule:233 INFO [c.c.c.ClusterManagerImpl] (main:null) Management server (host id : 1) is being started at 127.0.0.1:9090
Cheers, Funs > -----Original Message----- > From: Likitha Shetty [mailto:likitha.she...@citrix.com] > Sent: Thursday, July 10, 2014 5:35 PM > To: dev@cloudstack.apache.org > Subject: RE: "Bug" introduced by part of commit > a600d8408ea86782318139c17cf346c8497943d0 > > Are you guys are referring to the update method that is throwing the error, > 'update(long id, long runid, Date lastUpdate) ?. Agreed that this is just a > symptom. > > But I was referring to the first update method in the file which is > 'update(long id, long runid, String name, String version, String serviceIP, > int > servicePort, Date lastUpdate)'. > From what I understand this is the method that updates the MS runId when it > is started. And the problem is that the transaction commit made in this > update is getting rolled back, which is why we have a wrong runId in the DB > for the MS. Since the second update method is based on the right runId it > throws an exception. > And I suspect the original transaction is getting rolled back because we are > trying to close a transaction that is already closed. > > Thanks, > Likitha > > -----Original Message----- > From: Funs Kessen [mailto:fkes...@schubergphilis.com] > Sent: Thursday, July 10, 2014 5:10 PM > To: dev@cloudstack.apache.org > Subject: RE: "Bug" introduced by part of commit > a600d8408ea86782318139c17cf346c8497943d0 > > Hi Santhosh, > > You're right indeed, it's the update statement in the update itself that adds > the runid in the where clause, it came in three years and three months ago by > the looks of it. > > Cheers, > > Funs > > > -----Original Message----- > > From: Santhosh Edukulla [mailto:santhosh.eduku...@citrix.com] > > Sent: Thursday, July 10, 2014 1:22 PM > > To: dev@cloudstack.apache.org > > Subject: RE: "Bug" introduced by part of commit > > a600d8408ea86782318139c17cf346c8497943d0 > > > > The mentioned issue by Funs was existing earlier as well but was > > hidden by just logging it when update was failed, now with new change, > > we resurfaced the problem by throwing it. Its not because of close > > though. We can see to fix the original problem. > > > > Santhosh > > ________________________________________ > > From: Likitha Shetty [likitha.she...@citrix.com] > > Sent: Thursday, July 10, 2014 7:18 AM > > To: dev@cloudstack.apache.org > > Subject: RE: "Bug" introduced by part of commit > > a600d8408ea86782318139c17cf346c8497943d0 > > > > Santhosh, I pulled the latest code and unfortunately the issue still exists. > > > > Looks like the problem is because we try to close the transaction > > after it has been committed (which implicitly closes the transaction) > > - > /framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDao > > Impl.java > > txn.commit(); > > } catch (RuntimeException e) { > > txn.rollback(); > > s_logger.warn("Unexpected exception, ", e); > > throw new RuntimeException(e.getMessage(), e); > > }finally { > > txn.close(); > > } > > > > Thanks, > > Likitha > > > > -----Original Message----- > > From: Funs Kessen [mailto:fkes...@schubergphilis.com] > > Sent: Thursday, July 10, 2014 2:38 PM > > To: dev@cloudstack.apache.org > > Subject: RE: "Bug" introduced by part of commit > > a600d8408ea86782318139c17cf346c8497943d0 > > > > Yeah I see that Ipasted the wrong bit of code *doh*, it was update > > that was the problem from the same file: > > > > @@ -129,24 +133,29 @@ > > @DB > > public void update(long id, long runid, Date lastUpdate) { > > TransactionLegacy txn = TransactionLegacy.currentTxn(); > > - PreparedStatement pstmt = null; > > try { > > txn.start(); > > + try( PreparedStatement pstmt = > > + txn.prepareStatement("update mshost > > set last_update=?, removed=null, alert_count=0 where id=? and > > runid=?");) { > > + pstmt.setString(1, > > DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), > > lastUpdate)); > > + pstmt.setLong(2, id); > > + pstmt.setLong(3, runid); > > > > - pstmt = txn.prepareAutoCloseStatement("update mshost set > > last_update=?, removed=null, alert_count=0 where id=? and runid=?"); > > - pstmt.setString(1, > > DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), lastUpdate)); > > - pstmt.setLong(2, id); > > - pstmt.setLong(3, runid); > > + int count = pstmt.executeUpdate(); > > + txn.commit(); > > > > - int count = pstmt.executeUpdate(); > > - txn.commit(); > > - > > - if (count < 1) { > > - throw new CloudRuntimeException("Invalid cluster session > > detected", new ClusterInvalidSessionException("runid " + runid + " is > > no longer valid")); > > + if (count < 1) { > > + throw new CloudRuntimeException("Invalid cluster > > + session > > detected", new ClusterInvalidSessionException("runid " + runid + " is > > no longer valid")); > > + } > > + }catch (SQLException e) { > > + throw new > > + CloudRuntimeException("update:Exception:"+e.getMessage(), e); > > } > > - } catch (Exception e) { > > - s_logger.warn("Unexpected exception, ", e); > > - throw new RuntimeException(e.getMessage(), e); > > + } catch (RuntimeException e) { > > + txn.rollback(); > > + s_logger.warn("update:Exception:"+e.getMessage(), e); > > + throw new RuntimeException("update:Exception:"+e.getMessage(), > e); > > + } > > + finally { > > + txn.close(); > > } > > } > > > > > > > -----Original Message----- > > > From: Trippie [mailto:trip...@gmail.com] On Behalf Of Hugo Trippaers > > > Sent: Thursday, July 10, 2014 11:00 AM > > > To: dev@cloudstack.apache.org > > > Subject: Re: "Bug" introduced by part of commit > > > a600d8408ea86782318139c17cf346c8497943d0 > > > > > > Fixed already, good job Santosh > > > > > > Cheers, > > > > > > Hugo > > > > > > On 10 jul. 2014, at 10:58, Hugo Trippaers <trip...@gmail.com> wrote: > > > > > > > Doesn't seem related to that particular commit, the function > > > invalidateRunSession appears to be dead code. > > > > > > > > I'll do some digging. > > > > > > > > > > > > Cheers, > > > > > > > > Hugo > > > > > > > > On 10 jul. 2014, at 10:40, Funs Kessen > > > > <fkes...@schubergphilis.com> > > > wrote: > > > > > > > >> Hi Devs, > > > >> > > > >> I recently noticed on master that after starting, stopping and > > > >> starting the > > > management server again I get stacktraces about every second. > > > >> After some digging with Daan we found that part of the > > > a600d8408ea86782318139c17cf346c84979943d0 commit causes this. > > The > > > problem is however deeper rooted, as the code is supposed to update > > > the mshost table, but because another Session ID (actually _runId > > > from ClusterManagerImpl.java from the previous run) is already in > > > there the update fails. The Session ID is based on time in > > > milliseconds, and offcourse changes when you stop and start the > management server again. > > > >> Prior to the commit it failed silently, and has done so since the > > > >> initial > > > checkin it seems. The real question is what the original idea is > > > behind the _runId, and is it something that requires fixing ? > > > >> > > > >> I've pasted part of the commit, sql and stacktrace below. > > > >> > > > >> Cheers, > > > >> > > > >> Funs > > > >> > > > >> === > > > >> mysql> select * from mshost; > > > >> +----+----------------+---------------+---------+-------+----------------+------------+------ > -- > > -- > > > ----+---------------------+---------+-------------+ > > > >> | id | msid | runid | name | state | version > > > >> | service_ip > | > > > service_port | last_update | removed | alert_count | > > > >> +----+----------------+---------------+---------+-------+----------------+------------+------ > -- > > -- > > > ----+---------------------+---------+-------------+ > > > >> | 1 | 90520734207775 | 1404924979461 | cs-mgmt | Up | 4.5.0- > > > SNAPSHOT | 127.0.0.1 | 9090 | 2014-07-09 16:58:07 | NULL | > 0 > > | > > > >> +----+----------------+---------------+---------+-------+----------------+------------+------ > -- > > -- > > > ----+---------------------+---------+-------------+ > > > >> 1 row in set (0.00 sec) > > > >> > > > >> === > > > >> commit a600d8408ea86782318139c17cf346c8497943d0 > > > >> Author: Santhosh Edukulla <santhosh.eduku...@gmail.com> 2014-07- > > 02 > > > >> 10:38:16 > > > >> Committer: Santhosh Edukulla <santhosh.eduku...@gmail.com> 2014- > > 07- > > > 04 > > > >> 12:47:58 > > > >> > > > >> Fixed Resource Leaks, null dereferences, few other issues > > > >> reported by coverity > > > >> > > > >> - > > > >> > > > > > > framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDaoI > > > m > > > >> pl.java diff --git > > > >> > > > > > > a/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa > > > o > > > >> Impl.java > > > >> > > > > > > b/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa > > > o > > > >> Impl.java > > > >> index 3d0c3f5..89d7d27 100644 > > > >> --- > > > >> > > > > > > a/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa > > > o > > > >> Impl.java > > > >> +++ > > > b/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHos > > > >> +++ tDaoImpl.java > > > >> @@ -53,15 +53,14 @@ > > > >> @Override > > > >> public void invalidateRunSession(long id, long runid) { > > > >> TransactionLegacy txn = TransactionLegacy.currentTxn(); > > > >> - PreparedStatement pstmt = null; > > > >> - try { > > > >> - pstmt = txn.prepareAutoCloseStatement("update mshost set > > > runid=0, state='Down' where id=? and runid=?"); > > > >> - pstmt.setLong(1, id); > > > >> - pstmt.setLong(2, runid); > > > >> - > > > >> - pstmt.executeUpdate(); > > > >> + try (PreparedStatement pstmt = > > > >> + txn.prepareStatement("update > > > mshost set runid=0, state='Down' where id=? and runid=?");){ > > > >> + if(pstmt != null) { > > > >> + pstmt.setLong(1, id); > > > >> + pstmt.setLong(2, runid); > > > >> + pstmt.executeUpdate(); > > > >> + } > > > >> } catch (SQLException e) { > > > >> - throw new CloudRuntimeException("DB exception on " + > > > pstmt.toString(), e); > > > >> + throw new > > > >> + CloudRuntimeException("invalidateRunSession:Exception:"+ > > > >> + e.getMessage(), e); > > > >> } > > > >> } > > > >> > > > >> === > > > >> 2014-07-09 18:45:06,541 WARN > > [c.c.c.d.ManagementServerHostDaoImpl] > > > >> (Cluster-Heartbeat-1:ctx-5f2f8ad5) update:Exception:Invalid > > > >> cluster session detected > > > >> com.cloud.utils.exception.CloudRuntimeException: Invalid cluster > > > >> session > > > detected > > > >> at > > > > > > com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management > > > ServerHostDaoImpl.java:147) > > > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > > >> at > > > > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j > > > a > > > va:57) > > > >> at > > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess > > > orImpl.java:43) > > > >> at java.lang.reflect.Method.invoke(Method.java:606) > > > >> at > > > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflect > > > io > > > n( > > > AopUtils.java:317) > > > >> at > > > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJ > > > oi > > > n > > > point(ReflectiveMethodInvocation.java:183) > > > >> at > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R > > > eflectiveMethodInvocation.java:150) > > > >> at > > > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionC > > > on > > > tex > > > tInterceptor.java:34) > > > >> at > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R > > > eflectiveMethodInvocation.java:161) > > > >> at > > > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invo > > > ke > > > (Ex > > > poseInvocationInterceptor.java:91) > > > >> at > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R > > > eflectiveMethodInvocation.java:172) > > > >> at > > > > > > org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynam > > > icAopProxy.java:204) > > > >> at com.sun.proxy.$Proxy158.update(Unknown Source) > > > >> at > > > > > > com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl > > > .java:545) > > > >> at > > > > > > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(M > > > anagedContextRunnable.java:49) > > > >> at > > > > > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.cal > > > l > > > (DefaultManagedContext.java:56) > > > >> at > > > > > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callW > > > ithContext(DefaultManagedContext.java:103) > > > >> at > > > > > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWi > > > thContext(DefaultManagedContext.java:53) > > > >> at > > > > > > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Man > > > agedContextRunnable.java:46) > > > >> at > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4 > > > 71 > > > ) > > > >> at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > > > >> at > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask > > > .a > > > c > > > cess$301(ScheduledThreadPoolExecutor.java:178) > > > >> at > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask > > > .r > > > u > > > n(ScheduledThreadPoolExecutor.java:293) > > > >> at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor > > > .j > > > av > > > a:1145) > > > >> at > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. > > > ja > > > va:615) > > > >> at java.lang.Thread.run(Thread.java:744) > > > >> Caused by: com.cloud.cluster.ClusterInvalidSessionException: > > > >> runid > > > 1404924278923 is no longer valid > > > >> at > > > > > > com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management > > > ServerHostDaoImpl.java:147) > > > >> ... 26 more > > > >> 2014-07-09 18:45:06,543 ERROR [c.c.c.ClusterManagerImpl] > > > >> (Cluster-Heartbeat-1:ctx-5f2f8ad5) Unexpected exception in > > > >> cluster heartbeat > > > >> java.lang.RuntimeException: update:Exception:Invalid cluster > > > >> session > > > detected > > > >> at > > > > > > com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management > > > ServerHostDaoImpl.java:155) > > > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > > >> at > > > > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j > > > a > > > va:57) > > > >> at > > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess > > > orImpl.java:43) > > > >> at java.lang.reflect.Method.invoke(Method.java:606) > > > >> at > > > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflect > > > io > > > n( > > > AopUtils.java:317) > > > >> at > > > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJ > > > oi > > > n > > > point(ReflectiveMethodInvocation.java:183) > > > >> at > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R > > > eflectiveMethodInvocation.java:150) > > > >> at > > > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionC > > > on > > > tex > > > tInterceptor.java:34) > > > >> at > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R > > > eflectiveMethodInvocation.java:161) > > > >> at > > > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invo > > > ke > > > (Ex > > > poseInvocationInterceptor.java:91) > > > >> at > > > > > > org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R > > > eflectiveMethodInvocation.java:172) > > > >> at > > > > > > org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynam > > > icAopProxy.java:204) > > > >> at com.sun.proxy.$Proxy158.update(Unknown Source) > > > >> at > > > > > > com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl > > > .java:545) > > > >> at > > > > > > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(M > > > anagedContextRunnable.java:49) > > > >> at > > > > > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.cal > > > l > > > (DefaultManagedContext.java:56) > > > >> at > > > > > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callW > > > ithContext(DefaultManagedContext.java:103) > > > >> at > > > > > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWi > > > thContext(DefaultManagedContext.java:53) > > > >> at > > > > > > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Man > > > agedContextRunnable.java:46) > > > >> at > > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4 > > > 71 > > > ) > > > >> at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > > > >> at > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask > > > .a > > > c > > > cess$301(ScheduledThreadPoolExecutor.java:178) > > > >> at > > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask > > > .r > > > u > > > n(ScheduledThreadPoolExecutor.java:293) > > > >> at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor > > > .j > > > av > > > a:1145) > > > >> at > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. > > > ja > > > va:615) > > > >> at java.lang.Thread.run(Thread.java:744) > > > >> Caused by: com.cloud.utils.exception.CloudRuntimeException: > > > >> Invalid > > > cluster session detected > > > >> at > > > > > > com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management > > > ServerHostDaoImpl.java:147) > > > >> ... 26 more > > > >> Caused by: com.cloud.cluster.ClusterInvalidSessionException: > > > >> runid > > > 1404924278923 is no longer valid > > > >> at > > > > > > com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management > > > ServerHostDaoImpl.java:147) > > > >> ... 26 more > > > >> > > > >> > > > >> > > > >