Yes to both. I should have mentioned that I've already made sure connectivity
is still good. I'll try nulling out the mgmt_server_id in the host table and
see if that works.

Thanks

On Feb 11, 2013, at 3:23 PM, Ahmad Emneina <aemne...@gmail.com> wrote:

From the management server, can you ssh to that host? Can you execute xe
commands on that host? If yes to both, null out the mgmt_server_id for your
host in the host table, then issue the force reconnect and see if that helps.
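
The first of those checks can be partly scripted. Below is a minimal Python
sketch (the helper name `tcp_port_open` is hypothetical) that only tests
whether a TCP connection to the host's SSH port, assumed to be the default 22,
can be opened from the management server. It does not log in or run anything,
so an actual ssh session and an xe command such as `xe host-list` on the host
remain the real test.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only proves the port is reachable from the
    machine running it (e.g. the management server). It does not
    authenticate over SSH or run any xe commands.
    """
    try:
        # create_connection resolves the name and connects; closing the
        # socket immediately is enough for a reachability check.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable/unreachable hosts.
        return False
```

For the host in this thread that would be `tcp_port_open("10.5.1.14")`; a
`False` result points at the network, a firewall, or sshd itself rather than
at CloudStack.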


On Mon, Feb 11, 2013 at 2:17 PM, Caleb Call <cc...@overstock.com> wrote:
We have a zone with a single host in it. We also recently upgraded from 3.0.2
to 4.0 (this may not be relevant, but I figured I'd mention it anyway). We put
the host in maintenance mode (all VMs were shut down, etc.) and applied some
pending patches. Since coming back up, the host is unable to reconnect. When I
try a force reconnect, I get the following in the management server log:

2013-02-11 15:04:34,541 DEBUG [ehcache.store.MemoryStore] (catalina-exec-19:null) UserDaoCache: UserDaoMemoryStore hit for 10
2013-02-11 15:04:34,578 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-19:null) submit async job-4806, details: AsyncJobVO {id:4806, userId: 10, accountId: 7, sessionKey: null, instanceType: Host, instanceId: 25, cmd: com.cloud.api.commands.ReconnectHostCmd, cmdOriginator: null, cmdInfo: {"id":"6bc87ba4-52d4-4477-a417-46886d03698d","response":"json","sessionkey":"6IzB5H0fVA9f9FgdWtbNG9GdB5E\u003d","ctxUserId":"10","_":"1360620274351","ctxAccountId":"7","ctxStartEventId":"15461"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 145320940120008, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2013-02-11 15:04:34,579 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Executing com.cloud.api.commands.ReconnectHostCmd for job-4806
2013-02-11 15:04:34,587 INFO  [agent.manager.AgentManagerImpl] (Job-Executor-3:job-4806) Unable to disconnect host because it is not connected to this server: 25
2013-02-11 15:04:34,587 WARN  [api.commands.ReconnectHostCmd] (Job-Executor-3:job-4806) Exception:
com.cloud.api.ServerApiException
        at com.cloud.api.commands.ReconnectHostCmd.execute(ReconnectHostCmd.java:108)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:432)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2013-02-11 15:04:34,587 WARN  [cloud.api.ApiDispatcher] (Job-Executor-3:job-4806) class com.cloud.api.ServerApiException : null
2013-02-11 15:04:34,588 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-3:job-4806) Complete async job-4806, jobStatus: 2, resultCode: 530, result: Error Code: 534 Error text: null
2013-02-11 15:04:39,624 DEBUG [ehcache.store.MemoryStore] (catalina-exec-17:null) UserDaoCache: UserDaoMemoryStore hit for 10
2013-02-11 15:04:39,635 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-17:null) Async job-4806 completed


Apart from the force reconnect, I can't find anything in the logs showing that
it tries to reconnect on its own. I do see where it records the host's state as
Alert, but it doesn't give any reason why.

The only indication I can see that it's even trying is these lines:

2013-02-11 11:47:05,670 DEBUG [xen.resource.XenServerConnectionPool] (ClusteredAgentManager Timer:null) Failed to slave local login to 10.5.1.14
2013-02-11 11:47:05,671 WARN  [cloud.resource.DiscovererBase] (ClusteredAgentManager Timer:null) Unable to configure resource due to Can not create slave connection to 10.5.1.14

10.5.1.14 is the host that should be reconnecting but is not.
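
When hunting for automatic reconnect attempts like these, it can help to filter
the management server log down to the entries that mention this host, either by
IP (10.5.1.14) or by its database id (25, the instanceId in the force-reconnect
job above). A minimal Python sketch, assuming each log entry sits on a single
line; the function names are hypothetical and the log path is yours to supply.

```python
import re
from typing import Iterable, Iterator, List


def host_entries(lines: Iterable[str], host_ip: str, host_id: int) -> Iterator[str]:
    """Yield log lines that appear to refer to the given host (hypothetical helper)."""
    # Match the IP only as a whole address: no digit or dot before it,
    # and no further digits after it (so 10.5.1.14 does not match 10.5.1.140).
    ip_pattern = re.compile(r"(?<![\d.])" + re.escape(host_ip) + r"(?!\.?\d)")
    # Match the two ways the host id appears in the excerpt above.
    id_pattern = re.compile(
        r"(?:instanceId: |not connected to this server: )" + str(host_id) + r"\b"
    )
    for line in lines:
        if ip_pattern.search(line) or id_pattern.search(line):
            yield line.rstrip("\n")


def scan_log(path: str, host_ip: str, host_id: int) -> List[str]:
    """Read a log file and return the matching entries (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(host_entries(log, host_ip, host_id))
```

For example, `scan_log(path, "10.5.1.14", 25)` lists every entry for this host
in time order, which makes it easier to see whether the ClusteredAgentManager
timer keeps retrying between force reconnects.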

Is there anything else I can look at to find out why it's not connecting, or
any suggestions on why my host won't reconnect?

Thanks


________________________________

CONFIDENTIALITY NOTICE: This message is intended only for the use and review of 
the individual or entity to which it is addressed and may contain information 
that is privileged and confidential. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message solely to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify 
sender immediately by telephone or return email. Thank you.
