Re: Rebuilding management server

Leeno Jose.P.A Tue, 16 Jul 2013 09:02:59 -0700

CS startup logs,

2013-07-16 11:25:30,702 INFO  [utils.component.ComponentContext]
(Timer-1:null) Starting
com.cloud.network.guru.NiciraNvpGuestNetworkGuru_EnhancerByCloudStack_1f6b4bb6
2013-07-16 11:25:30,702 INFO  [utils.component.ComponentContext]
(Timer-1:null) Starting
com.cloud.server.ManagementServerImpl_EnhancerByCloudStack_d54e1bb1
2013-07-16 11:25:30,702 INFO  [cloud.server.ManagementServerImpl]
(Timer-1:null) Startup CloudStack management server...
2013-07-16 11:25:30,707 INFO
[cloud.cluster.ClusterServiceServletContainer] (Thread-18:null) Cluster
service servlet container listening on port 9090
2013-07-16 11:25:31,832 DEBUG [utils.db.ConnectionConcierge]
(Cluster-Heartbeat-1:null) Registering a database connection for
ClusterManagerHeartBeat2
2013-07-16 11:25:31,845 INFO  [cloud.cluster.ClusterManagerImpl]
(Cluster-Heartbeat-1:null) We are good, no orphan management server msid in
host table is found
2013-07-16 11:25:31,845 INFO  [cloud.cluster.ClusterManagerImpl]
(Cluster-Heartbeat-1:null) Found 1 inactive management server node based on
timestamp
2013-07-16 11:25:31,846 INFO  [cloud.cluster.ClusterManagerImpl]
(Cluster-Heartbeat-1:null) management server node msid: 130602634328, name:
cstagcms, service ip: 192.168.10.251, version: 4.1.0
2013-07-16 11:25:31,846 INFO  [cloud.cluster.ClusterManagerImpl]
(Cluster-Heartbeat-1:null) Trying to connect to 192.168.10.251
2013-07-16 11:25:31,860 DEBUG [cloud.cluster.ClusterManagerImpl]
(Cluster-Heartbeat-1:null) Detected management node joined, id:2,
nodeIP:192.168.10.251
2013-07-16 11:25:33,348 DEBUG [cloud.cluster.ClusterManagerImpl]
(Cluster-Notification-1:null) Notify management server node join to
listeners.
2013-07-16 11:25:33,349 DEBUG [cloud.cluster.ClusterManagerImpl]
(Cluster-Notification-1:null) Joining node, IP: 192.168.10.251, msid:
81375086018793
2013-07-16 11:25:33,350 DEBUG [cloud.alert.ClusterAlertAdapter]
(Cluster-Notification-1:null) Receive cluster alert, EventArgs:
com.cloud.cluster.ClusterNodeJoinEventArgs
2013-07-16 11:25:33,350 DEBUG [cloud.alert.ClusterAlertAdapter]
(Cluster-Notification-1:null) Handle cluster node join alert, joined node:
192.168.10.251, msidL: 81375086018793
2013-07-16 11:25:33,350 DEBUG [cloud.alert.ClusterAlertAdapter]
(Cluster-Notification-1:null) Management server node 192.168.10.251 is up,
send alert
2013-07-16 11:25:33,361 WARN  [cloud.cluster.ClusterManagerImpl]
(Cluster-Notification-1:null) Notifying management server join event took
12 ms
2013-07-16 11:25:45,450 DEBUG [cloud.server.StatsCollector]
(StatsCollector-2:null) HostStatsCollector is running...
2013-07-16 11:25:45,452 DEBUG [cloud.server.StatsCollector]
(StatsCollector-1:null) VmStatsCollector is running...
2013-07-16 11:25:45,467 DEBUG [cloud.server.StatsCollector]
(StatsCollector-3:null) StorageCollector is running...
2013-07-16 11:25:45,498 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(StatsCollector-2:null) create forwarding ClusteredAgentAttache for 39
2013-07-16 11:25:45,491 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(StatsCollector-3:null) create forwarding ClusteredAgentAttache for 50
2013-07-16 11:25:45,751 INFO  [agent.manager.ClusteredAgentManagerImpl]
(StatsCollector-3:null) SSL: Handshake done
2013-07-16 11:25:45,752 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(StatsCollector-3:null) Connection to peer opened: 130602634328, ip:
192.168.10.251
2013-07-16 11:25:45,757 DEBUG [agent.manager.ClusteredAgentAttache]
(StatsCollector-2:null) Seq 39-282525697: Forwarding null to 130602634328
2013-07-16 11:25:45,758 DEBUG [agent.manager.ClusteredAgentAttache]
(StatsCollector-3:null) Seq 50-1962541057: Forwarding null to 130602634328
2013-07-16 11:25:45,804 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-2:null) Seq 39-282525697: Routing from 81375086018793
2013-07-16 11:25:45,804 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-2:null) Seq 39-282525697: Link is closed
2013-07-16 11:25:45,806 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-2:null) Seq 39-282525697: MgmtId 81375086018793: Req:
Resource [Host:39] is unreachable: Host 39: Link is closed
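
For anyone reading logs like these, here is a minimal sketch (not part of CloudStack) of one way to list the hosts the management server reports as unreachable. It assumes the log was pasted through mail, so each entry may be wrapped over several lines. The function names `join_wrapped` and `unreachable_hosts` are made up for this example, and the sample is copied from the log above.

```python
import re
from collections import Counter

# A log entry starts with a timestamp such as "2013-07-16 11:25:45,806 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
# Message text taken from the log excerpt in this thread.
UNREACHABLE = re.compile(r"Resource \[Host:(\d+)\] is unreachable")


def join_wrapped(lines):
    """Merge mail-wrapped continuation lines into one string per log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def unreachable_hosts(lines):
    """Count 'is unreachable' entries per host id."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = UNREACHABLE.search(entry)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = """\
2013-07-16 11:25:45,804 DEBUG [agent.manager.ClusteredAgentAttache]
(AgentManager-Handler-2:null) Seq 39-282525697: Link is closed
2013-07-16 11:25:45,806 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentManager-Handler-2:null) Seq 39-282525697: MgmtId 81375086018793: Req:
Resource [Host:39] is unreachable: Host 39: Link is closed
""".splitlines()

print(unreachable_hosts(sample))
```

The same functions can be fed the lines of a real management-server.log; an unwrapped log passes through `join_wrapped` unchanged.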



Thanks
Leeno


On Tue, Jul 16, 2013 at 6:10 PM, Leeno Jose.P.A <[email protected]> wrote:

> Hi Todd,
>
> Thanks for the help.
>
> I executed the steps you mentioned above, but that did not help. I still
> get the same error message, even though I can ping the XS hosts and telnet
> to ports 22, 80 and 443 on them from CS.
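
The telnet check on ports 22, 80 and 443 can be repeated from a script. The following is a minimal sketch using only the Python standard library. The helper names `port_open` and `check_hosts` are hypothetical, and the host addresses have to be supplied by the caller.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_hosts(hosts, ports=(22, 80, 443)):
    """Map each (host, port) pair to True/False for TCP reachability."""
    return {(host, port): port_open(host, port) for host in hosts for port in ports}
```

Calling `check_hosts([...])` with the XS host addresses returns one True/False entry per host and port. A successful connection only shows that the port accepts TCP connections; it says nothing about whether the management server can authenticate to the host.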
>
> Thanks
> Leeno
>
>
> On Tue, Jul 16, 2013 at 5:12 PM, Todd Pigram <[email protected]> wrote:
>
>> Did you remove the Tags on each XenServer host prior to starting?
>>
>> Management Controller Failure and Replacement
>>
>> In setting up your cloud, you should have a backup routine for your
>> controller. The most important items to back up are the MySQL databases
>> that Cloudstack uses. A suitable backup script is attached to this page.
>> In the event of a cloud management controller failure, the steps to
>> replace the controller with a new one are:
>>
>> These instructions assume your cluster is Xenserver - Contributors using
>> other Hypervisor OSs, please contribute.
>>
>>    1. Set up the new management server hardware.
>>    2. Install your OS.
>>    3. Install Cloudstack, up to and including the "Install Database" step.
>>    4. Import your database backup.
>>    5. In Xencenter, connect to your Cloudstack host pool.
>>    6. On each host, remove the tags on Host > General Tab > Tags by
>>    editing the tags and un-checking each one.
>>    7. On the management controller, start Cloudstack:
>>       1. service cloud-management start
>>    8. The new cloud management controller will connect to each host in
>>    the database and push out new tags and keys to each host in the pool.
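
The command-line part of the steps above can be written down as a dry-run plan. This is a sketch only: it prints the commands and executes nothing. The database name `cloud` and the `service cloud-management start` command come from this thread. The dump path, the `root` database user and the function name `rebuild_plan` are assumptions made for illustration.

```python
import shlex


def rebuild_plan(dump_path, db_name="cloud", db_user="root"):
    """Return the ordered restore steps as strings; nothing is executed."""
    dump = shlex.quote(dump_path)  # keep paths with spaces safe for the shell
    return [
        # Step 4: import the database backup (mysql prompts for the password).
        f"mysql -u {db_user} -p {db_name} < {dump}",
        # Steps 5-6 are done by hand in XenCenter.
        "MANUAL: in XenCenter, remove the tags on every host in the pool",
        # Step 7: start the management server.
        "service cloud-management start",
    ]


# Hypothetical dump location, for illustration only.
for step in rebuild_plan("/backup/cloud.sql"):
    print(step)
```

Keeping the tag removal as an explicit manual entry makes it harder to start the service before the hosts have been cleaned up.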
>>
>>
>> On Jul 16, 2013, at 1:13 AM, Leeno Jose.P.A <[email protected]> wrote:
>>
>> After restoring the old database dump to the new installation, CS is
>> unable to contact the Xenserver hosts. I am getting the following errors
>> in management-server.log:
>>
>>
>> 2013-07-15 11:57:49,646 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>> (StatsCollector-1:null) Connection to peer opened: 130602634328, ip:
>> 192.168.10.251
>> 2013-07-15 11:57:49,652 DEBUG [agent.manager.ClusteredAgentAttache]
>> (StatsCollector-2:null) Seq 50-185008129: Forwarding null to 130602634328
>> 2013-07-15 11:57:49,662 DEBUG [agent.manager.ClusteredAgentAttache]
>> (StatsCollector-1:null) Seq 39-1272840193: Forwarding null to 130602634328
>> 2013-07-15 11:57:49,699 DEBUG [agent.manager.ClusteredAgentAttache]
>> (AgentManager-Handler-2:null) Seq 50-185008129: Routing from
>> 81375086018793
>> 2013-07-15 11:57:49,699 DEBUG [agent.manager.ClusteredAgentAttache]
>> (AgentManager-Handler-2:null) Seq 50-185008129: Link is closed
>> 2013-07-15 11:57:49,699 DEBUG [agent.manager.ClusteredAgentAttache]
>> (AgentManager-Handler-3:null) Seq 39-1272840193: Routing from
>> 81375086018793
>> 2013-07-15 11:57:49,700 DEBUG [agent.manager.ClusteredAgentAttache]
>> (AgentManager-Handler-3:null) Seq 39-1272840193: Link is closed
>> 2013-07-15 11:57:49,700 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>> (AgentManager-Handler-3:null) Seq 39-1272840193: MgmtId 81375086018793:
>> Req: Resource [Host:39] is unreachable: Host 39: Link is closed
>>
>>
>> 2013-07-15 11:57:49,861 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>> (AgentManager-Handler-8:null) Seq 39--1: MgmtId 81375086018793: Req: Cancel
>> request received
>> 2013-07-15 11:57:49,861 DEBUG [agent.manager.AgentAttache]
>> (AgentManager-Handler-8:null) Seq 39-1272840194: Cancelling.
>> 2013-07-15 11:57:49,861 DEBUG [agent.manager.AgentAttache]
>> (StatsCollector-2:null) Seq 39-1272840194: Waiting some more time because
>> this is the current command
>> 2013-07-15 11:57:49,862 DEBUG [agent.manager.AgentAttache]
>> (StatsCollector-2:null) Seq 39-1272840194: Waiting some more time because
>> this is the current command
>> 2013-07-15 11:57:49,862 INFO  [utils.exception.CSExceptionErrorCode]
>> (StatsCollector-2:null) Could not find exception:
>> com.cloud.exception.OperationTimedoutException in error code list for
>> exceptions
>> 2013-07-15 11:57:49,862 WARN  [agent.manager.AgentAttache]
>> (StatsCollector-2:null) Seq 39-1272840194: Timed out on null
>> 2013-07-15 11:57:49,862 DEBUG [agent.manager.AgentAttache]
>> (StatsCollector-2:null) Seq 39-1272840194: Cancelling.
>> 2013-07-15 11:57:49,863 DEBUG [cloud.storage.StorageManagerImpl]
>> (StatsCollector-2:null) Unable to send storage pool command to
>> Pool[210|NetworkFilesystem] via 39
>> com.cloud.exception.OperationTimedoutException: Commands 1272840194 to Host 39 timed out after 3600
>>        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>        at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:679)
>> 2013-07-15 11:57:49,863 INFO  [cloud.server.StatsCollector]
>> (StatsCollector-2:null) Unable to reach Pool[210|NetworkFilesystem]
>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:210] is unreachable: Unable to send command to the pool
>>        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>        at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:679)
>>
>>
>> Thanks
>> Leeno
>>
>>
>> On Tue, Jul 16, 2013 at 10:21 AM, Leeno Jose.P.A <[email protected]>
>> wrote:
>>
>> This is a dev box. We are planning an HA-enabled environment for the
>> production setup. Thanks, Geoff.
>>
>>
>> On Tue, Jul 16, 2013 at 12:11 AM, Geoff Higginbottom <
>> [email protected]> wrote:
>>
>> Hi Leeno,
>>
>> In theory that should work, but obviously you will lose all changes made
>> since the dump was taken. For example, any VMs created since then will be
>> purged by the system.
>>
>> I would highly recommend splitting the DB and the Management Server, and
>> if possible add a 2nd instance of each.
>>
>> Regards
>>
>> Geoff Higginbottom
>>
>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
>>
>> [email protected]
>>
>> -----Original Message-----
>> From: Leeno Jose.P.A [mailto:[email protected]]
>> Sent: 15 July 2013 18:46
>> To: [email protected]
>> Subject: Re: Rebuilding management server
>>
>> Hi Geoff,
>>
>> 1. I have only one management server.
>> 2. The management server is not functioning now, but a dump of the 'cloud'
>> database is available in backup. The CS version was 4.1.0 and the hosts
>> were Xenserver 6.1.0.
>> 3. The DB server was on the same machine as the management server.
>>
>> Now I am planning to do a fresh install of CS 4.1.0 and restore the cloud
>> database from the old installation's dump, which is available in backup.
>> Will it work?
>>
>> Thanks
>> Leeno
>>
>>
>> On Mon, Jul 15, 2013 at 9:56 PM, Geoff Higginbottom <
>> [email protected]> wrote:
>>
>> The Management Servers are 'Stateless' so as Chip points out, it's the
>> DB that stores all the info.
>>
>> How you actually go about it depends on your current setup.
>>
>> 1. How many management servers do you currently have?
>> 2. Are the original Management Server(s) still functioning, or are
>> they down?
>> 3. Is DB on a separate server, or the same as the Management Server?
>>
>> Regards
>>
>> Geoff Higginbottom
>>
>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
>>
>> [email protected]
>>
>> -----Original Message-----
>> From: Chip Childers [mailto:[email protected]]
>> Sent: 15 July 2013 15:50
>> To: [email protected]
>> Subject: Re: Rebuilding management server
>>
>> On Mon, Jul 15, 2013 at 03:19:42PM +0530, Leeno Jose.P.A wrote:
>>
>> Hi Users,
>>
>> Has anyone tried to rebuild a management server with Xenserver hosts?
>> If yes, could you please share your experience?
>>
>>
>> --
>> Leeno Jose .P.A
>>
>>
>> I have not, but one of the most critical aspects of this is to ensure
>> that your database is retained.
>>
>> This email and any attachments to it may be confidential and are
>> intended solely for the use of the individual to whom it is addressed.
>> Any views or opinions expressed are solely those of the author and do
>> not necessarily represent those of Shape Blue Ltd or related
>> companies. If you are not the intended recipient of this email, you
>> must neither take any action based upon its contents, nor copy or show
>> it to anyone. Please contact the sender if you believe you have
>> received this email in error. Shape Blue Ltd is a company incorporated
>> in England & Wales. ShapeBlue Services India LLP is operated under
>> license from Shape Blue Ltd. ShapeBlue is a registered trademark.
>>
>>
>>
>>
>> --
>> Leeno Jose .P.A
>>
>> Todd Pigram
>> [email protected]
>>
>>
>>
>
>
> --
> Leeno Jose .P.A
>



-- 
Leeno Jose .P.A
