Hi, thanks for your comments, everyone.
Wido - I was going to share that a little later :) I would also like to avoid a Java daemon (like CPVM/SSVM). Alternatively, to gain more throughput, we can implement a web service (or one with a Thrift interface, so that the Java-based mgmt server can call it using native bindings) written in Python or Go, to keep the process footprint small and make it reliable.

Earlier, in large deployments, we saw the password server become a bottleneck, and the fix was to upgrade VR memory (to 4-12 GB RAM). If you have a look at the new password server, it is multi-threaded (instead of fork/process based, so it consumes less memory) and does not use file-based locks (so operations are faster).

After doing this work, I feel there is a lot more to be done. The VR seems to me to be one of the fragile pieces, and it needs to be testable and robust. I was thinking of slowly and gradually moving all the services we need to control behind a client on the mgmt server’s side, which talks to a VR agent that is highly available (say, controlled by circus or supervisord), concurrent (tornado/twisted or Go based), fast (connection pooling and multiplexing) and fault tolerant (command journaling or retrying, some kind of service/network state sync).

We could then even run individual services inside Docker (something Sebastien shared in the past) and, if that’s possible, replace the Debian base with CoreOS or something else (so that it updates critical packages such as openssl by itself and the systemvm template is more lightweight), and possibly gain more ways to control/manage VR packages.

> On 03-Apr-2015, at 9:39 pm, Suresh Sadhu <suresh.sa...@citrix.com> wrote:
>
> That’s true, Somesh. The recent VR aggregation functionality is yielding better VR performance at the customer location, i.e. VR upgrade time is reduced from hours to minutes, and it's there in ACS as well (https://issues.apache.org/jira/browse/CLOUDSTACK-5779).
> Earlier, without this feature, the MS had to ssh to the router to execute each and every command, but now the MS will ssh to the router only once and run all the commands at once.
>
> Rohit: An agent on the VR is a good idea, but I believe an agent implementation is heavy for the router. We need to dig more.
>
> Regards
> Sadhu
>
> -----Original Message-----
> From: Somesh Naidu [mailto:somesh.na...@citrix.com]
> Sent: 03 April 2015 20:23
> To: dev@cloudstack.apache.org
> Subject: RE: [DISCUSS] How to fix failing VR-mgmt server links
>
> It is true and I like the idea. I would just want to make sure the agent footprint isn't too high. As opposed to CP/SS VMs, we expect there to be a lot more VRs running in an environment.
>
> Also, the recent VR aggregation, which I believe ACS 4.5 has, did reduce a lot of that barbaric stuff, so we are still better off than where we were earlier.
>
> Somesh
> CloudPlatform Escalations
> Citrix Systems, Inc.
>
> -----Original Message-----
> From: Rohit Yadav [mailto:rohit.ya...@shapeblue.com]
> Sent: Friday, April 03, 2015 5:53 AM
> To: dev
> Subject: [DISCUSS] How to fix failing VR-mgmt server links
>
> Hi,
>
> In large environments, one common cause I find for a VM deployment or a network rule failing is that the mgmt server is unable to reach the VR via the host because of network lag or issues between the host and the mgmt server. The send operation on the link is tried about 5 times (hardcoded) before it gives up, and we see something like this in the logs: "Unable to reach the peer that the agent is connected".
>
> Should we add a global setting to allow sysadmins to configure the agent link/socket timeout (in the various AgentAttaches in engine/orchestration/src/com/cloud/agent/manager?), or does something like this already exist? Please share that, or any other solution to this problem.
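To make the question above concrete, here is a minimal sketch, in Python for brevity, of a send loop whose attempt count and back-off come from settings rather than a hardcoded 5. The names (LinkSettings, send_with_retry, PeerUnreachable) are hypothetical and are not CloudStack APIs or existing global settings; the real AgentAttache code is Java and may be structured quite differently. The exponential back-off is an assumption added for illustration, since the thread only mentions the retry count.

```python
import time
from dataclasses import dataclass


@dataclass
class LinkSettings:
    """Hypothetical operator-tunable values (not real CloudStack settings)."""
    max_attempts: int = 5       # default mirrors the "about 5 times" above
    base_delay_s: float = 0.5   # assumed first back-off delay
    max_delay_s: float = 8.0    # assumed cap on the back-off delay


class PeerUnreachable(Exception):
    """Raised once every configured attempt has failed."""


def send_with_retry(send, payload, settings, sleep=time.sleep):
    """Call send(payload) until it succeeds or attempts run out.

    `send` is any callable that raises OSError (or a subclass such as
    ConnectionError or TimeoutError) when the link is unavailable.
    """
    delay = settings.base_delay_s
    last_error = None
    for attempt in range(1, settings.max_attempts + 1):
        try:
            return send(payload)
        except OSError as exc:
            last_error = exc
            if attempt < settings.max_attempts:
                sleep(delay)
                # Double the wait each time, up to the configured cap.
                delay = min(delay * 2, settings.max_delay_s)
    raise PeerUnreachable(
        f"gave up after {settings.max_attempts} attempts"
    ) from last_error
```

A sysadmin on a laggy link could then raise max_attempts or max_delay_s without a code change; whether that belongs in a global setting or per-host configuration is a separate question.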
>
> The other issue I see is that, since VRs don’t have an agent running in them, the mgmt server SSH-es into the VR to run scripts whenever it needs to execute an operation. Under high load, the number of open FDs (and so also TCP ports) on a VR/systemvm is limited, which again can cause connections to fail or time out due to the high number of requests the VR is processing. A long-term solution could be to implement an agent (like ssvm/cpvm) that runs inside the VR and talks to the mgmt server over a multiplexed connection, so that we limit the number of connections from one mgmt server and can get rid of the SSH code and the execution of barbaric scripts. Comments, suggestions, flames?
>
> Regards,
> Rohit Yadav
> Software Architect, ShapeBlue
> M. +91 88 262 30892 | rohit.ya...@shapeblue.com
> Blog: bhaisaab.org | Twitter: @_bhaisaab
>
> Find out more about ShapeBlue and our range of CloudStack related services
>
> IaaS Cloud Design & Build<http://shapeblue.com/iaas-cloud-design-and-build//>
> CSForge – rapid IaaS deployment framework<http://shapeblue.com/csforge/>
> CloudStack Consulting<http://shapeblue.com/cloudstack-consultancy/>
> CloudStack Software Engineering<http://shapeblue.com/cloudstack-software-engineering/>
> CloudStack Infrastructure Support<http://shapeblue.com/cloudstack-infrastructure-support/>
> CloudStack Bootcamp Training Courses<http://shapeblue.com/cloudstack-training/>
>
> This email and any attachments to it may be confidential and are intended solely for the use of the individual to whom it is addressed. Any views or opinions expressed are solely those of the author and do not necessarily represent those of Shape Blue Ltd or related companies. If you are not the intended recipient of this email, you must neither take any action based upon its contents, nor copy or show it to anyone. Please contact the sender if you believe you have received this email in error. Shape Blue Ltd is a company incorporated in England & Wales.
> ShapeBlue Services India LLP is a company incorporated in India and is operated under license from Shape Blue Ltd. Shape Blue Brasil Consultoria Ltda is a company incorporated in Brasil and is operated under license from Shape Blue Ltd. ShapeBlue SA Pty Ltd is a company registered by The Republic of South Africa and is traded under license from Shape Blue Ltd. ShapeBlue is a registered trademark.

Regards,
Rohit Yadav