Rohit,

Completely agree about the scalability and maintenance issues with system VMs. Using an agent inside the VR will probably work well for KVM, since the KVM agent (resource layer) is remote. Communication over the link-local network also won't be a problem in that case. For other hypervisors, it will increase load on the mgmt server because of the direct agents. Also, the mgmt server cannot access VRs directly.
The options you mentioned are definitely worth exploring. Are you also looking at improving the system VM upgrade procedure?

-----Original Message-----
From: Rohit Yadav [mailto:rohit.ya...@shapeblue.com]
Sent: Friday, February 20, 2015 1:24 PM
To: dev; us...@cloudstack.apache.org
Subject: [DISCUSS] Improving VR services such as password server

Hi,

I'm trying to explore how to make systemvms more robust and fault-tolerant, and to look at the manual/automated QA of systemvms.

One of the common user-facing issues related to scalability is the reset password/key servers, where the VR serves data using socat etc. through forking mechanisms and global locks. This slows down processes such as reset password. More here: https://issues.apache.org/jira/browse/CLOUDSTACK-8272

One of the blindly thrown solutions is to increase the VR RAM, which works at some scale but then seems to fail again when the load is increased beyond a point. I don't know of any performance and stress testing reports that tell us about these bottlenecks. Please share if you have done anything in this regard.

I want to do a couple of things:
- Explore systemvm build changes using newer tools such as packer
- Clean up script execution and code in the resource layer
- Start replacing bash scripts with more robust implementations, perhaps a single or a few agents on VRs that provide non-hardcoded, well-documented interfaces
- Right now everything in VR/systemvms is sort of hardcoded and the services/interfaces are not well-documented. The idea is to refactor and wrap everything we want to do with the systemvms in a general agents framework that provides monitoring and management of the VRs (do stuff like upgrades etc. to combat things like the GHOST and POODLE issues): https://cwiki.apache.org/confluence/display/CLOUDSTACK/Agents+Framework

What are the other issues you've had in the past that you would like to see improved?

--
Regards,
Rohit Yadav
Software Architect, ShapeBlue
M.
+91 8826230892 | rohit.ya...@shapeblue.com
Blog: bhaisaab.org | Twitter: @_bhaisaab

PS. If you see any footer below, I did not add it :)