Dennis,

Essentially it sounds like you're describing exactly what Amazon Launch Configurations (LC) and Auto Scaling Groups (ASG) are meant to do: launching new instances based on triggered events. You can either use CloudWatch metrics as triggers, or have your own processes trigger an event through SNS/SQS to increase or decrease the number of nodes running. That pretty much covers all of this, with some caveats for C.
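As a rough sketch of the LC/ASG/CloudWatch wiring I'm describing, using the AWS CLI (all names, the AMI ID, and the thresholds here are placeholders, not anything from your environment):

```shell
# Sketch only: names, AMI ID, zones, and thresholds are assumptions.

# Launch configuration carrying the per-role user-data (cloud-init).
aws autoscaling create-launch-configuration \
  --launch-configuration-name web-lc \
  --image-id ami-12345678 \
  --instance-type m3.medium \
  --user-data file://web-userdata.sh

# ASG that scales between 2 and 10 nodes using that LC.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --launch-configuration-name web-lc \
  --min-size 2 --max-size 10 --desired-capacity 2 \
  --availability-zones us-east-1a us-east-1b

# Scale-out policy, plus the CloudWatch alarm that triggers it.
POLICY_ARN=$(aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name web-scale-out \
  --scaling-adjustment 2 --adjustment-type ChangeInCapacity \
  --query PolicyARN --output text)

aws cloudwatch put-metric-alarm \
  --alarm-name web-high-cpu \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=web-asg \
  --statistic Average --period 300 --evaluation-periods 2 \
  --threshold 70 --comparison-operator GreaterThanThreshold \
  --alarm-actions "$POLICY_ARN"
```

A matching scale-in policy with a negative adjustment and a low-CPU alarm handles the shrink direction; if you'd rather drive it from your own processes, an SNS topic or SQS consumer can call the same scaling policy instead of a CloudWatch alarm.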
The trick, or sticky part to use your phrasing, is the certificates and the dynamic hostnames used within Amazon. You can work around this by using an autosigning script similar to what I have in my environment [1]. Using it requires CSR attributes [2], which let you automatically sign the CSRs that come in from valid instances.

The next trick is designating how each host is supposed to be configured. For this I look for Facts that I can use within my Hiera configuration on the Master side, and then use server roles. It's not hard to drop custom facts at instance initialization and then use those on the Agent run to determine the host's purpose and role in life.

Everything I need to do this in Amazon is made very easy with cloud-init, which lets me set up the required configuration as user-data included in the LC for the ASG. You just customize the user-data of each LC for the different server roles; the ASG then handles scaling the cluster of nodes using its assigned LC. You don't have instances just sitting around when the load doesn't justify it, but when the triggered events occur the group can grow or shrink as defined by your policy in the ASG. I believe this accomplishes the end-state goal of what you're looking for and is very possible.

1. https://github.com/UGNS/standard-modules/blob/production/scripts/autosigner.rb
2. https://docs.puppetlabs.com/puppet/latest/reference/ssl_attributes_extensions.html

On 10/10/2014 09:10 PM, Dennis Gearon wrote:
> I want to be able to scale up different types of nodes both quickly and
> moderately quickly, AUTOMATICALLY WITH NO USER INPUT, with Puppet doing
> the configuration.
>
> My idea so far is:
>
> A/ There are several nodes of each type sleeping, enough per type to
> last 15 minutes of normal to abnormally high peak loads.
>
> B/ There is a base Amazon image that I create for each node type, with
> Puppet stored on it, in case the load continues or business does :-)
>
> C/ The system detects that the average load is increasing, or the short
> term load is getting excessive. A new instance is made from the correct
> type of instance, AND HERE'S the STICKY PART: the launching of the
> instance includes installing a unique cert on the instance for puppet,
> and also storing that cert/key in the puppet master (puppetmasters).
> Whichever of the agent or puppetmaster needs to contacts first, and the
> new instance is under puppet control.
>
> D/ Whatever algorithm is needed to wait till things have calmed down, or
> immediately do an update using a catalog, is to be determined (TBD).
>
> E/ When the load is gone, the newer instance goes to sleep with the
> older instances.
>
> F/ If the load average stays down, or to update the OS on older
> instances, the older instances are woken up one at a time, their
> contents switched to a new node (or they are just retired if there is a
> lot of redundancy), the node is destroyed, a new node is created from the
> correct Amazon image, it is updated using its catalog, and then any
> contents needed to function are pushed to it, and then it is put to sleep.
>
> How does this sound? Is it possible? Also, I read somewhere that
> updating the OS of a puppet node requires COMPLETELY REMOVING PUPPET,
> EVERYTHING, doing the update, then putting puppet back on. Is this
> really true? Any automated way to do this?
>
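P.S. To make the user-data piece I mentioned above concrete, here is a minimal sketch of a per-role user-data script for the LC. It drops the CSR attributes the autosign script validates, plus an external fact for Hiera role lookup, then kicks off the first agent run. The role name, pre-shared key, and server name are placeholders, and the paths assume a Puppet 3.x layout; adjust for yours.

```shell
#!/bin/bash
# Sketch of cloud-init user-data for one server role (values are assumptions).

ROLE="webserver"            # change per Launch Configuration
PRESHARED_KEY="replace-me"  # shared secret the master's autosign script checks

mkdir -p /etc/puppet /etc/facter/facts.d

# CSR attributes: embedded in the agent's CSR so the autosign policy
# script on the master can validate the request and read the role.
cat > /etc/puppet/csr_attributes.yaml <<EOF
custom_attributes:
  1.2.840.113549.1.9.7: ${PRESHARED_KEY}
extension_requests:
  pp_role: ${ROLE}
EOF

# External fact so Hiera can pick the right classes on the first run.
echo "role=${ROLE}" > /etc/facter/facts.d/role.txt

# First agent run; the master's name here is an assumption.
puppet agent --onetime --no-daemonize --server puppet.example.com --waitforcert 60
```

The `1.2.840.113549.1.9.7` OID is the standard challengePassword attribute; `pp_role` is one of Puppet's registered extension shortnames, so it survives into the signed cert where the master can still read it later.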