On Tue, Dec 19, 2006 at 06:23:16AM -0700, Clint Pachl wrote:
> >A pull-only system assumes that the clients actually pull. What if
> >they don't? How do you know when their last successful pull was?
>
> If you implement a "push" system, how do you know if something was
> actually pushed? What if something was pushed, how do you know the
> "pushee" did the right thing with what it was given? This argument
> goes both ways, but is solved simply. A system should report what it
> does after it pushes or pulls. The other end should also report. So
> if the results show someone is pushing but no one is pulling, or vice
> versa, you have a problem. This system could be implemented using
> mail or central syslog.
>
> A good argument for "pull" systems:
> http://www.infrastructures.org/bootstrap/pushpull.shtml
>
> What do others think about push vs pull management systems? What
> tools are you using to implement your push/pull management system?
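The "both ends should report" idea above could be sketched as a wrapper
around whatever command does the actual transfer, so that success and
failure both end up on the central loghost. A toy example (the rsync
invocation and host name in the comment are made up; logger(1) is the
only assumption):

```shell
#!/bin/sh
# report_pull: run the pull command given as arguments and report the
# outcome via syslog either way, so that *silence* from a host - no
# "ok" and no "FAILED" line - is itself an alarm on the loghost.
report_pull() {
    if "$@"; then
        logger -t cfgpull "ok: $*" 2>/dev/null || true
        echo "ok: $*"
    else
        logger -t cfgpull "FAILED: $*" 2>/dev/null || true
        echo "FAILED: $*"
        return 1
    fi
}

# e.g. from cron on each client, against a hypothetical gold server:
# report_pull rsync -a rsync://gold.example.com/configs/ /etc/configs/
```

A report-summarising script on the loghost could then flag any machine
that hasn't logged an "ok" line recently.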
An orthogonal issue, which I don't think has been explicitly mentioned
so far, is whether you make config changes on the central repository
(and replicate them out to the target), or locally on the target system
(and replicate them back to the central repository).

From infrastructures.org:

  We have developed a rule that works very well in practice and saves
  us a lot of heartache: "Never log into a machine to change anything
  on it. Always make the change on the gold server and let the change
  propagate out."

That makes a lot of sense, but enforcing that policy might be
difficult. It matters if you're relying on your gold server for
disaster recovery purposes: if the target machines had changes made
which nobody remembers and which were never reflected on the gold
server, then any freshly-built machines will be non-functional.

You could have some Tripwire-like system monitor periodically for
unauthorised changes, so you can slap the wrist of anyone who breaks
the policy - and, more importantly, bring the central repository back
into sync with what was done. Or you could block root logins entirely,
but then you need to carefully select a list of sudo actions which are
needed for (e.g.) restarting daemons and diagnosing and correcting
common problems.

The alternative is to allow changes to be made on the target machines,
and to check them into a central repository later as a record after the
fact. This makes it harder to make identical changes to a large number
of machines in a cluster, and again the real machines and the
repository can drift out of sync if the procedures are not properly
followed.

A similar issue occurs with init scripts, interface configuration, and
starting and stopping daemons. On many occasions I have come across
problems where a box had been running perfectly for 2 years, but when
it was rebooted for some reason, it stopped working.
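The Tripwire-like monitoring can be reduced to a very small sketch: keep
a checksum manifest of a tree and diff the live tree against it from
cron. This is only an illustration, not a substitute for Tripwire or
mtree(8) (no hashes of permissions or ownership, and filenames
containing whitespace would break the xargs step); all paths are
examples:

```shell
#!/bin/sh
# make_manifest: record a cksum manifest of every file under a tree.
#   $1 = tree to scan, $2 = manifest file to write
make_manifest() {
    (cd "$1" && find . -type f | sort | xargs cksum) > "$2"
}

# check_manifest: re-scan the tree and diff against the saved manifest.
# Prints the differences and returns non-zero if anything changed.
#   $1 = tree to scan, $2 = manifest file to compare against
check_manifest() {
    (cd "$1" && find . -type f | sort | xargs cksum) | diff "$2" -
}
```

A nightly cron job could run check_manifest over /etc and pipe any
output to mail or logger, tying back into the reporting scheme from the
quoted message.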
It turned out this was because someone had made a manual change - such
as starting some daemon, perhaps with particular command-line flags, or
changing firewall rules - but when the box rebooted it did not come up
the same way at startup. Since the original change may have been made a
long time ago, by someone who has long since left, you can end up with
emergency situations which are difficult to fix quickly.

This problem seems to be more difficult to solve. Ideally there would
be a single interface through which you performed any sysadmin action,
such as configuring an interface or starting a daemon, which kept a
persistent record of it and performed the same action again at startup.
That would mean, for example, being forbidden to use 'ifconfig'
directly, but being allowed to change /etc/hostname.* and run an rc
script to apply the changes.

This is more difficult with rc.conf: you would need a supervisor script
which noticed (say) that run_foo="NO" had changed to run_foo="YES", or
vice versa, and performed the appropriate actions. It might actually be
easier with something like daemontools, which has separate control
files for each daemon. I've never seen a centralised management system
which works directly in this way, but I'd love to have one.

Finally, a similar problem occurs when deciding how to do configuration
management of, say, Cisco routers. However, your hand is forced a bit
more there: you generally can't just push a new config out to each box,
because to make the changes active you'd need to reboot it (a Cisco
doesn't have the ability to take a diff between its current active
state and a target state, and perform only the changes necessary to
bring it up to that state).

So often you end up having to make changes directly on the target
device, line by line, and then tftp'ing the updated configs back to a
central repository. That is, the central repository is not the place
where changes are made, but just a record of changes which were made.
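The rc.conf supervisor idea could look something like the following: diff
the new rc.conf against a snapshot of the last one that was applied, and
act on any run_foo variables that flipped. This is only a sketch of the
detection step (the paths and the run_* naming convention are
illustrative, it only echoes what it would do, and a real version would
have to hook into the actual rc machinery to start and stop daemons):

```shell
#!/bin/sh
# apply_rcconf: compare $1 (the new rc.conf) against the snapshot taken
# the last time we ran, report which run_foo variables changed, then
# save the new file as the snapshot for next time.
SNAP=/var/db/rc.conf.snap

apply_rcconf() {
    conf=$1
    # Lines that are new or changed since the last snapshot:
    diff "${SNAP}" "${conf}" 2>/dev/null | sed -n 's/^> //p' |
    while IFS='=' read -r var val; do
        case ${var} in
        run_*)
            name=${var#run_}
            case ${val} in
            '"YES"'|YES) echo "would start ${name}" ;;
            '"NO"'|NO)   echo "would stop ${name}" ;;
            esac
            ;;
        esac
    done
    cp "${conf}" "${SNAP}"
}
```

Run at boot with an empty snapshot, the same script would start
everything marked YES, which is exactly the "persistent record replayed
at startup" property described above.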
Again, you can get into problems with procedures not being followed and the repository coming out of sync with reality. Regards, Brian.