I noticed that since the agent is started at (short) regular intervals, if something goes wrong and the previous run doesn't complete within 5 minutes, the next run will overlap the earlier run.
If that run doesn't finish in time either, I will have 3 runs, then 4, and so on. Processes pile up and the system starts to run slower. This happened today when running "/etc/init.d/iptables restart" to reload the firewall on CentOS 5; it sometimes hangs while unloading the iptables kernel module.

What's the practical answer to this scenario? I'm sure other people have run into this in production. I can imagine:

1. Abort the CFEngine agent run if an earlier instance of it is already running (do this via abortclasses).

2. Make a promise that cf-agent will kill earlier instances of cf-agent. I noticed cf-agent won't signal itself; I haven't yet tested whether it will signal another instance of cf-agent. Even if it's averse to shooting another cf-agent, we can kill it using an external shell command.

3. Set a timeout on every commands type promise. (This doesn't really address the scenario where a complete native cf-agent run takes 5 minutes and 1 second, so runs still overlap and load the host server.)

Comments?

Best,
-at
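
P.S. To make ideas 1 and 3 concrete outside of CFEngine itself, here is a minimal sketch of a wrapper that could sit between cron and the agent. It is only an illustration, not something CFEngine ships: the function names, the lock file path, and the 240-second limit are all my own assumptions. It takes a non-blocking exclusive lock, skips the run if an earlier instance still holds it, and kills the wrapped command if it exceeds the time limit.

```python
import fcntl
import subprocess


def acquire_lock(path):
    """Return an open file holding an exclusive lock on `path`,
    or None if another run already holds the lock."""
    fh = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def run_with_timeout(cmd, seconds):
    """Run `cmd`; return its exit code, or None if it was killed
    after `seconds`. Only the direct child is killed, so processes
    it spawned may survive."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None


def guarded_run(cmd, lock_path, seconds):
    """Skip if an earlier run holds the lock; otherwise run `cmd`
    under a time limit. Returns "skipped", "timeout", or the exit code."""
    lock = acquire_lock(lock_path)
    if lock is None:
        return "skipped"
    try:
        rc = run_with_timeout(cmd, seconds)
        return "timeout" if rc is None else rc
    finally:
        lock.close()  # closing the file releases the lock


# Hypothetical usage from a cron-driven script (path and limit are assumptions):
#   guarded_run(["cf-agent"], "/var/run/agent-wrapper.lock", 240)
```

The lock is released automatically if the wrapper dies, so a crashed run cannot block later ones forever. The limit is kept below the 5-minute interval so a stuck run is gone before the next one starts.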