I noticed that since the agent is started at (short) regular
intervals, if something goes wrong and the previous run doesn't
complete within 5 minutes, the next run will overlap the earlier run.

If that run doesn't finish in time either, I will have 3 runs, then
4, and so on. Processes pile up and the system starts to run slower.
This happened today when running "/etc/init.d/iptables restart" to
reload the firewall on CentOS 5; it sometimes hangs while unloading
the iptables kernel module.

What's the practical answer to this scenario? I'm sure other people
have run into this in production.

I can imagine:

1. Abort the CFEngine agent run if there is an earlier instance of it
already running.  (Do this via abortclasses)

2. Make a promise that cf-agent will kill earlier instances of
cf-agent.  I noticed cf-agent won't signal itself; I haven't tested
whether it will signal another instance of cf-agent.  Even if it's
averse to shooting another cf-agent, we can kill it using an external
shell command.

3. Set a timeout on every commands-type promise.  (This doesn't really
address the scenario where a complete native cf-agent run takes 5
minutes and 01 seconds, so runs still overlap and load the host.)
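
To make options 1 and 3 concrete outside of CFEngine policy, here is a
rough sketch of a wrapper that the scheduler could call instead of
cf-agent directly.  It is not a CFEngine feature and I have not run it
in production.  The function name run_exclusive, the lock path, and the
timeout value are all made up for illustration.  It assumes a Unix
host with Python 3, and the timeout only kills the direct child, not
anything that child has spawned.

```python
import fcntl
import subprocess


def run_exclusive(cmd, lock_path, timeout_s):
    """Run cmd unless another run holds the lock; stop it after timeout_s.

    Returns ("skipped", None) if the lock is already held,
    ("timeout", None) if the command was killed for running too long,
    or ("done", exit_code) otherwise.
    """
    # The lock belongs to this open file; closing it releases the lock.
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail at once if someone holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return ("skipped", None)
        try:
            # subprocess.run kills the direct child when the timeout expires.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return ("timeout", None)
        return ("done", result.returncode)


# Hypothetical usage (path and values are assumptions, adjust to taste):
# status, code = run_exclusive(["cf-agent"], "/var/run/agent-wrapper.lock", 240)
```

With a timeout shorter than the scheduling interval, a hung run gets
killed before the next one starts, and the lock covers the case where
it somehow does not.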

Comments?

Best,
-at