Hey Ron

I have the following in a promise. You may want to adjust the values
passed to "ago" there. I execute cf-agent once an hour. If I see a
cf-agent process more than 2 hours old, I have the current execution kill
that process and raise a class that I report on. You could use the same
approach to kill cf-execd.

Hope this helps.  Really, you should look at the verbose output of
cf-agent -v to determine what the agent is hanging on.  I've usually found
this to be something like a stale NFS mount point.

Cheers
Mike



processes:
 linux|sunos_5_10::
  "cf-agent"
   handle  => "verify_cf_agent_doesnt_pile_up",
   process_select => cfagent_cruft,
   signals  => {"kill"},
   classes  => if_repaired("cfagent_haywire");



####################################################################################
body process_select cfagent_cruft
{
 command   => ".*cf-agent$";
 # arguments for the ago function
 # arg1 : Years, in the range 0,1000
 # arg2 : Months, in the range 0,1000
 # arg3 : Days, in the range 0,1000
 # arg4 : Hours, in the range 0,1000
 # arg5 : Minutes, in the range 0,1000
 # arg6 : Seconds, in the range 0,40000
 # Kill any cf-agent process that has been lingering around, but only match
 # start times up to 2 hours ago so we don't kill our current execution.
 stime_range  => irange(0,ago(0,0,0,2,0,0));
 process_result  => "command.stime";
}
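
For the reporting side, a minimal sketch (an assumption, not necessarily
how the class is actually reported on here) could be a reports promise
guarded by the cfagent_haywire class raised above. The message text is
illustrative only:

```cf3
reports:
 cfagent_haywire::
  # Hypothetical message; sys.fqhost is the host's fully qualified name.
  "Killed a lingering cf-agent process on $(sys.fqhost)";
```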





On 6/21/12 11:01 AM, "Ron Parker" <rdpar...@gmail.com> wrote:

>In testing that the promises for a given machine were sufficient to
>reconfigure it from scratch, I did a fresh OS install to a VM,
>bootstrapped CFE and manually started cf-agent.  After about 30
>minutes and the fourth email from the system, it had converged. I
>noted a few things that were missing, e.g. ssh configuration. So I
>made the policy changes over the course of the day.
>
>Last night before leaving I rolled the VM back to the baseline OS
>install and bootstrapped CFE to test my changes. After about 10
>minutes I got an email reporting what happened during initial run but
>no reports thereafter. This morning I get in and find the machine has
>182 (and climbing) copies of cf-execd and cf-agent. Other than my
>seemingly minor tweaks the only difference I am aware of is that the
>second time I did not run cf-agent manually at all, I let cf-execd
>start the processes.
>
>I suspect it is somehow related to initial package installation and
>there is a possibly related discussion on the list from last year
>https://cfengine.com/forum/read.php?3,19505,19598, but I saw no clear
>resolution.
>
>My questions are, how can I see what the active copy of cf-agent is
>doing that it did not complete? It does not have any child processes
>showing up in pstree.
>
>The second question is how may I prevent this in the future but still
>have the system converge in a reasonable amount of time?
>
>--
>Ron Parker
>Don't type things you find on the Internet into your computer!
>:(){ :|:&};:
>_______________________________________________
>Help-cfengine mailing list
>Help-cfengine@cfengine.org
>https://cfengine.org/mailman/listinfo/help-cfengine
