Hey Ron
I have the following in a promise. You may want to adjust the values
passed to "ago" there. I execute cf-agent once an hour. If I see a
cf-agent process more than 2 hours old, I have the current execution kill
that process and raise a class that I report on. You could use the same
approach to kill cf-execd.
Hope this helps. Really, you should look at the verbose output of
cf-agent -v to determine what the agent is hanging on. I've usually found
this to be something like a stale NFS mount point.
Cheers
Mike
processes:

  linux|sunos_5_10::

    "cf-agent"
      handle         => "verify_cf_agent_doesnt_pile_up",
      process_select => cfagent_cruft,
      signals        => { "kill" },
      classes        => if_repaired("cfagent_haywire");
###########################################################################

body process_select cfagent_cruft
{
  command => ".*cf-agent$";

  # Arguments for the ago() function:
  #   arg1 : Years,   in the range 0,1000
  #   arg2 : Months,  in the range 0,1000
  #   arg3 : Days,    in the range 0,1000
  #   arg4 : Hours,   in the range 0,1000
  #   arg5 : Minutes, in the range 0,1000
  #   arg6 : Seconds, in the range 0,40000

  # Kill any cf-agent process that's been lingering around, but only match
  # start times up to 2 hours ago so we don't kill our current execution.
  stime_range    => irange(0,ago(0,0,0,2,0,0));
  process_result => "command.stime";
}
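
The reporting side is not shown above. A minimal sketch of what it could
look like, assuming the reports promise lives in the same agent bundle as
the processes promise; the message text is purely illustrative and is not
taken from the original policy:

```
reports:

  # cfagent_haywire is the class raised by if_repaired() in the
  # processes promise when a stale cf-agent was actually killed.
  cfagent_haywire::

    "Killed a stale cf-agent process on $(sys.fqhost)";
```

Because the class is only defined when the kill promise is repaired, the
report stays silent on normal runs.
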
On 6/21/12 11:01 AM, "Ron Parker" <[email protected]> wrote:
>In testing that the promises for a given machine were sufficient to
>reconfigure it from scratch, I did a fresh OS install to a VM,
>bootstrapped CFE and manually started cf-agent. After about 30
>minutes and the fourth email from the system, it had converged. I
>noted a few things that were missing, e.g. ssh configuration. So I
>made the policy changes over the course of the day.
>
>Last night before leaving I rolled the VM back to the baseline OS
>install and bootstrapped CFE to test my changes. After about 10
>minutes I got an email reporting what happened during initial run but
>no reports thereafter. This morning I get in and find the machine has
>182 (and climbing) copies of cf-execd and cf-agent. Other than my
>seemingly minor tweaks the only difference I am aware of is that the
>second time I did not run cf-agent manually at all, I let cf-execd
>start the processes.
>
>I suspect it is somehow related to initial package installation and
>there is a possibly related discussion on the list from last year
>https://cfengine.com/forum/read.php?3,19505,19598, but I saw no clear
>resolution.
>
>My questions are, how can I see what the active copy of cf-agent is
>doing that it did not complete? It does not have any child processes
>showing up in pstree.
>
>The second question is how may I prevent this in the future but still
>have the system converge in a reasonable amount of time?
>
>--
>Ron Parker
>Don't type things you find on the Internet into your computer!
>:(){ :|:&};:
>_______________________________________________
>Help-cfengine mailing list
>[email protected]
>https://cfengine.org/mailman/listinfo/help-cfengine