On Friday, August 29, 2014 9:59:12 AM UTC-5, pmorel wrote:
>
> Hello,
>
> Recently, 3 Puppet agent instances (on 3 different servers with 
> quasi-similar configurations) started running at 100% on one CPU, and 
> thus they can no longer respond to the master.
>
> Nothing appears to be wrong in the logs: no failures or execution errors.
>
> Starting the agent like so: puppet agent --debug --verbose -t, I can 
> see that Puppet runs fine until some command that executes "chown ...", 
> and then there is no more debug/log output. Personally, I don't think 
> the problem comes from "chown"...
>


Such a symptom *could* arise from the chown under some circumstances.
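
For example, a File resource with "recurse => true" and an owner or 
group set has to stat, and potentially chown, every file under the 
tree on every run, which can take a very long time on a large 
directory.  A purely hypothetical illustration (your actual manifest 
will differ):

    # Hypothetical: a deep tree here can keep the agent busy
    # chowning for a long time on every run.
    file { '/srv/data':
      ensure  => directory,
      owner   => 'app',
      group   => 'app',
      recurse => true,
    }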

 

>
> With strace, I can see that the puppet agent reads the file 
> /var/lib/ruby/1.8/puppet.rb, closes it, then reopens and recloses it 
> in an infinite loop (which causes the 100% CPU).
>
>

That does seem strange.  Is there any chance that your agents are 
corrupted?  For instance, could you have Puppet installed both from a 
package and from a gem on the same machines?
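
One quick way to check (a sketch, assuming a Debian/Ubuntu box as you 
describe below) is to compare what dpkg and RubyGems each claim to 
have installed, and which binary actually runs:

    # Which puppet binaries are on the PATH?
    which -a puppet

    # What the package manager installed:
    dpkg -l | grep -i puppet

    # What RubyGems installed, if anything:
    gem list | grep -i puppet

If both dpkg and gem report a Puppet, the two installs can shadow each 
other in odd ways.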

 

> I've tried executing date -s "`date`" (a solution that I've seen in 
> threads with a similar problem); I also reinstalled Puppet (with 
> purge) and Ruby, but no change.
>
>

If there is some kind of conflict, such as native package vs. gem, then 
reinstalling just Puppet might not be sufficient to fix it.  You might need 
to go as far as completely purging Ruby itself, too, along with everything 
that depends on it.  But before you try that, read on....
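
For reference, if you do eventually need the full purge, on Ubuntu 
12.04 it would look roughly like the following.  This is a sketch: 
verify the package names with dpkg -l first, and note that removing 
/var/lib/puppet discards the agent's SSL certificates, so the agent 
will need a new certificate signed by the master afterward.

    # Remove any gem-installed copies first, if present:
    gem uninstall puppet
    gem uninstall facter

    # Purge the native packages and the Ruby runtime they depend on:
    apt-get purge puppet puppet-common facter ruby1.8

    # Optionally clear leftover state before reinstalling
    # (this deletes the agent's SSL certs; see above):
    rm -rf /var/lib/puppet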

 

> Also, I'm running Ruby 1.8 and Puppet 2.7.22, but I don't want to 
> update those packages.
>
> The servers are running Ubuntu 12.04.4 LTS
>
>

If you suspect that the problem is with a resource or resource combination 
rather than with Puppet itself, then you should consider how you might 
identify the problematic resource.  If you keep your manifests and data in 
a VCS repository (highly recommended) then you could try reverting manifest 
and/or data changes affecting the servers in question.  In particular, you 
could try reverting to the last configuration that you know to have worked, 
and then step forward from there.
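
For instance, if the repository is in Git, that could look roughly 
like this (the commit names are placeholders):

    # See what changed around the time the problem started:
    git log --oneline --since="2 weeks ago" -- manifests/

    # Temporarily roll the manifests back to a known-good commit:
    git checkout <known-good-commit> -- manifests/

    # Or bisect between the bad and good states to find the
    # offending change:
    git bisect start <bad-commit> <known-good-commit>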

You could also just start trimming classes and/or resources from these 
nodes until the misbehavior goes away, to help you identify the 
resource(s) with which it is associated.  Perhaps your strace output 
gives you a good starting point for that.
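
In practice that can be as simple as commenting out half of the 
classes in the affected node's definition, running the agent, and 
repeating until the culprit is isolated.  A sketch, with made-up class 
names:

    node 'affected-server.example.com' {
      include base
      # include ntp         # temporarily disabled while bisecting
      # include app_server  # temporarily disabled while bisecting
      include users
    }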

Note, too, that in most cases, unless you make explicit provisions to the 
contrary, removing a resource from your nodes' catalogs simply leaves that 
resource unmanaged -- it does not (normally) cause that resource to be 
removed from the node itself.
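
In other words, if you actually want something gone from the node, you 
have to say so explicitly, for example (hypothetical resource):

    # Dropping this resource from the catalog would merely leave the
    # file unmanaged; to delete it, manage it as absent:
    file { '/etc/cron.d/old-job':
      ensure => absent,
    }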


John
