On Monday, June 15, 2015 at 9:12:03 PM UTC-5, Franck wrote:
>
> We've been experiencing a lot of "Command exceeded timeouts" on basic 
> shell commands using the "exec" type for tasks that should execute fairly 
> fast: 
>
> Jun 15 15:45:44 host1 puppet-agent[57648]: (/Stage[main]/Timezone::Utc/Exec[/bin/rm -f /etc/localtime && /bin/ln -s /usr/share/zoneinfo/UTC /etc/localtime]) Command exceeded timeout
> Jun 10 21:15:24 host1 puppet-agent[57081]: (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/vmware-uninstall-tools.pl]/onlyif) Check "/usr/bin/test -f /usr/bin/vmware-uninstall-tools.pl" exceeded timeout
> Jun 10 23:56:02 host1 puppet-agent[40286]: (/Stage[main]/Open-vm-tools::Package/Exec[/usr/bin/yum install -y open-vm-tools.x86_64]/unless) Check "/bin/rpm -q open-vm-tools" exceeded timeout
>
> All these commands can be run locally to the host and return fairly 
> quickly, but when puppet executes them they time out.
>


Very strange.

 

> Extending the timeout is an option, but a ridiculous one, since the default 
> is 300 seconds and none of these commands should take 5 minutes or more to 
> return.
>


No, probably not a viable option.  If these particular commands are not 
completing within the standard timeout, then there's no particular reason 
to think that they would *ever* complete, no matter what timeout you set.
 

>
> One thing we've observed is that this only affects CentOS 6.x hosts; our 
> Ubuntu 14.x hosts do not experience these problems.  We've also tried 
> different versions of the puppet agent and of Ruby, and none of them had 
> any effect: the condition persists regardless.  It also does not affect all 
> of our CentOS 6.x hosts, only certain ones, seemingly at random.
>


There is surely some pattern to which machines are affected and which not.  
Discovering that pattern would be a big step in solving the problem.

 

>  Running puppet agent in debug mode does not seem to uncover what's going 
> on as it just hangs when it gets to the "exec".    
>
>

You could try running Puppet under strace to get a low-level view of 
exactly what Puppet gets stuck on.  Nevertheless, if the problem sticks to 
particular computers across different Puppet versions and different Ruby 
versions, then the root of the problem must be outside Puppet itself.
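
As a rough sketch of the strace approach: the function name and the default 
output path below are placeholders of mine, and it assumes strace is 
installed on the affected host and that you run it as the same user the 
agent normally runs as.

```shell
# Run one foreground agent pass under strace.
# $1: trace output file (placeholder default, pick any writable path).
trace_puppet_run() {
    # -f  follows forked children, including commands spawned by Exec resources
    # -tt timestamps each syscall with microsecond precision
    # -o  writes the trace to a file instead of stderr
    strace -f -tt -o "${1:-/tmp/puppet-agent.strace}" puppet agent --test
}
```

Once the run hangs, the last lines of the trace file should show which 
syscall each process is blocked in.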

You could compare the lists of installed packages between an affected 
machine and a non-affected one.  Perhaps the problem is caused by a 
specific package or package version.
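
One way to do that comparison, as a minimal sketch: dump a sorted package 
list on each host, copy both files to one machine, and print only the 
differences.  The function name and file names are made up for 
illustration.

```shell
# Print packages that appear in only one of two sorted package lists.
# Column 1: only in the first list; column 2 (tab-indented): only in the second.
pkg_diff() {
    LC_ALL=C comm -3 "$1" "$2"
}

# On each host, capture a sorted list first (file names are placeholders):
#   rpm -qa | LC_ALL=C sort > /tmp/packages.affected.txt
# Then, with both files on one machine:
#   pkg_diff /tmp/packages.affected.txt /tmp/packages.healthy.txt
```

Because rpm -qa prints name-version-release, a package installed at 
different versions on the two hosts shows up once in each column.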

You should compare the catalogs applied to the machines that suffer from 
this problem with those for the machines that are not affected.  It may 
help to narrow down the problem if you find that it is associated with a 
small number of specific resources.

You should check how Puppet is running on affected machines vs. 
non-affected ones.  Is it running as a privileged user?  The same one?

 

> It's very annoying and actually dangerous in some cases as the puppet 
> agent will continue spawning multiple "applying configuration" processes 
> which will cause hosts to swap memory as each takes up more and more memory 
> and in some instances will hose them entirely.  
>


Have you actually observed that behavior?  If so, then something is 
dreadfully wrong.  Puppet should never start a new catalog run when one is 
already underway.  It has safeguards in place to prevent that.  If you have 
stumbled across a way in which those can be circumvented, then I'm sure 
Puppet Labs would appreciate a bug report.
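
To confirm it, something like the following sketch would count the 
concurrent runs.  It assumes the processes you saw carry "applying 
configuration" in their command line, as you describe; the function name is 
mine.

```shell
# Count processes whose command line marks an in-progress catalog run.
# Reads a process listing on stdin, so it also works on a saved listing.
# The [a] keeps the pattern from matching this grep's own command line.
count_applying() {
    grep -c '[a]pplying configuration'
}

# Usage on a live host:
#   ps -eo pid,ppid,etime,args | count_applying
# The lock file that guards a run can be located with:
#   puppet agent --configprint agent_catalog_run_lockfile
```

A count above 1, together with the contents of that lock file, would be 
worth including in a bug report.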

 

> We've had to remove these manifests that cause these conditions in the 
> interim but right now we have a lot of hosts we need to manage with puppet 
> so we need to be able to use this.
>
> Basic info on the hosts in question:
>
>    - Puppet: 3.7.5
>    - Ruby: 2.1.2
>    - CentOS 6.6
>
> Anyone have any ideas as to what could be causing this?
>
>

You haven't given us much to work with, and I, at least, have never before 
heard of such an issue.  I do not know what is causing it, but I suggest 
you try narrowing down the configuration being applied to one of the 
affected nodes to find a minimal set that is sufficient to trigger the 
issue.  For example, if you apply only class Timezone::Utc, is that 
sufficient to cause Puppet to exhibit the problem?  Please provide the 
actual manifests involved.
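
A quick way to try a single class in isolation, sketched under the 
assumption that the module is present on the node's local modulepath (the 
function name is hypothetical, and I am guessing the class name 
timezone::utc from your log output):

```shell
# Apply exactly one class locally, with debug output, bypassing the master.
# $1: fully qualified class name.
apply_single_class() {
    puppet apply --debug -e "include $1"
}

# e.g. apply_single_class timezone::utc
```

Note that this compiles the catalog locally rather than through the agent, 
so if the hang disappears, that difference is itself a clue.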


John
