On Wed, 2011-01-26 at 14:36 -0800, Daniel Pittman wrote:
> On Wed, Jan 26, 2011 at 13:56, Jason Wright <jwri...@google.com> wrote:
> > On Wed, Jan 26, 2011 at 1:17 PM, Daniel Pittman <dan...@puppetlabs.com>
> > wrote:
> >
> >> For what it is worth I have been looking at this quietly in the
> >> background, and come to the conclusion that to progress further I am
> >> going to have to either reproduce this myself (failed, so far), or get
> >> a bit of state instrumentation into that code to track down exactly
> >> what conditions are being hit to trigger the failure.
> >
> > I haven't been able to reproduce it either.  So far, I've tried
> > annexing a bunch of machines and running puppetd in a tight loop
> > against an otherwise idle puppetmaster VM and I can get the rate of
> > API calls and catalog compiles up to the correct level for one of our
> > busy VMs, but no 500s (or even 400s) so far.  If this fails, I have
> > some code which fetches pluginsync metadata and then proceeds to make
> > fileserver calls for every .rb listed.  I'll start using that to
> > generate traffic, since these are the sorts of operations which get
> > the most errors.
> >
> >> Sounds like a good next step might be for y'all to let me know when
> >> you might look at being able to do that instrumentation, and I can try
> >> and send you a satisfactory patch to trial?
> >
> > What instrumentation would you be looking for?
>
> Specifically, around the "not mounted" fault, in the 'splitpath'
> method, identify what the value of 'mount' in the outer 'unless' is,
> and what @mounts and mount_name contain.  My hope would be to use that
> to narrow down the possible causes, and either confirm or eliminate a
> thread race or something.
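
To make that request concrete, here is a rough sketch of the kind of logging being asked for. It is not Puppet's fileserver code: the MountTable class, its splitpath method and the log format are all hypothetical stand-ins, and only the names 'mount', 'mount_name' and '@mounts' come from the message above. The thread and process ids are an extra assumption, added because they might help confirm or rule out a race.

```ruby
require 'logger'

# Hypothetical stand-in for a fileserver mount table; NOT Puppet's code.
class MountTable
  def initialize(mounts, logger)
    @mounts = mounts # assumed shape: { "plugins" => "/some/path" }
    @logger = logger
  end

  # Split "mount_name/relative/path" and look the mount up.
  def splitpath(request_path)
    mount_name, relative = request_path.sub(%r{\A/+}, '').split('/', 2)
    mount = @mounts[mount_name]
    unless mount
      # Record the state at the moment of the fault: the looked-up mount,
      # the requested name, the known mounts, and which thread/process saw it.
      @logger.warn(
        "not mounted: mount=#{mount.inspect} " \
        "mount_name=#{mount_name.inspect} " \
        "mounts=#{@mounts.keys.inspect} " \
        "thread=#{Thread.current.object_id} pid=#{Process.pid}"
      )
      raise ArgumentError, "'#{mount_name}' not mounted"
    end
    [mount, relative]
  end
end
```

If the logged list of mounts turns out to be empty or partial at the moment of the fault, that would point towards the table being rebuilt under the request; if it is complete and the name is simply absent, the cause is more likely elsewhere.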

There are some thread races in this codepath:
 * we currently know that all cached_attrs (and splitpath uses one
   through the module accessor of the environment) are subject to a
   thread race in 0.25.
 * there is another one when reading fileserver.conf (in readconfig).

But since normally passenger should make sure there is only one thread
in a given running puppet process we should be immune.

> I doubt that will be the complete data set, but it should help move
> forward.  Annoyingly, I don't have a super-solid picture of what the
> problem is at this stage, because it looks like it shouldn't be
> possible to hit the situation but, clearly, it is getting there...

Yes, so we're certainly missing something, and instrumenting this
codepath will help understand the root cause.
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
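
For readers unfamiliar with the kind of race mentioned for cached attributes, the following is a generic illustration only. It is not Puppet's cached_attr implementation, and the CachedValue class, its TTL handling and the Mutex-based fix are all assumptions made for the example. The idea is that a lazily rebuilt value with an expiry can be observed as missing or half-built if one thread expires it while another is reading; serialising the check-and-reload step under a lock avoids that.

```ruby
require 'thread'

# Hypothetical lazily cached value with a time-to-live; NOT Puppet's code.
class CachedValue
  def initialize(ttl, &loader)
    @ttl = ttl          # seconds before the value is considered stale
    @loader = loader    # block that rebuilds the value
    @lock = Mutex.new
    @value = nil
    @expires_at = nil
  end

  # Return the cached value, reloading it when missing or expired.
  # The whole check-and-reload runs under the lock, so no reader can
  # see the value between expiry and the end of the reload.
  def value
    @lock.synchronize do
      now = Time.now
      if @expires_at.nil? || now >= @expires_at
        @value = @loader.call
        @expires_at = now + @ttl
      end
      @value
    end
  end
end
```

A quick way to exercise it is to call `value` from several threads at once and check that the loader block runs only once per TTL window.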