Re: [Puppet-dev] Various compiler performance patches

Brice Figureau Fri, 06 Jul 2012 11:07:13 -0700

On 06/07/12 18:58, Andrew Parker wrote:
> 
> On Jul 6, 2012, at 2:07 AM, Brice Figureau wrote:
> 
>> 
>>> By moving a couple of things around I think we can cache those
>>> calls instead of needing to keep the cache of not found types.
>>> We'll keep you posted on what we come up with.
>> 
>> Thanks, I'm looking forward to that!
> 
> Brice, maybe you can shed some light on this. One of your commits
> uses Thread.current for storing another cache of missing data. The
> thing that confuses us, though, is that the other side of that (the
> cache for present data) is not stored in Thread.current.


Well, in my patches there are two places with negative caching.
The one you pointed in a previous e-mail, which uses Thread.current
caching (in P::MetaType::Manager). In this specific case it is not
needed to cache the "loaded" case, because, once the type (native ruby
type) is loaded it ends up in @types, so it is already cached.

On the other case, the one in P::Resource::TypeCollection,
P::Parser::TypeLoader#try_load_fqname will register the loaded puppet
type (ie class or definition) into the P::Resource::TypeCollection
(through calling import_ast). Once there, find_or_load will find it the
next time it is needed.

Now to answer why one side is cached in Thread.current and the other
side not. It's simply that Puppet types are inherently global (and in
the master we hope they're immutable), so I suppose they can safely be
shared. You might be well wrong that @types in manager should be
protected in some ways, because it's well possible to have a race
condition. The good news is that when the master starts it loads a lot
of the default types during initialization which certainly will make the
race condition less frequent.

But the main use of Thread.current (in my patch) is to be able to reset
the negative cache before each compilation without having any impact on
other threads that might be compiling at the exact same time (and also
without having to add hooks in the type system to reset the cache).

You'll note that I forgot in this patch to commit the reset part of the
negative cache before the compilation takes place (as we're doing for
the know_resource_types).

> Throughout
> the codebase Patrick and I have been looking at the use of
> Thread.current and synchronization and haven't been able to figure
> out the strategy for when to use threading constructs and when not
> to. Can you maybe explain a bit about when multiple threads of
> execution need to be taken into consideration and when they don't?

Well, most current master deployements out there are mono-thread (ie
Passenger doesn't use threads to my knowledge, but spawns processes).
Mongrel on the other hand uses a thread per-request, but the yielding
points of MRI green threads are so coarse that it never gave any issues
(before we added the threading protection).
On the other hand, it is well possible to run a puppet master on JRuby,
which uses a native preemtible threading model. In this case there can
be two threads compiling in parallel.

This wouldn't be a problem if each thread would have its own full
context. But the way the puppet code is laid out, we have a set of
singletons containing the various environments. Those environments
contain the known resource types that puppet at one time parsed and
loaded. Unfortunately this environments can't be shared as is, because
there can be race conditions when loading/parsing the manifests.
Also, we don't want each thread to do the parsing phase for performance
reasons, so sharing the known resource type is a good goal.
So basically, when a thread starts compiling it fetches the known
resource types in a lazy way and stores a reference to those in a thread
local variable. Later on, while this 1st thread still compiles and uses
the known resource types, if a new thread comes to compile and the
manifests haven't change, it will use the same known resource types for
compiling. But, if the manifests have changed, we can't reset the shared
known resource types without giving some trouble to the 1st thread
(which is still compiling). So this new thread will itself parse the
modified manifests and get its own known resource types, and the 1st
thread is still seeing the previous known resource types by vertue of
the Thread local variable. When a new thread comes to compile it will
see the latest and good known resource types references.
This reference is what is stored in Thread.current[:known_resource_types].

So, the rule is:
* if you need to share data, but isolate each thread so they have a
coherent view, use a Thread local variable.

* if you need to share data, but each thread need to see an actual view
of the data, protect with synchronization.

* prefer immutable data :)

I'm not sure what I wrote will be helpful to you or not, let me know if
you need more (accurate or different) information :)
-- 
Brice Figureau
My Blog: http://www.masterzen.fr/


-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to puppet-dev@googlegroups.com.
To unsubscribe from this group, send email to 
puppet-dev+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en.

Re: [Puppet-dev] Various compiler performance patches

Reply via email to