On 06/07/12 18:58, Andrew Parker wrote: > > On Jul 6, 2012, at 2:07 AM, Brice Figureau wrote: > >> >>> By moving a couple of things around I think we can cache those >>> calls instead of needing to keep the cache of not found types. >>> We'll keep you posted on what we come up with. >> >> Thanks, I'm looking forward to that! > > Brice, maybe you can shed some light on this. One of your commits > uses Thread.current for storing another cache of missing data. The > thing that confuses us, though, is that the other side of that (the > cache for present data) is not stored in Thread.current.
Well, in my patches there are two places with negative caching. The one you pointed in a previous e-mail, which uses Thread.current caching (in P::MetaType::Manager). In this specific case it is not needed to cache the "loaded" case, because, once the type (native ruby type) is loaded it ends up in @types, so it is already cached. On the other case, the one in P::Resource::TypeCollection, P::Parser::TypeLoader#try_load_fqname will register the loaded puppet type (ie class or definition) into the P::Resource::TypeCollection (through calling import_ast). Once there, find_or_load will find it the next time it is needed. Now to answer why one side is cached in Thread.current and the other side not. It's simply that Puppet types are inherently global (and in the master we hope they're immutable), so I suppose they can safely be shared. You might be well wrong that @types in manager should be protected in some ways, because it's well possible to have a race condition. The good news is that when the master starts it loads a lot of the default types during initialization which certainly will make the race condition less frequent. But the main use of Thread.current (in my patch) is to be able to reset the negative cache before each compilation without having any impact on other threads that might be compiling at the exact same time (and also without having to add hooks in the type system to reset the cache). You'll note that I forgot in this patch to commit the reset part of the negative cache before the compilation takes place (as we're doing for the know_resource_types). > Throughout > the codebase Patrick and I have been looking at the use of > Thread.current and synchronization and haven't been able to figure > out the strategy for when to use threading constructs and when not > to. Can you maybe explain a bit about when multiple threads of > execution need to be taken into consideration and when they don't? Well, most current master deployements out there are mono-thread (ie Passenger doesn't use threads to my knowledge, but spawns processes). Mongrel on the other hand uses a thread per-request, but the yielding points of MRI green threads are so coarse that it never gave any issues (before we added the threading protection). On the other hand, it is well possible to run a puppet master on JRuby, which uses a native preemtible threading model. In this case there can be two threads compiling in parallel. This wouldn't be a problem if each thread would have its own full context. But the way the puppet code is laid out, we have a set of singletons containing the various environments. Those environments contain the known resource types that puppet at one time parsed and loaded. Unfortunately this environments can't be shared as is, because there can be race conditions when loading/parsing the manifests. Also, we don't want each thread to do the parsing phase for performance reasons, so sharing the known resource type is a good goal. So basically, when a thread starts compiling it fetches the known resource types in a lazy way and stores a reference to those in a thread local variable. Later on, while this 1st thread still compiles and uses the known resource types, if a new thread comes to compile and the manifests haven't change, it will use the same known resource types for compiling. But, if the manifests have changed, we can't reset the shared known resource types without giving some trouble to the 1st thread (which is still compiling). So this new thread will itself parse the modified manifests and get its own known resource types, and the 1st thread is still seeing the previous known resource types by vertue of the Thread local variable. When a new thread comes to compile it will see the latest and good known resource types references. This reference is what is stored in Thread.current[:known_resource_types]. So, the rule is: * if you need to share data, but isolate each thread so they have a coherent view, use a Thread local variable. * if you need to share data, but each thread need to see an actual view of the data, protect with synchronization. * prefer immutable data :) I'm not sure what I wrote will be helpful to you or not, let me know if you need more (accurate or different) information :) -- Brice Figureau My Blog: http://www.masterzen.fr/ -- You received this message because you are subscribed to the Google Groups "Puppet Developers" group. To post to this group, send email to puppet-dev@googlegroups.com. To unsubscribe from this group, send email to puppet-dev+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en.