Re: [Puppet-dev] Re: The future of known_resource_types and loading puppet manifests

Erik Dalén Wed, 28 Aug 2013 09:46:47 -0700

On 28 August 2013 14:05, Henrik Lindberg <[email protected]> wrote:


> On 2013-26-08 19:49, Andy Parker wrote:
>
>> Adrien put a lot of effort into tracking down what was happening in
>> #15106 (Missing site.pp can cause error 'Cannot find definition Class').
>> That exact issue, as described in that bug, has been fixed, but in the
>> investigation Adrien figured out that there are a lot of other problems
>> that can crop up
>> (https://projects.puppetlabs.com/issues/15106#note-13).
>>
>> Basically it comes down to the way puppet tracks what is loaded, what
>> can be loaded, and when things need to be reloaded. When compiling a
>> catalog from manifests, the autoloader (for puppet types, not for ruby
>> code) will be invoked at various times to parse the .pp files that it
>> thinks should contain the types that are needed. At the same time it
>> caches what it has already parsed in a Puppet::Resource::TypeCollection,
>> which throughout the code is known as known_resource_types. There are
>> also a few cases where the TypeCollection will be cleared, even part way
>> through a compile, which causes it to start reloading things.
>>
>> Charlie Sharpsteen, Adrien, and I talked about this around a week ago,
>> before PuppetConf, and came to the conclusion that the current method of
>> autoloading puppet manifests and tracking known types is just untenable.
>> There are multiple points in the code where it loses track of the
>> environment that it is working with, and trying to pass that information
>> through (I tried it a few days ago) ends up uncovering more issues.
>>
>> The conclusion that we came to was that the current lazy-loading of
>> puppet manifests needs to go away. Lazy loading makes all of the
>> information to correctly load types at the right time and from the right
>> place very difficult to keep track of (not intrinsically so, but in our
>> current state).
>>
>> I think the system needs to change to eager loading of manifests (not
>> applying them all, but at least loading them all). For the development
>> case, this makes things maybe a little more expensive, but it should
>> make the stable, production case for manifests much faster, because it
>> will rarely, if ever, need to look at the filesystem to find a type.
>>
>> Now the problem is that if we start going down this path, it becomes a
>> large change to the underlying architecture of the compiler. It will be
>> unnoticeable to most users from a manifest standpoint (unless somehow
>> they were able to rely on certain manifests never being loaded), however
>> we may need to make changes that will break code at the ruby level
>> (maybe the testing wrappers, maybe types and providers, probably some
>> functions).
>>
>> I think something this large should be an ARM, but I wanted to put this
>> out here to get some feedback before working up an ARM. Maybe we are
>> missing something and we can salvage this without a larger change, but
>> at the moment I'm skeptical.
>>
>>
> I have read this, and the comments made to date, and it is somewhat
> difficult to understand exactly what someone means as we use language that
> is fuzzy (at least to me).
>
> Here is an attempt to define the terms (after that I have a proposal).
>
> Parsing
> -------
> The part of the process that goes "from source text to AST model".
>
> Validation
> ----------
> Checks/asserts the validity of the AST model.
>
> Loading
> -------
> Resolves symbolic name to something that can be evaluated. (i.e. AST model
> or Ruby, or whatever we may invent in the future). As an example this binds
> the name of a hostclass to the block of code that is the class' body.
>
> Linking
> -------
> Resolving name to object references. This is not done in puppet as a
> separate static step, it is done while evaluating.
>
> Evaluation
> ----------
> Evaluates the loaded logic (i.e. visits AST nodes and performs operations
> or calls Ruby).
>
> Compilation
> -----------
> The act of loading a given start point and evaluating it (and its
> transitive dependencies) for the purpose of compiling a catalog.
>
> Deferred Evaluation
> -------------------
> We have deferred evaluation of language constructs that define classes
> (and custom resource types? I have to check) - or rather, when evaluated
> they only define the mapping of symbolic name to code to evaluate on demand
> (either a singleton evaluation (class), or potentially multiple
> evaluations (resource)).
>
> (In puppet a hostclass is not evaluated; instead there is a search for
> instantiable objects, which are transitively instantiated on "loading".
> Later its "code" (body) is evaluated).
>
> (In contrast the term "lazy loading" throws me; what is it that is lazy?
> The parsing, the binding of name to code, or the evaluation of the bound
> code?).
>
> Proposal
> ========
> To me, the problem we are discussing is that "autoloading" performs
> evaluation of an unlinked model. The result therefore depends on the
> transitive dependency graph of resolved links. We cache the result and then
> try to figure out what needs to be invalidated based on a changed file.
>
> At the other extreme, if we cache nothing and manifests are processed from
> scratch for every request, we have a potentially long startup.
>
> A simple solution is to cache the validated parse result. This is a simple
> mapping from source URI (e.g. a file path) to an AST. This is always a 1:1
> mapping - the source and the AST are two different representations of
> exactly the same thing.  Then when we evaluate, we always evaluate
> everything. There is one special case: when none of the files have changed
> there is an opportunity to avoid recomputing the catalog, but it assumes
> that no external data has changed. (There are several different ways to
> deal with such optimizations, ranging from asking something external "has
> something changed?" to using "valid until" information in the external
> touch points).
>

If this also includes data from functions (they are evaluated at the
evaluate step, right?) I think it would be very difficult to know if they
will give different values without just running them again. So only caching
the stuff after the validate step seems good to me.

I'd probably rather have the autoloader than any caching at all if there is
a conflict between the two.
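
As a rough illustration of caching only the validated parse result, the sketch below keeps one AST per source path and re-parses a file only when its modification time changes. ParseCache and the parser block are hypothetical names for this example, not part of Puppet; the block stands in for the parse and validate steps. Evaluation (including function calls) is deliberately left outside the cache, so it still runs every time.

```ruby
# Hypothetical sketch: a cache of validated parse results, keyed by source path.
# ParseCache and the parser block are illustrative names, not Puppet APIs.
class ParseCache
  Entry = Struct.new(:mtime, :ast)

  # Number of times the parser block has actually been invoked.
  attr_reader :parse_count

  # The block stands in for "parse + validate": it receives the source text
  # and must return an AST-like object.
  def initialize(&parser)
    @parser = parser
    @entries = {}
    @parse_count = 0
  end

  # Return the cached AST for path, re-parsing only if the file's
  # modification time differs from the one recorded with the cached entry.
  def fetch(path)
    mtime = File.mtime(path)
    entry = @entries[path]
    if entry.nil? || entry.mtime != mtime
      @parse_count += 1
      entry = Entry.new(mtime, @parser.call(File.read(path)))
      @entries[path] = entry
    end
    entry.ast
  end
end
```

Comparing modification times alone can miss an edit that lands within the filesystem's timestamp resolution; a content digest would be a stricter (and more expensive) cache key.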


>
> Yet another problem is a change in files "mid-transaction". We could solve
> that by performing a scan of the system, noting all potential URIs
> affecting the result and their "expiration-timestamp" (no parsing takes
> place). If during the evaluation we find a change in timestamp, we fail the
> transaction (or restart it (backing off in time and having a cap on retries
> if we want to be fancy)).
>
> I use the term "URI affecting the result" to mean a reference to a .pp
> source file, data-bindings in some form, an external service (providing,
> say, ENC data/bindings), or similar.
>
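
The scan-then-verify idea described above could look roughly like the following. This is only a sketch under assumptions: it handles plain files (not external services such as an ENC), and StaleSourcesError, snapshot, changed? and compile_with_retry are hypothetical names, not existing Puppet code. The block passed in stands in for the evaluation step.

```ruby
# Hypothetical sketch of "fail or restart on a mid-transaction change".
# All names here are illustrative; none of them exist in Puppet.
class StaleSourcesError < StandardError; end

# Record the modification time of every file that may affect the result.
# No parsing takes place here.
def snapshot(paths)
  paths.each_with_object({}) { |path, seen| seen[path] = File.mtime(path) }
end

# True if any file has disappeared or its mtime differs from the snapshot.
def changed?(snap)
  snap.any? { |path, mtime| !File.exist?(path) || File.mtime(path) != mtime }
end

# Run the block against a fresh snapshot. If a source changed while the
# block ran, back off (doubling the delay each time) and retry, giving up
# with StaleSourcesError once max_retries retries have been used.
def compile_with_retry(paths, max_retries: 3, base_delay: 0.1)
  attempts = 0
  loop do
    snap = snapshot(paths)
    result = yield
    return result unless changed?(snap)

    attempts += 1
    raise StaleSourcesError, "sources changed during compile" if attempts > max_retries
    sleep(base_delay * (2**(attempts - 1)))
  end
end
```

Setting max_retries to 0 gives the plain "fail the transaction" behaviour; a larger value gives the "restart with a cap" variant.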
> I think the above is a combination of "autoloading" and "load everything
> up front".
>
> I would like to get rid of "import" because it is path based, not because
> it "imports" (loads code). I.e. I think we should have a loader that
> resolves symbolic names to URIs and loads evaluatable content.
> This loader should be able to search for what to "run" without having to
> resort to explicit "run this path" - if not then there is IMO something
> missing in the language itself. I can live with the entry point being a
> file (e.g. site.pp), or possibly a set of files if users for some reason
> want to split a site.pp into multiple files.
>
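
To make "a loader that resolves symbolic names to URIs" concrete, here is a simplified sketch based on the usual module layout, where a name like foo maps to foo/manifests/init.pp and foo::bar::baz maps to foo/manifests/bar/baz.pp. The function names candidate_paths and resolve are hypothetical, and the real autoloader considers more cases than this.

```ruby
# Hypothetical sketch: resolve a symbolic name to a source path using the
# module layout convention (<module>/manifests/...). Not Puppet's own API.

# List, in modulepath order, the files that could define the given name.
def candidate_paths(name, modulepath)
  segments = name.sub(/\A::/, '').split('::')
  raise ArgumentError, "empty name" if segments.empty?

  mod, *rest = segments
  # A bare module name maps to init.pp; deeper names map to nested files.
  relative = rest.empty? ? 'init.pp' : File.join(*rest) + '.pp'
  modulepath.map { |dir| File.join(dir, mod, 'manifests', relative) }
end

# Return the first candidate that exists on disk, or nil if none does.
def resolve(name, modulepath)
  candidate_paths(name, modulepath).find { |path| File.file?(path) }
end
```

With a loader of this shape, the only path a user has to name explicitly is the entry point (e.g. site.pp); everything else is found by name.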
> - henrik
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to puppet-dev+unsubscribe@googlegroups.com.
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/puppet-dev.
> For more options, visit https://groups.google.com/groups/opt_out.
>



-- 
Erik Dalén

