[Puppet-dev] Re: Ideas for Batch Processing of Packages

Henrik Lindberg Sun, 15 Sep 2013 20:57:09 -0700

On 2013-16-09 5:41, Luke Kanies wrote:

Hi Henrik,


I know we have some users who just batch all package installs up front.  It'd
be interesting to see if that was a feasible solution.  It would bypass the
graph entirely, which I'm sure could have problems, but it would, at least, be
easy to build and understand.  Would that suffice for a sufficient number of
cases?

Well, it naturally misses the optimization opportunity and is obviously
difficult to maintain for users since they then have to compose the set
of packages manually without the help of the graph / catalog.

The explicit batching being discussed would allow users to partially do
this - there seem to be use cases where it is really important to be
able to do this (when a package manager must be given a certain
combination of packages/versions to do the right thing).

I think we should be able to come up with an optimization that works in
the general case. The next step is to write a simple utility to get
metrics from real large scale system deployments to better understand
the value of the opportunity.


- henrik

On Sep 13, 2013, at 9:52 AM, Henrik Lindberg <[email protected]> 
wrote:

Hi,
Ideas regarding a potential performance boost that can be gained by performing
batch processing of package installs/operations have been floating around in the
Puppet ecosystem for quite some time.

There is a discussion (and a somewhat dated implementation/proposal) in 
http://projects.puppetlabs.com/issues/2198 which is good background reading for 
this topic.

In issue #2198 (if you skipped reading it ;-)), the idea is that Puppet should 
have the feature to install a list of packages given by the user.

It seems doable to generalize this idea and let puppet automatically optimize 
package installs under certain conditions. Performing individual package 
installs is quite expensive and even if the optimization opportunities may not 
be extensive (e.g. say that 20% (number completely made up) of packages could 
at least be paired with one other package) this is still a worthwhile activity.

To kick this off, we need to do some research and design. So, here is an 
attempt to get this started by asking a bunch of questions.

Under what conditions can two (or more) package operations be batched?
----------------------------------------------------------------------
As an example, say that a class contains a series of package resources without 
any explicit dependencies between them. The idea is that this could be 
optimized. Are there any conditions that make this impossible?

What if the resources are chained with explicit dependencies? (My guess is that
the dependencies were added for a reason, and that such packages should be
handled as individual operations.)

What if the packages in the list are of different types? Is a chain of implicit
dependencies between packages of the same type required to make it possible to
batch them? (Does it depend on the policy for implicit dependencies:
parse-order, random, etc.?)

What if there are other implicit dependencies? Can it be deduced that an
intermixed resource has no effect on the outcome of a following package
operation? (Execs can for sure do things.)

Is it possible to optimize across class boundaries?

Is it enough to look at the queue of actions in the "planned catalog" and
simply look ahead for packages handled by the same provider, so that an
unbroken chain of operations handled by the same provider is collected and
then handed off to that provider? Does this provide enough optimization, or
are we then likely to miss optimization opportunities?
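
A rough sketch of the look-ahead idea, in Python for illustration only (this
is not Puppet code). The `Op` record, its fields, and `batch_runs` are
hypothetical names, and the "planned catalog" is assumed to be a flat,
already-ordered list of operations:

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for one planned operation in the ordered catalog."""
    name: str
    provider: Optional[str] = None  # e.g. "yum"; None for non-package resources


def batch_runs(ops: Iterable[Op]) -> List[Tuple[Optional[str], List[str]]]:
    """Collect unbroken runs of operations handled by the same provider.

    Any non-package operation (provider is None) breaks a run and is
    emitted on its own, so the original ordering is preserved.
    """
    result: List[Tuple[Optional[str], List[str]]] = []
    for provider, run in groupby(ops, key=lambda op: op.provider):
        names = [op.name for op in run]
        if provider is None:
            # Non-package operations are never merged with each other.
            result.extend((None, [n]) for n in names)
        else:
            result.append((provider, names))
    return result


# Invented example data, not taken from a real catalog.
planned = [
    Op("httpd", "yum"),
    Op("mod_ssl", "yum"),
    Op("exec:reload", None),
    Op("rake", "gem"),
    Op("git", "yum"),
]
batches = batch_runs(planned)
```

In this example `git` is not merged with the earlier yum run because an exec
and a gem install sit between them, which is the kind of missed opportunity
the question is about.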

How can we collect metrics for this?
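
One possible starting point for such metrics, again as a Python sketch under
assumptions: the input is taken to be the provider of each planned operation
in apply order (None for anything that is not a package), and the function
name, the result keys, and the sample list are all invented for illustration:

```python
from itertools import groupby
from typing import Dict, Optional, Sequence


def batching_metrics(providers: Sequence[Optional[str]]) -> Dict[str, float]:
    """Summarize how much a same-provider look-ahead would save.

    `providers` is the provider of each planned operation in order,
    with None for anything that is not a package.
    """
    # Length of every unbroken run of package operations with one provider.
    run_lengths = [
        len(list(run))
        for provider, run in groupby(providers)
        if provider is not None
    ]
    packages = sum(run_lengths)
    # Packages that share a run with at least one other package.
    batched = sum(n for n in run_lengths if n > 1)
    return {
        "packages": packages,
        "invocations": len(run_lengths),
        "batched_fraction": batched / packages if packages else 0.0,
    }


# Invented sample, not measured from a real deployment.
sample = ["yum", "yum", None, "gem", "yum", "yum", "yum"]
metrics = batching_metrics(sample)
```

Comparing "packages" with "invocations" gives the number of package manager
calls saved, and "batched_fraction" corresponds to the made-up 20% figure
mentioned earlier.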

What needs to be done to providers?
-----------------------------------
Clearly the capability to handle multiple requests must be implemented for 
package managers that support this. What should the API look like?
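
To make the API question concrete, here is one hypothetical shape, sketched
in Python rather than Ruby. It is not Puppet's actual provider API; the class
names, `supports_batch`, and `install_many` are assumptions made up for this
example. The idea shown is that every provider gets a batch entry point which
falls back to individual installs unless the provider overrides it:

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class PackageProvider(ABC):
    """Hypothetical provider interface; not Puppet's actual provider API."""

    supports_batch = False

    @abstractmethod
    def install(self, name: str) -> bool:
        """Install one package; return True on success."""

    def install_many(self, names: Sequence[str]) -> Dict[str, bool]:
        """Default behavior: fall back to one install per package."""
        return {name: self.install(name) for name in names}


class RecordingBatchProvider(PackageProvider):
    """Toy provider that records the calls a real package manager would get."""

    supports_batch = True

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def install(self, name: str) -> bool:
        self.calls.append([name])
        return True

    def install_many(self, names: Sequence[str]) -> Dict[str, bool]:
        # One invocation for the whole batch instead of one per package.
        self.calls.append(list(names))
        return {name: True for name in names}


provider = RecordingBatchProvider()
results = provider.install_many(["httpd", "mod_ssl"])
```

Returning a result per package name is a design choice in this sketch: it
would let failures be reported against the individual resources even when
the work was done in a single call.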

What needs to be done to the Package type?
------------------------------------------
Is it all an ordering issue and handing off resources to the provider, or do we 
need to do things to the Package type as well?

Are there situations where it is of value to veto batching per resource?
(This depends on how much optimization can be deduced by looking at
resource dependencies.)

Explicit group/list?
--------------------
If we want users to be able to explicitly give a group of packages to manage - 
how should that work? A new resource type? An attribute on Package? A defined 
type?

If we cannot optimize across classes, can we support explicit grouping/batch 
operation? (Seems complex with yet another containment hierarchy - or can this 
be done by introspecting a dependency chain of custom resources/classes perhaps 
used specifically for this purpose?)

- henrik


--
You received this message because you are subscribed to the Google Groups "Puppet 
Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/puppet-dev.
For more options, visit https://groups.google.com/groups/opt_out.



