Re: [Puppet-dev] Ideas for Batch Processing of Packages

John Bollinger Tue, 17 Sep 2013 13:20:07 -0700


On Monday, September 16, 2013 11:12:58 AM UTC-5, Andy Parker wrote:
>
> On Mon, Sep 16, 2013 at 6:56 AM, John Bollinger <[email protected]> wrote:
>
>>
>>
>> On Monday, September 16, 2013 6:48:17 AM UTC-5, Andy Parker wrote:
>>>
>>>
>>> The problem with this picture, as far as batching operations together 
>>> goes, is that everything turns into calls on the Puppet::Type instance, 
>>> which then makes all of the individual calls to the provider. To batch, we 
>>> need to group resources together and then give the groups to the provider.
>>>
>>
>>
>> So you don't think my suggestion above to let providers assemble and 
>> apply batches is workable?  I think it requires only one or two extra 
>> signals to the provider (via the Type), to mark batch boundaries.  Most of 
>> the magic would happen in those providers that choose to perform it, and 
>> those that don't choose to perform it can just ignore the new signals.  The 
>> main part of the protocol between the agent core and types/providers 
>> remains unchanged.
>>
>> I haven't delved into the details of how exactly it would be implemented, 
>> so perhaps there is a show-stopper there, but I'm not seeing a flaw in the 
>> basic idea.
>>
>>
> No, you are right. I forgot about that one. I was just running through the 
> code, and the biggest problem that I can see so far is simply that there 
> isn't "the provider". We end up with a provider instance per resource, as 
> far as I can tell. Others have solved that by tracking data on the provider 
> class.
>
> I think that for the batching we just need a way of asking a provider if 
> two resources (for the same provider class) are batchable. The comparison 
> of batchable needs to be transitive (so if A and B are batchable, and so 
> are B and C, then A, B, and C all are). In fact it also needs to be 
> symmetric and reflexive, since it is really just another form of equality. 
> That helps us to define what can be batched together.
>
>


I think an equivalence relation may be stronger than is needed.  It should 
be sufficient to be able to answer this weaker question: given a set S of 
mutually batchable resources and a resource R not in S, can R be batched 
together with all the resources of S?  A provider type may be able to batch 
resources on that basis even where it cannot meaningfully define a full 
equivalence relation over them.
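
A minimal sketch of that weaker question, in plain Ruby.  Everything here is 
hypothetical: Resource, BatchPolicy, can_join? and the "conflicts" field are 
illustration-only names, not Puppet APIs.  The point is only that a 
set-versus-resource test can express a rule that is not transitive, and so 
could not be stated as an equivalence relation.

```ruby
# Hypothetical stand-in for a resource; not Puppet's resource class.
Resource = Struct.new(:name, :ensure_value, :conflicts)

# Hypothetical provider-class-level policy (assumption, not a Puppet API).
module BatchPolicy
  # Answers: can resource r join the mutually batchable set `batch`?
  # r must agree with every member on the ensure value, and neither side
  # may list the other as a conflict.
  def self.can_join?(batch, r)
    batch.all? do |member|
      member.ensure_value == r.ensure_value &&
        !member.conflicts.include?(r.name) &&
        !r.conflicts.include?(member.name)
    end
  end
end

a = Resource.new('a', :installed, [])
b = Resource.new('b', :installed, [])
c = Resource.new('c', :installed, ['a']) # c declares a conflict with a

ab_ok  = BatchPolicy.can_join?([a], b)     # a and b can share a batch
bc_ok  = BatchPolicy.can_join?([b], c)     # so can b and c
abc_ok = BatchPolicy.can_join?([a, b], c)  # but c cannot join {a, b}
```

Here a/b and b/c are each batchable, yet {a, b, c} is not, which is exactly 
the case a transitive relation would rule out.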

 

> Once we know that, then the problem is how to decide what exactly the 
> batches are. Since we don't actually have a complete view of all of the 
> resources, the decision is going to be based off of incomplete information. 
> Also the batches might need to take into account other factors that are out 
> of the control of the provider such as constraints from the "ordering" 
> selected.
>


Some questions about what can be batched together are better answered by 
the agent core, and others can be answered only by provider types (as 
distinguished from provider instances).  There must therefore be some kind 
of communication between the two about the matter in order to do it right.  
I remain enamored of the idea of putting the reins in the hands of provider 
types, partly because I think it affords a simpler API, and partly because 
providers can supply the appropriate specialization.

Consider, for example, the "yum" Package provider.  Because of yum's 
nature, the provider cannot easily support batching out-of-sync packages 
ensured 'absent' with out-of-sync packages ensured 'installed', 'latest', 
or <version>, but as long as external considerations (e.g. relationships 
with other resources) do not preclude it, the yum provider could 
simultaneously build separate batches for the two categories.  That would 
allow for larger batches to be formed under some circumstances, and it 
could be essential for correct operation of removals in others.
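
To make the two-category idea concrete, here is a rough sketch in plain 
Ruby.  TwoQueueBatcher, add and flush are made-up names, not part of Puppet 
or of its yum provider, and lumping 'latest' in with 'installed' is a 
simplifying assumption.  It only shows a provider keeping a removal batch 
and an installation batch open at the same time.

```ruby
# Hypothetical sketch; none of these names are Puppet APIs.
class TwoQueueBatcher
  def initialize
    @batches = { remove: [], install: [] }
  end

  # ensure_value is :absent, :installed, :latest, or a version string.
  def add(name, ensure_value)
    case ensure_value
    when :absent
      @batches[:remove] << name
    when :installed, :latest
      # Assumption: both can ride in the same batch for this illustration.
      @batches[:install] << name
    else
      # A specific version: queue it as "name-version".
      @batches[:install] << "#{name}-#{ensure_value}"
    end
  end

  # Return the non-empty pending batches and start fresh ones.
  def flush
    pending = @batches.reject { |_kind, names| names.empty? }
    @batches = { remove: [], install: [] }
    pending
  end
end

batcher = TwoQueueBatcher.new
batcher.add('httpd', :installed)
batcher.add('telnet', :absent)
batcher.add('openssl', '1.0.1e')
batcher.add('rsh', :absent)
batches = batcher.flush
```

With a single batch, the interleaved 'absent' entries would have forced 
four batches of one package each; with two open batches the same sequence 
yields one removal batch and one installation batch.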

More generally, batching under control of providers would allow batches of 
different provider types and even of different resource types to be formed 
simultaneously, provided always that the application order of the relevant 
resources is not constrained.



> So, for instance, for "--ordering manifest" we would probably want to do a 
> kind of run length encoding of the resources. Start a batch on a resource, 
> and end the batch when the next resource is not batchable.
>
>

That's similar to my idea for how the agent core would choose when to 
signal that batches must be flushed, but my concept does not require the 
core to have an absolute sense of what is batchable.  It can focus on the 
external considerations alone, such as relationships between resources, and 
let providers worry about the other details.  In any case, the fewer 
ordering constraints the agent must obey, the bigger the batches that can 
be formed.  As such, a requirement to adhere strictly to manifest order is 
potentially a great inhibitor of batching.
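
A sketch of that division of labor, again in plain Ruby with hypothetical 
names throughout (ImmediateProvider, BatchingProvider, run, and the 
[name, provider, requires] triples are assumptions, not Puppet's API).  The 
core looks only at relationships and sends a flush signal when a resource 
depends on something still sitting in a batch; providers that do not batch 
simply ignore the signal.

```ruby
# Hypothetical sketch; these classes are not Puppet's.
class ImmediateProvider
  def initialize(log)
    @log = log
  end

  # Applies at once; returns false because nothing was deferred.
  def apply(name)
    @log << [name]
    false
  end

  # Non-batching providers ignore the batch-boundary signal.
  def flush; end
end

class BatchingProvider
  def initialize(log)
    @log = log
    @pending = []
  end

  # Queues the work; returns true because it was deferred.
  def apply(name)
    @pending << name
    true
  end

  # Applies everything queued so far as one batch.
  def flush
    @log << @pending unless @pending.empty?
    @pending = []
  end
end

# The core: walks resources in order, knowing only their relationships.
def run(resources, providers)
  deferred = {} # names queued in some batch but not yet applied
  resources.each do |name, provider, requires|
    if requires.any? { |dep| deferred[dep] }
      providers.each(&:flush) # a dependency is still pending: flush first
      deferred.clear
    end
    deferred[name] = true if provider.apply(name)
  end
  providers.each(&:flush) # end of run: flush whatever remains
end

log  = []
pkg  = BatchingProvider.new(log)
file = ImmediateProvider.new(log)
run([['a', pkg, []],
     ['b', pkg, []],
     ['conf', file, ['a']],
     ['c', pkg, ['conf']]],
    [pkg, file])
```

In this run 'a' and 'b' are applied as one batch because nothing forces 
them apart, the dependency of 'conf' on 'a' triggers a flush, and 'c' ends 
up in a later batch of its own.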


John
