On 08/12/14 07:00, Murugan, Visnusaran wrote:

Hi Zane & Michael,

Please have a look at
https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

Updated with a combined approach which does not require persisting the graph or
removing backup stacks.

Well, we still have to persist the dependencies of each version of a resource _somehow_, because otherwise we can't know how to clean them up in the correct order. But what I think you meant to say is that this approach doesn't require it to be persisted in a separate table where the rows are marked as traversed as we work through the graph.

This approach reduces DB queries by waiting for completion notifications on a topic. The
drawback I see is that the delete-stack stream will be huge, as it will contain the entire graph.
We can always dump such data into ResourceLock.data as JSON and pass a simple flag
"load_stream_from_db" to the converge RPC call as a workaround for the delete operation.

This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with the key difference that the data is stored in-memory in a Heat engine rather than the database.

I suspect it's probably a mistake to move it in-memory for similar reasons to the argument Clint made against synchronising the marking off of dependencies in-memory. The database can handle that and the problem of making the DB robust against failures of a single machine has already been solved by someone else. If we do it in-memory we are just creating a single point of failure for not much gain. (I guess you could argue it doesn't matter, since if any Heat engine dies during the traversal then we'll have to kick off another one anyway, but it does limit our options if that changes in the future.)

It's not clear to me how the 'streams' differ in practical terms from just passing a serialisation of the Dependencies object, other than being incomprehensible to me ;). The current Dependencies implementation:

 (1) is a very generic implementation of a DAG;
 (2) works and has plenty of unit tests;
 (3) has, with I think one exception, a pretty straightforward API;
 (4) has a very simple serialisation, returned by the edges() method, which can be passed back into the constructor to recreate it; and
 (5) has an API that is to some extent relied upon by resources, and so won't likely be removed outright in any event.

Whatever code we need to handle dependencies ought to just build on this existing implementation.
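To make the round trip concrete, here's a minimal stand-in (this is NOT Heat's actual Dependencies class; the class and its internals are made up for illustration) showing the edges()-style serialisation that can be fed straight back into the constructor:

```python
# Hypothetical sketch of an edges()-serialisable DAG, mirroring the
# round trip described above. Not Heat's real implementation.

class SimpleDependencies(object):
    def __init__(self, edges=None):
        # Each edge is a (requirer, required) pair; a (node, None)
        # pair records a node with no requirements.
        self._graph = {}
        for requirer, required in (edges or []):
            self._graph.setdefault(requirer, set())
            if required is not None:
                self._graph.setdefault(required, set())
                self._graph[requirer].add(required)

    def edges(self):
        # The serialisation: a flat list of pairs that the
        # constructor accepts, so round-tripping is trivial.
        result = []
        for requirer, requireds in self._graph.items():
            if requireds:
                result.extend((requirer, req) for req in requireds)
            else:
                result.append((requirer, None))
        return result

deps = SimpleDependencies([('B', 'A'), ('C', 'B')])
copy = SimpleDependencies(deps.edges())
assert sorted(copy.edges()) == sorted(deps.edges())
```

The point being that a list of pairs is about as simple as a serialisation gets, and it already exists.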

I think the difference may be that the streams only include the *shortest* paths (there will often be more than one) to each resource. i.e.

     A <------- B <------- C
     ^                     |
     |                     |
     +---------------------+

can just be written as:

     A <------- B <------- C

because there's only one order in which that can execute anyway. (If we're going to do this though, we should just add a method to the dependencies.Graph class to delete redundant edges, not create a whole new data structure.) There is a big potential advantage here in that it reduces the theoretical maximum number of edges in the graph from O(n^2) to O(n). (Although in practice real templates are unlikely to have such dense graphs.)
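Deleting redundant edges like this is a transitive reduction. A rough sketch of what such a method might do (function and variable names are mine, not dependencies.Graph's API):

```python
# Hedged sketch: remove redundant edges from a DAG (a transitive
# reduction). The graph maps each node to the set of nodes it
# depends on. Illustrative only; not Heat's dependencies.Graph.

def _reachable(graph, start):
    # All nodes reachable from `start` by following dependency edges.
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def reduce_edges(graph):
    # An edge n -> d is redundant if d is still reachable from n via
    # some other direct dependency; dropping it leaves only the
    # "shortest path" edges discussed above.
    reduced = {}
    for node, deps in graph.items():
        reduced[node] = {
            d for d in deps
            if not any(d in _reachable(graph, other)
                       for other in deps if other != d)
        }
    return reduced

# C depends on both B and A; B depends on A. The C -> A edge is
# redundant because A is reachable via B.
graph = {'C': {'B', 'A'}, 'B': {'A'}, 'A': set()}
assert reduce_edges(graph) == {'C': {'B'}, 'B': {'A'}, 'A': set()}
```

For a DAG the transitive reduction is unique, which is why "just add a method" is enough; no new data structure is needed.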

There's a downside to this too though: say that A in the above diagram is replaced during an update. In that case not only B but also C will need to figure out what the latest version of A is. One option here is to pass that data along via B, but that will become very messy to implement in a non-trivial example. The other would be for C to go search in the database for resources with the same name as A and the current traversal_id marked as the latest. But that not only creates a concurrency problem we didn't have before (A could have been updated with a new traversal_id at some point after C had established that the current traversal was still valid but before it went looking for A), it also eliminates all of the performance gains from removing that edge in the first place.

[1] https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/sync_point.py

To stop the current stack operation, we will use your traversal_id based approach.

+1 :)

If you feel the Aggregator model creates too many queues, then we might have to
poll the DB to get resource status. (Which will impact performance adversely :) )

For the reasons given above I would vote for doing this in the DB. I agree there will be a performance penalty for doing so, because we'll be paying for robustness.

Lock table: name (unique - resource_id), stack_id, engine_id, data (JSON to
store the stream dict)

Based on our call on Thursday, I think you're taking the idea of the Lock table too literally. The point of referring to locks is that we can use the same concepts as the Lock table relies on to do atomic updates on a particular row of the database, and we can use those atomic updates to prevent race conditions when implementing SyncPoints/Aggregators/whatever you want to call them. It's not that we'd actually use the Lock table itself, which implements a mutex and therefore offers only a much slower and more stateful way of doing what we want (lock mutex, change data, unlock mutex).
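To illustrate the distinction, here's a sketch of the atomic-update pattern (using sqlite3 purely for demonstration; the table and column names are invented for this example and are not Heat's actual schema):

```python
# Illustrative sketch of "the lock-table concept without the mutex":
# a single conditional UPDATE is atomic, so the row changes only if
# our precondition still holds. No lock/unlock round trips needed.
# Schema and names are hypothetical, not Heat's.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sync_point '
             '(name TEXT PRIMARY KEY, traversal_id TEXT, done INTEGER)')
conn.execute("INSERT INTO sync_point VALUES ('res-A', 't1', 0)")
conn.commit()

def mark_done(conn, name, expected_traversal):
    # Atomic compare-and-update: succeeds only if the traversal is
    # still the one we think it is, and the row isn't already marked.
    cur = conn.execute(
        'UPDATE sync_point SET done = 1 '
        'WHERE name = ? AND traversal_id = ? AND done = 0',
        (name, expected_traversal))
    conn.commit()
    return cur.rowcount == 1

assert mark_done(conn, 'res-A', 't1')        # first writer wins
assert not mark_done(conn, 'res-A', 't1')    # already marked: no-op
assert not mark_done(conn, 'res-A', 't2')    # stale traversal: rejected
```

Checking rowcount tells each writer whether it won the race, which is all the synchronisation we actually need here, without holding any mutex across the read-modify-write.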

cheers,
Zane.

Your thoughts.
Vishnu (irc: ckmvishnu)
Unmesh (irc: unmeshg)


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
