Hi zaneb,

Etherpad updated:
https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence
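Since much of the back-and-forth below is about how the "last child to complete triggers its parent" bookkeeping should work without an in-memory aggregator, here is a minimal sketch of the kind of atomic-update scheme being discussed. It is illustrative only: sqlite and made-up table/column names (sync_point, unsatisfied, triggered), not actual Heat code.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sync_point ("
               "parent_id TEXT PRIMARY KEY, "
               "unsatisfied INTEGER, "        # children not yet COMPLETE
               "triggered INTEGER DEFAULT 0)")
    db.execute("INSERT INTO sync_point VALUES ('parent-A', 2, 0)")
    db.commit()

    def resource_complete(child_id, parent_id):
        # Mark one more requirement of the parent as satisfied.
        db.execute("UPDATE sync_point SET unsatisfied = unsatisfied - 1 "
                   "WHERE parent_id = ?", (parent_id,))
        # Claim the right to trigger the parent: at most one caller's UPDATE
        # can match, so the last child kicks off the parent and nobody else.
        cur = db.execute("UPDATE sync_point SET triggered = 1 "
                         "WHERE parent_id = ? AND unsatisfied = 0 "
                         "AND triggered = 0", (parent_id,))
        db.commit()
        if cur.rowcount == 1:
            print("%s was the last child; converging %s" % (child_id, parent_id))

    resource_complete('child-1', 'parent-A')   # not the last child; no trigger
    resource_complete('child-2', 'parent-A')   # last child triggers the parent

The point is only that the conditional UPDATE does the synchronisation, so there is no separate aggregator and no race on "is the list empty now?".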
> -----Original Message-----
> From: Murugan, Visnusaran
> Sent: Friday, December 12, 2014 4:00 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown
>
> > -----Original Message-----
> > From: Zane Bitter [mailto:zbit...@redhat.com]
> > Sent: Friday, December 12, 2014 6:37 AM
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown
> >
> > On 11/12/14 08:26, Murugan, Visnusaran wrote:
> >
> > >>> [Murugan, Visnusaran] In case of rollback, where we have to clean up earlier versions of resources, we could get the order from the old template. We'd prefer not to have a graph table.
> > >>
> > >> In theory you could get it by keeping old templates around. But that means keeping a lot of templates, and it will be hard to keep track of when you want to delete them. It also means that when starting an update you'll need to load every existing previous version of the template in order to calculate the dependencies. It also leaves the dependencies in an ambiguous state when a resource fails, and although that can be worked around it will be a giant pain to implement.
> > >
> > > Agree that looking at all templates for a delete is not good. But barring complexity, we feel we could achieve it by having an update stream and a delete stream for a stack update operation. I will elaborate in detail in the etherpad sometime tomorrow :)
> > >
> > >> I agree that I'd prefer not to have a graph table. After trying a couple of different things I decided to store the dependencies in the Resource table, where we can read or write them virtually for free, because it turns out that we are always reading or updating the Resource itself at exactly the same time anyway.
> > >
> > > Not sure how this will work in an update scenario when a resource does not change but its dependencies do.
> >
> > We'll always update the requirements, even when the properties don't change.
>
> Can you elaborate a bit on rollback? We had an approach with depends_on and needed_by columns in the Resource table, but dropped it when we figured out we had too many DB operations for update.
>
> > > Also taking care of deleting resources in order will be an issue.
> >
> > It works fine.
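To make the depends_on/needed_by idea concrete, here is roughly what we had in mind (plain Python, invented column names, not actual Heat code): the needed_by values stored on each resource row already give delete ordering without loading any old templates.

    resources = {
        'port':     {'requires': [],         'needed_by': ['server']},
        'server':   {'requires': ['port'],   'needed_by': ['float_ip']},
        'float_ip': {'requires': ['server'], 'needed_by': []},
    }

    def delete_order(rows):
        # Yield resources whose dependents are already gone, i.e. walk the
        # graph in reverse; assumes the rows form a DAG.
        remaining = dict(rows)
        while remaining:
            ready = [name for name, row in remaining.items()
                     if not any(dep in remaining for dep in row['needed_by'])]
            for name in ready:
                yield name
                del remaining[name]

    print(list(delete_order(resources)))   # ['float_ip', 'server', 'port']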
> > > This implies that there will be different versions of a resource, which will complicate things even further.
> >
> > No it doesn't, other than the different versions we already have due to UpdateReplace.
> >
> > >>>> This approach reduces DB queries by waiting for a completion notification on a topic. The drawback I see is that the delete stack stream will be huge, as it will have the entire graph. We can always dump such data in ResourceLock.data as JSON and pass a simple flag "load_stream_from_db" to the converge RPC call as a workaround for the delete operation.
> > >>>
> > >>> This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with the key difference that the data is stored in-memory in a Heat engine rather than the database.
> > >>>
> > >>> I suspect it's probably a mistake to move it in-memory, for similar reasons to the argument Clint made against synchronising the marking-off of dependencies in-memory. The database can handle that, and the problem of making the DB robust against failures of a single machine has already been solved by someone else. If we do it in-memory we are just creating a single point of failure for not much gain. (I guess you could argue it doesn't matter, since if any Heat engine dies during the traversal then we'll have to kick off another one anyway, but it does limit our options if that changes in the future.)
> > >>
> > >> [Murugan, Visnusaran] Resource completes, removes itself from resource_lock and notifies the engine. The engine will acquire the parent lock and initiate the parent only if all its children are satisfied (no child entry in resource_lock). This will come in place of the Aggregator.
> > >>
> > >> Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly what I did. The three differences I can see are:
> > >>
> > >> 1) I think you are proposing to create all of the sync points at the start of the traversal, rather than on an as-needed basis. This is probably a good idea. I didn't consider it because of the way my prototype evolved, but there's now no reason I can see not to do this. If we could move the data to the Resource table itself then we could even get it for free from an efficiency point of view.
> > >
> > > +1. But we will need engine_id to be stored somewhere for recovery purposes (in an easily queried format).
> >
> > Yeah, so I'm starting to think you're right: maybe the/a Lock table is the right thing to use there. We could probably do it within the resource table using the same select-for-update to set the engine_id, but I agree that we might be starting to jam too much into that one table.
>
> Yeah, there would be unrelated values in the resource table, and upon resource completion we have to unset engine_id, as compared to dropping a row from the resource lock. Both are workable, but having engine_id in the resource table will cut DB operations in half. We should go with just the resource table, along with engine_id.
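To spell that out, the engine_id column would be claimed and released with the same kind of atomic update a Lock table relies on; a minimal sketch (sqlite and an invented schema, not actual Heat code):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE resource ("
               "id TEXT PRIMARY KEY, "
               "status TEXT, "
               "engine_id TEXT)")     # NULL means nobody is working on it
    db.execute("INSERT INTO resource VALUES ('server-1', 'INIT', NULL)")
    db.commit()

    def try_claim(resource_id, engine_id):
        # Compare-and-swap style UPDATE: succeeds only if the row is
        # unclaimed, so the resource row itself acts as the lock.
        cur = db.execute("UPDATE resource SET engine_id = ?, "
                         "status = 'IN_PROGRESS' "
                         "WHERE id = ? AND engine_id IS NULL",
                         (engine_id, resource_id))
        db.commit()
        return cur.rowcount == 1

    def release(resource_id, engine_id, status):
        # Unsetting engine_id on completion is the analogue of dropping the
        # row from a separate resource_lock table.
        db.execute("UPDATE resource SET engine_id = NULL, status = ? "
                   "WHERE id = ? AND engine_id = ?",
                   (status, resource_id, engine_id))
        db.commit()

    print(try_claim('server-1', 'engine-A'))   # True
    print(try_claim('server-1', 'engine-B'))   # False: already claimed
    release('server-1', 'engine-A', 'COMPLETE')

A stale engine_id on an IN_PROGRESS row is also what would let us spot resources orphaned by a dead engine, which comes up again further down.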
> > > Sync points are created as-needed. A single resource is enough to restart that entire stream. I think there is a disconnect in our understanding. I will detail it as well in the etherpad.
> >
> > OK, that would be good.
> >
> > >> 2) You're using a single list from which items are removed, rather than two lists (one static, and one to which items are added) that get compared. Assuming (1), then this is probably a good idea too.
> > >
> > > Yeah. We have a single list per active stream, which works by removing completed/satisfied resources from it.
> >
> > I went to change this and then remembered why I did it this way: the sync point is also storing data about the resources that are triggering it. Part of this is the RefID and attributes, and we could replace that by storing that data in the Resource itself and querying it, rather than having it passed in via the notification. But the other part is the ID/key of those resources, which we _need_ to know in order to update the requirements in case one of them has been replaced and thus the graph doesn't reflect it yet. (Or, for that matter, we need it to know where to go looking for the RefID and/or attributes if they're in the DB.) So we have to store some data; we can't just remove items from the required list (although we could do that as well).
> >
> > >> 3) You're suggesting to notify the engine unconditionally and let the engine decide if the list is empty. That's probably not a good idea - not only does it require extra reads, it introduces a race condition that you then have to solve (it can be solved, it's just more work). Since the update to remove a child from the list is atomic, it's best to trigger the engine only if the list is now empty.
> > >
> > > No. Notify only if the stream has something to be processed. The newer approach based on the DB lock will be that the last resource initiates its parent. This is opposite to what our Aggregator model had suggested.
> >
> > OK, I think we're on the same page on this one then.
>
> Yeah.
>
> > >>> It's not clear to me how the 'streams' differ in practical terms from just passing a serialisation of the Dependencies object, other than being incomprehensible to me ;). The current Dependencies implementation (1) is a very generic implementation of a DAG, (2) works and has plenty of unit tests, (3) has, with I think one exception, a pretty straightforward API, (4) has a very simple serialisation, returned by the edges() method, which can be passed back into the constructor to recreate it, and (5) has an API that is to some extent relied upon by resources, and so won't likely be removed outright in any event. Whatever code we need to handle dependencies ought to just build on this existing implementation.
> > >>>
> > >>> [Murugan, Visnusaran] Our thought was to reduce payload size (template/graph), just planning for the worst-case scenario (a million-resource stack). We could always dump them in ResourceLock.data to be loaded by the Worker.
> > >>
> > >> If there's a smaller representation of a graph than a list of edges then I don't know what it is. The proposed stream structure certainly isn't it, unless you mean as an alternative to storing the entire graph once for each resource. A better alternative is to store it once centrally - in my current implementation it is passed down through the trigger messages, but since only one traversal can be in progress at a time it could just as easily be stored in the Stack table of the database at the slight cost of an extra write.
> > >
> > > Agree that an edge list is the smallest representation of a graph. But it does not give us a complete picture without doing a DB lookup. Our assumption was to store streams in the resource_lock.data column of the IN_PROGRESS resource. This could be in the resource table instead.
> >
> > That's true, but I think in practice at any point where we need to look at this we will always have already loaded the Stack from the DB for some other reason, so we actually can get it for free. (See the detailed discussion in my reply to Anant.)
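For reference, the edges() serialisation being discussed is just a list of (requirer, required) pairs, which is cheap to store once per stack rather than once per resource; a rough stand-alone illustration (not the actual Dependencies API, and the Stack column name is invented):

    import json

    # One (requirer, required) pair per edge; this is the whole graph.
    edges = [('server', 'port'), ('server', 'volume'), ('float_ip', 'server')]

    # Small enough to store once, e.g. in a column on the Stack row, instead
    # of embedding a per-resource stream in every trigger message.
    stack_row = {'id': 'stack-1', 'current_deps': json.dumps(edges)}

    # Any worker can rebuild whatever view it needs from the stored edge list.
    loaded = [tuple(e) for e in json.loads(stack_row['current_deps'])]
    requires = {}
    for requirer, required in loaded:
        requires.setdefault(requirer, set()).add(required)
    print(requires)   # e.g. {'server': {'port', 'volume'}, 'float_ip': {'server'}}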
> Aren't we planning to stop loading the stack with all its resource objects in the future, to address the scalability concerns we currently have?
>
> > >> I'm not opposed to doing that, BTW. In fact, I'm really interested in your input on how that might help make recovery from failure more robust. I know Anant mentioned that not storing enough data to recover when a node dies was his big concern with my current approach.
> > >
> > > With streams, we feel recovery will be easier. All we need is a trigger :)
> > >
> > >> I can see that by both creating all the sync points at the start of the traversal and storing the dependency graph in the database instead of letting it flow through the RPC messages, we would be able to resume a traversal where it left off, though I'm not sure what that buys us.
> > >>
> > >> And I guess what you're suggesting is that by having an explicit lock with the engine ID specified, we can detect when a resource is stuck in IN_PROGRESS due to an engine going down? That's actually pretty interesting.
> > >
> > > Yeah :)
> > >
> > >>> Based on our call on Thursday, I think you're taking the idea of the Lock table too literally. The point of referring to locks is that we can use the same concepts as the Lock table relies on to do atomic updates on a particular row of the database, and we can use those atomic updates to prevent race conditions when implementing SyncPoints/Aggregators/whatever you want to call them. It's not that we'd actually use the Lock table itself, which implements a mutex and therefore offers only a much slower and more stateful way of doing what we want (lock mutex, change data, unlock mutex).
> > >>>
> > >>> [Murugan, Visnusaran] Are you suggesting something like a select-for-update in the resource table itself, without having a lock table?
> > >>
> > >> Yes, that's exactly what I was suggesting.
> > >
> > > The DB is always good for sync, but we need to be careful not to overdo it.
> >
> > Yeah, I see what you mean now; it's starting to _feel_ like there'd be too many things mixed together in the Resource table. Are you aware of some concrete harm that might cause, though? What happens if we overdo it? Is select-for-update on a huge row more expensive than the whole overhead of manipulating the Lock?
> >
> > Just trying to figure out if intuition is leading me astray here.
>
> You are right. There should be no difference apart from a little bump in memory usage, but I think it should be fine.
>
> > > Will update etherpad by tomorrow.
> >
> > OK, thanks.
> >
> > cheers,
> > Zane.

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev