Ilya, I like your idea. Let's create IEP and jira issues. I will be glad to take a part in this journey once we'll discuss every optimization in details.
чт, 13 сент. 2018 г. в 22:51, Dmitriy Setrakyan <dsetrak...@apache.org>: > Ilya, > > Thanks for the detailed explanation. Everything you suggested makes sense. > Needless to say, PME effect on the grid should be minimized. Let's start > with the simpler and less risky changes first. > > D. > > On Thu, Sep 13, 2018 at 5:58 AM Ilya Lantukh <ilant...@gridgain.com> > wrote: > > > Igniters, > > > > As most of you know, Ignite has a concept of AffinityTopologyVersion, > which > > is associated with nodes that are currently present in topology and a > > global cluster state (active/inactive, baseline topology, started > caches). > > Modification of either of them involves process called Partition Map > > Exchange (PME) and results in new AffinityTopologyVersion. At that moment > > all new cache and compute grid operations are globally "frozen". This > might > > lead to indeterminate cache downtimes. > > > > However, our recent changes (esp. introduction of Baseline Topology) > caused > > me to re-think those concept. Currently there are many cases when we > > trigger PME, but it isn't necessary. For example, adding/removing client > > node or server node not in BLT should never cause partition map > > modifications. Those events modify the *topology*, but *affinity* in > > unaffected. On the other hand, there are events that affect only > *affinity* > > - most straightforward example is CacheAffinityChange event, which is > > triggered after rebalance is finished to assign new primary/backup nodes. > > So the term *AffinityTopologyVersion* now looks weird - it tries to > "merge" > > two entities that aren't always related. To me it makes sense to > introduce > > separate *AffinityVersion *and *TopologyVersion*, review all events that > > currently modify AffinityTopologyVersion and split them into 3 > categories: > > those that modify only AffinityVersion, only TopologyVersion and both. It > > will allow us to process such events using different mechanics and avoid > > redundant steps, and also reconsider mapping of operations - some will be > > mapped to topology, others - to affinity. > > > > Here is my view about how different event types theoretically can be > > optimized: > > 1. Client node start / stop: as stated above, no PME is needed, ticket > > https://issues.apache.org/jira/browse/IGNITE-9558 is already in > progress. > > 2. Server node start / stop not from baseline: should be similar to the > > previous case, since nodes outside of baseline cannot be partition > owners. > > 3. Start node in baseline: both affinity and topology versions should be > > incremented, but it might be possible to optimize PME for such case and > > avoid cluster-wide freeze. Partition assignments for such node are > already > > calculated, so we can simply put them all into MOVING state. However, it > > might take significant effort to avoid race conditions and redesign our > > architecture. > > 4. Cache start / stop: starting or stopping one cache doesn't modify > > partition maps for other caches. It should be possible to change this > > procedure to skip PME and perform all necessary actions (compute > affinity, > > start/stop cache contexts on each node) in background, but it looks like > a > > very complex modification too. > > 5. Rebalance finish: it seems possible to design a "lightweight" PME for > > this case as well. If there were no node failures (and if there were, PME > > should be triggered and rebalance should be cancelled anyways) all > > partition states are already known by coordinator. Furthermore, no new > > MOVING or OWNING node for any partition is introduced, so all previous > > mappings should still be valid. > > > > For the latter complex cases in might be necessary to introduce "is > > compatible" relationship between affinity versions. Operation needs to be > > remapped only if new version isn't compatible with the previous one. > > > > Please share your thoughts. > > > > -- > > Best regards, > > Ilya > > >