Ilya,

I like your idea.
Let's create an IEP and Jira issues.
I will be glad to take part in this journey once we discuss every
optimization in detail.


Thu, Sep 13, 2018 at 22:51, Dmitriy Setrakyan <dsetrak...@apache.org>:

> Ilya,
>
> Thanks for the detailed explanation. Everything you suggested makes sense.
> Needless to say, PME effect on the grid should be minimized. Let's start
> with the simpler and less risky changes first.
>
> D.
>
> On Thu, Sep 13, 2018 at 5:58 AM Ilya Lantukh <ilant...@gridgain.com>
> wrote:
>
> > Igniters,
> >
> > As most of you know, Ignite has a concept of AffinityTopologyVersion,
> > which is associated with the nodes that are currently present in the
> > topology and with the global cluster state (active/inactive, baseline
> > topology, started caches). Modifying either of them involves a process
> > called Partition Map Exchange (PME) and results in a new
> > AffinityTopologyVersion. At that moment all new cache and compute grid
> > operations are globally "frozen". This might lead to indeterminate
> > cache downtimes.
> >
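To make the discussion concrete, here is a toy model of the combined version described above: a major counter bumped on topology events and a minor counter bumped on state changes at the same topology. Class and field names are illustrative, not Ignite's actual API.

```java
// Simplified model of a combined topology/affinity version.
// Names are illustrative assumptions, not Ignite's exact classes.
final class TopVersion implements Comparable<TopVersion> {
    final long major; // incremented on topology events (node join/leave)
    final int minor;  // incremented on state changes at the same topology

    TopVersion(long major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override public int compareTo(TopVersion o) {
        int c = Long.compare(major, o.major);
        return c != 0 ? c : Integer.compare(minor, o.minor);
    }

    @Override public String toString() {
        return major + "." + minor;
    }

    public static void main(String[] args) {
        TopVersion a = new TopVersion(5, 0);
        // e.g. a cache start at the same topology bumps only the minor part
        TopVersion b = new TopVersion(5, 1);
        System.out.println(a.compareTo(b) < 0); // prints "true"
    }
}
```

Every operation waits until the cluster reaches (or exceeds) the version it was mapped on, which is why a single fused counter freezes everything at once.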
> > However, our recent changes (esp. the introduction of Baseline
> > Topology) caused me to re-think this concept. Currently there are many
> > cases where we trigger PME, but it isn't necessary. For example,
> > adding/removing a client node, or a server node not in the BLT, should
> > never cause partition map modifications. Those events modify the
> > *topology*, but *affinity* is unaffected. On the other hand, there are
> > events that affect only *affinity* - the most straightforward example
> > is the CacheAffinityChange event, which is triggered after a rebalance
> > finishes to assign new primary/backup nodes. So the term
> > *AffinityTopologyVersion* now looks odd - it tries to "merge" two
> > entities that aren't always related. To me it makes sense to introduce
> > separate *AffinityVersion* and *TopologyVersion*, review all events
> > that currently modify AffinityTopologyVersion, and split them into 3
> > categories: those that modify only AffinityVersion, those that modify
> > only TopologyVersion, and those that modify both. This will allow us
> > to process such events using different mechanics and avoid redundant
> > steps, and also to reconsider the mapping of operations - some will be
> > mapped to topology, others to affinity.
> >
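The proposed split can be sketched as two independent counters, with each event type declaring which one(s) it bumps. The event names and their classification below are illustrative assumptions for this sketch, not a definitive list:

```java
import java.util.EnumSet;

// Sketch of separate AffinityVersion and TopologyVersion counters.
// Each event declares which counter(s) it bumps; names are illustrative.
class VersionSplit {
    enum Bumps { AFFINITY, TOPOLOGY }

    enum Event {
        CLIENT_NODE_JOIN(EnumSet.of(Bumps.TOPOLOGY)),                   // no affinity change
        BASELINE_NODE_JOIN(EnumSet.of(Bumps.AFFINITY, Bumps.TOPOLOGY)), // both change
        AFFINITY_CHANGE(EnumSet.of(Bumps.AFFINITY));                    // e.g. after rebalance

        final EnumSet<Bumps> bumps;
        Event(EnumSet<Bumps> bumps) { this.bumps = bumps; }
    }

    long affinityVer;
    long topologyVer;

    void onEvent(Event e) {
        if (e.bumps.contains(Bumps.AFFINITY)) affinityVer++;
        if (e.bumps.contains(Bumps.TOPOLOGY)) topologyVer++;
    }

    public static void main(String[] args) {
        VersionSplit vs = new VersionSplit();
        vs.onEvent(Event.CLIENT_NODE_JOIN); // topology only
        vs.onEvent(Event.AFFINITY_CHANGE);  // affinity only
        System.out.println(vs.affinityVer + " " + vs.topologyVer); // prints "1 1"
    }
}
```

An operation mapped only to one counter would then ignore changes to the other, which is the core of the optimization.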
> > Here is my view on how the different event types could theoretically
> > be optimized:
> > 1. Client node start / stop: as stated above, no PME is needed; ticket
> > https://issues.apache.org/jira/browse/IGNITE-9558 is already in
> > progress.
> > 2. Server node start / stop not from baseline: should be similar to
> > the previous case, since nodes outside of the baseline cannot be
> > partition owners.
> > 3. Start of a node in the baseline: both affinity and topology
> > versions should be incremented, but it might be possible to optimize
> > PME for this case and avoid a cluster-wide freeze. Partition
> > assignments for such a node are already calculated, so we can simply
> > put them all into the MOVING state. However, it might take significant
> > effort to avoid race conditions and redesign our architecture.
> > 4. Cache start / stop: starting or stopping one cache doesn't modify
> > the partition maps of other caches. It should be possible to change
> > this procedure to skip PME and perform all necessary actions (compute
> > affinity, start/stop cache contexts on each node) in the background,
> > but it looks like a very complex modification too.
> > 5. Rebalance finish: it seems possible to design a "lightweight" PME
> > for this case as well. If there were no node failures (and if there
> > were, PME should be triggered and the rebalance should be cancelled
> > anyway), all partition states are already known to the coordinator.
> > Furthermore, no new MOVING or OWNING node is introduced for any
> > partition, so all previous mappings should still be valid.
> >
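Case 5 in the list above could be sketched roughly as follows, assuming the coordinator already holds the full partition state map and falls back to a full PME on any node failure. States and method names are hypothetical for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a "lightweight" rebalance-finish step: with no node
// failures, MOVING partitions are simply promoted to OWNING using
// state the coordinator already has -- no cluster-wide freeze.
// All names here are illustrative, not Ignite's actual API.
class RebalanceFinish {
    enum State { MOVING, OWNING }

    static Map<Integer, State> onRebalanceDone(Map<Integer, State> parts, boolean nodeFailed) {
        if (nodeFailed)
            throw new IllegalStateException("node failure: fall back to full PME");

        Map<Integer, State> next = new HashMap<>();
        for (Map.Entry<Integer, State> e : parts.entrySet())
            next.put(e.getKey(), State.OWNING); // every rebalanced partition becomes an owner
        return next;
    }

    public static void main(String[] args) {
        Map<Integer, State> parts = new HashMap<>();
        parts.put(0, State.MOVING);
        parts.put(1, State.OWNING);
        System.out.println(onRebalanceDone(parts, false).get(0)); // prints "OWNING"
    }
}
```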
> > For the latter, more complex cases it might be necessary to introduce
> > an "is compatible" relationship between affinity versions. An
> > operation needs to be remapped only if the new version isn't
> > compatible with the previous one.
> >
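One possible shape for that compatibility check, under the assumption that the coordinator records which affinity versions actually changed partition assignments (all names here are hypothetical):

```java
import java.util.TreeSet;

// Toy model of "is compatible" between affinity versions: an operation
// mapped at version V stays valid as long as no assignment-changing
// ("breaking") version occurred in the interval (V, current].
class AffinityCompat {
    // affinity versions at which partition assignments actually changed
    final TreeSet<Long> breaking = new TreeSet<>();

    boolean compatible(long mappedVer, long currentVer) {
        // compatible iff no breaking change happened after mappedVer
        // and at or before currentVer
        Long b = breaking.higher(mappedVer);
        return b == null || b > currentVer;
    }

    public static void main(String[] args) {
        AffinityCompat c = new AffinityCompat();
        c.breaking.add(7L);                     // assignments changed at version 7
        System.out.println(c.compatible(5, 6)); // prints "true"  (nothing broke in (5, 6])
        System.out.println(c.compatible(5, 8)); // prints "false" (version 7 broke it)
    }
}
```

With such a relation, an in-flight operation only remaps when an intervening version actually invalidated its mapping, instead of on every version bump.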
> > Please share your thoughts.
> >
> > --
> > Best regards,
> > Ilya
> >
>
