Denis,

Rebalance is not expected here, since this optimization applies only to a fully rebalanced cluster with a baseline topology.
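To make the precondition and the rules from the proposal quoted below a bit more concrete, here is a rough sketch of how they could be written down as a decision function. All names here (`CellularSwitchSketch`, `Decision`, `decide`) are hypothetical and are not Ignite API; the two branches marked as assumptions are my reading of cases the proposal does not spell out.

```java
/**
 * Hypothetical model of the switch rules from the quoted proposal.
 * None of these names are Ignite API; they only restate the rules as code.
 */
final class CellularSwitchSketch {
    /** What an operation that arrives during the switch is allowed to do. */
    enum Decision { PROCEED, START_BUT_DELAY_COMMIT, WAIT_FOR_CLUSTER_WIDE_SWITCH }

    /** The optimization applies only with a baseline and a fully rebalanced cluster. */
    static boolean optimizationApplies(boolean baselineSet, boolean fullyRebalanced) {
        return baselineSet && fullyRebalanced;
    }

    /** Decides what an upcoming operation may do while the switch is in progress. */
    static Decision decide(boolean optimizationApplies, boolean replicatedCache,
                           boolean primaryFailed, boolean mapsToFailedCell) {
        // Assumption: without the optimization, upcoming operations wait for the switch.
        if (!optimizationApplies)
            return Decision.WAIT_FOR_CLUSTER_WIDE_SWITCH;

        if (replicatedCache) {
            // Assumption: a replicated transaction whose primary failed has to wait;
            // the proposal states only the non-failed-primary case explicitly.
            return primaryFailed
                ? Decision.WAIT_FOR_CLUSTER_WIDE_SWITCH
                : Decision.START_BUT_DELAY_COMMIT;
        }

        // Partitioned cache: only the broken cell waits, other cells continue.
        return mapsToFailedCell ? Decision.WAIT_FOR_CLUSTER_WIDE_SWITCH : Decision.PROCEED;
    }

    public static void main(String[] args) {
        boolean ok = optimizationApplies(true, true);

        // Partitioned transaction in a healthy cell.
        System.out.println(decide(ok, false, false, false));
        // Partitioned transaction mapped to the failed cell.
        System.out.println(decide(ok, false, false, true));
        // Replicated transaction with a live primary.
        System.out.println(decide(ok, true, false, false));
    }
}
```

The point of the sketch is only that, with the optimization in place, the "wait" outcome is confined to the failed cell and to replicated commits.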
On Sat, May 9, 2020 at 12:48 AM Denis Magda <dma...@apache.org> wrote:

> Hi Anton,
>
> Generally, it means that Ignite will keep executing operations/transactions
> that are mapped to the partitions of those cells that won't be
> rebalanced, is that correct?
>
> -
> Denis
>
>
> On Wed, May 6, 2020 at 3:24 AM Anton Vinogradov <a...@apache.org> wrote:
>
> > Igniters,
> >
> > PME-free switch [1] (since 2.8) skips PME on node left when possible
> > (baseline + fully rebalanced cluster).
> > This means we already wait for nothing (except recovery) to perform the
> > switch.
> > This optimization allows continuing already started operations during or
> > after the switch if they are not affected by the failed primary.
> > But upcoming operations still can't be started until the switch is
> > finished cluster-wide.
> >
> > Let me propose an additional optimization - Cellular switch.
> > Cellular Affinity [2] means that nodes are combined into virtual cells
> > where, for each partition, backups are located in the same cell as
> > primaries.
> > The simplest way to gain Cellular Affinity is to use backup filters [3].
> >
> > Cellular Affinity allows finishing the switch outside the affected cell
> > instantly, with the following assumptions:
> > - Replicated caches should be recovered first, since every node is
> > affected (as a backup) by any failed primary.
> > But it is expected that replicated caches are effectively read-only
> > (have extremely rare updates), so there is nothing to wait for here.
> > - Upcoming replicated transactions (with non-failed primaries) can be
> > started but can't be committed until the switch is finished cluster-wide.
> > - Upcoming transactions related to the broken cell will wait for cell
> > recovery (cluster-wide switch finish).
> >
> > ... and this means:
> > In addition to PME-free switch, where we are able to continue already
> > started operations during or after the switch, we are now also able to
> > perform most of the upcoming operations during the switch.
> >
> > In other words, Cellular switch has little effect on an operation's
> > latency when the operation is not related to the failed cell.
> >
> > According to the benchmark [4], which checks "how fast upcoming
> > transactions (started after switch start) can be committed when we have
> > thousands of prepared transactions (prepared before switch start)", we
> > have 5326 ms [5] operation latency on master and 65 ms [6] with the
> > proposed fix, which is roughly 80 times faster.
> >
> > The fix [7] (as a part of IEP-45 [8]) is ready to be reviewed.
> > Waiting for your review!
> >
> >
> > [1] http://apache-ignite-developers.2346864.n4.nabble.com/Non-blocking-PME-Phase-One-Node-fail-tp43531p44586.html
> > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up#IEP-45:CrashRecoverySpeed-Up-Cellularswitch
> > [3] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5#file-bench-java-L417
> > [4] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5
> > [5] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-master-txt-L15
> > [6] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-fix-txt-L15
> > [7] https://issues.apache.org/jira/browse/IGNITE-12617
> > [8] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up
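
For readers who have not used backup filters: the proposal above says the simplest way to get Cellular Affinity is a backup filter that keeps backups in the same cell as the primary. Below is a minimal, Ignite-free sketch of that idea. The `Node` class, the `cell` field and the `assign` helper are hypothetical stand-ins. The predicate shape (candidate node first, already assigned nodes second, primary at index 0) is meant to mirror the backup filter accepted by `RendezvousAffinityFunction`, but this is a model and not the code from the gist [3].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical model of a "same cell as primary" backup filter; not Ignite API. */
final class CellBackupFilterSketch {
    /** Minimal stand-in for a cluster node that carries a cell attribute. */
    static final class Node {
        final String id;
        final String cell;

        Node(String id, String cell) {
            this.id = id;
            this.cell = cell;
        }
    }

    /** Accepts a candidate only if it lives in the same cell as the primary (index 0). */
    static final BiPredicate<Node, List<Node>> SAME_CELL_AS_PRIMARY =
        (candidate, assigned) -> assigned.isEmpty() || assigned.get(0).cell.equals(candidate.cell);

    /**
     * Walks the candidates in their given order (assumed to be the per-partition
     * order produced by the affinity function) and keeps the primary plus up to
     * {@code backups} nodes that pass the filter.
     */
    static List<Node> assign(List<Node> orderedCandidates, int backups,
                             BiPredicate<Node, List<Node>> filter) {
        List<Node> assigned = new ArrayList<>();

        for (Node n : orderedCandidates) {
            if (assigned.size() == backups + 1)
                break;

            // The first node becomes the primary; later nodes must pass the filter.
            if (assigned.isEmpty() || filter.test(n, assigned))
                assigned.add(n);
        }

        return assigned;
    }

    public static void main(String[] args) {
        List<Node> candidates = Arrays.asList(
            new Node("A1", "cell-1"), new Node("B1", "cell-2"),
            new Node("A2", "cell-1"), new Node("B2", "cell-2"));

        // With one backup, the partition stays inside the primary's cell.
        for (Node n : assign(candidates, 1, SAME_CELL_AS_PRIMARY))
            System.out.println(n.id + " in " + n.cell);
    }
}
```

With such a filter, a failed node only affects partitions whose owners all sit in its own cell, which is what lets the other cells finish the switch without waiting.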