Hi Anton,

Generally, this means that Ignite will keep executing operations/transactions that are mapped to partitions in the cells that won't be rebalanced, is that correct?

- Denis

On Wed, May 6, 2020 at 3:24 AM Anton Vinogradov <a...@apache.org> wrote:

> Igniters,
>
> PME-free switch [1] (since 2.8) skips PME when a node leaves, where possible
> (baseline + fully rebalanced cluster).
> This means we already wait for nothing (except recovery) to perform the
> switch.
> This optimization allows already started operations to continue during or
> after the switch if they are not affected by the failed primary.
> But upcoming operations still can't be started until the switch is finished
> cluster-wide.
>
> Let me propose an additional optimization - Cellular switch.
> Cellular Affinity [2] means that nodes are combined into virtual cells where,
> for each partition, backups are located in the same cell as the primaries.
> The simplest way to gain Cellular Affinity is to use backup filters [3].
>
> Cellular Affinity allows the switch to finish instantly outside the affected
> cell, under the following assumptions:
> - Replicated caches should be recovered first, since every node is affected
>   (as a backup) by any failed primary.
>   But replicated caches are expected to be effectively read-only (with
>   extremely rare updates), so there is nothing to wait for here.
> - Upcoming replicated transactions (with non-failed primaries) can be
>   started but can't be committed until the switch is finished cluster-wide.
> - Upcoming transactions related to the broken cell will wait for cell
>   recovery (cluster-wide switch finish).
>
> ... and this means:
> In addition to the PME-free switch, where we are able to continue already
> started operations during or after the switch, we are now also able to
> perform most of the upcoming operations during the switch.
>
> In other words, Cellular switch has little effect on an operation's
> latency when the operation is not related to the failed cell.
>
> According to the benchmark [4], which checks "how fast upcoming transactions
> (started after switch start) can be committed when we have thousands of
> prepared transactions (prepared before switch start)", operation latency is
> 5326 ms [5] on master and 65 ms [6] with the proposed fix, which is
> ~80 times faster.
>
> The fix [7] (as a part of IEP-45 [8]) is ready to be reviewed.
> Waiting for your review!
>
> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Non-blocking-PME-Phase-One-Node-fail-tp43531p44586.html
> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up#IEP-45:CrashRecoverySpeed-Up-Cellularswitch
> [3] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5#file-bench-java-L417
> [4] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5
> [5] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-master-txt-L15
> [6] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-fix-txt-L15
> [7] https://issues.apache.org/jira/browse/IGNITE-12617
> [8] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up
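
To make the rules in the quoted proposal concrete, here is a minimal sketch in plain Java. It is a toy model only, not Ignite code and not the implementation from the fix: the class name, the enum, and both method names are hypothetical and exist only for illustration. It models two things as I read them from the proposal: the backup-filter rule (a backup is accepted only if it sits in the same cell as the primary) and what an upcoming transaction may do while the switch is still in progress.

```java
import java.util.Map;

/** Toy model of the cellular-switch rules described above. Hypothetical names; not Ignite API. */
final class CellularSwitchSketch {
    /** What an upcoming transaction may do while the cluster-wide switch is still running. */
    enum Decision { PROCEED, START_BUT_DELAY_COMMIT, WAIT_FOR_SWITCH }

    /**
     * Backup-filter rule for Cellular Affinity: a candidate node may hold a backup
     * only if it belongs to the same cell as the partition's primary node.
     */
    static boolean acceptBackup(Map<String, String> cellOfNode, String primary, String candidate) {
        String primaryCell = cellOfNode.get(primary);
        return primaryCell != null && primaryCell.equals(cellOfNode.get(candidate));
    }

    /**
     * Decision for a transaction that starts after the switch has begun.
     *
     * @param replicatedCache   true if the transaction targets a replicated cache
     * @param touchesFailedCell true if the transaction's primary lies in the failed cell
     */
    static Decision decide(boolean replicatedCache, boolean touchesFailedCell) {
        if (touchesFailedCell)
            return Decision.WAIT_FOR_SWITCH;        // broken cell: wait for cell recovery
        if (replicatedCache)
            return Decision.START_BUT_DELAY_COMMIT; // may start, commits after the switch
        return Decision.PROCEED;                    // unaffected cell: no extra latency
    }

    public static void main(String[] args) {
        // Two cells: A = {n1, n2}, B = {n3, n4} (assumed layout for the example).
        Map<String, String> cells = Map.of("n1", "A", "n2", "A", "n3", "B", "n4", "B");

        System.out.println(acceptBackup(cells, "n1", "n2")); // same cell
        System.out.println(acceptBackup(cells, "n1", "n3")); // different cell
        System.out.println(decide(false, false));
        System.out.println(decide(true, false));
        System.out.println(decide(false, true));
    }
}
```

If this reading is right, a partitioned-cache transaction that stays inside a healthy cell falls into the `PROCEED` branch, which is the case the benchmark numbers above refer to.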