Folks,

It seems we have tacit agreement here. Going to merge the fix on May 15.

On Tue, May 12, 2020 at 10:08 AM Anton Vinogradov <a...@apache.org> wrote:

> Denis,
>
> Rebalance is not expected here since this optimization works only on a
> fully rebalanced cluster with baseline.
>
> On Sat, May 9, 2020 at 12:48 AM Denis Magda <dma...@apache.org> wrote:
>
>> Hi Anton,
>>
>> Generally, it means that Ignite will keep executing operations/transactions
>> that are mapped into the partitions of those cells that won't be
>> rebalanced, is that correct?
>>
>> -
>> Denis
>>
>> On Wed, May 6, 2020 at 3:24 AM Anton Vinogradov <a...@apache.org> wrote:
>>
>> > Igniters,
>> >
>> > PME-free switch [1] (since 2.8) skips PME when a node leaves, where
>> > possible (baseline + fully rebalanced cluster).
>> > This means we already wait for nothing (except recovery) to perform the
>> > switch.
>> > This optimization allows already started operations to continue during or
>> > after the switch if they are not affected by the failed primary.
>> > But upcoming operations still can't be started until the switch is
>> > finished cluster-wide.
>> >
>> > Let me propose an additional optimization - Cellular switch.
>> > Cellular Affinity [2] means that nodes are combined into virtual cells
>> > where, for each partition, backups are located in the same cell as the
>> > primaries.
>> > The simplest way to gain Cellular Affinity is to use backup filters [3].
>> >
>> > Cellular Affinity allows the switch to finish outside the affected cell
>> > instantly, under the following assumptions:
>> > - Replicated caches should be recovered first since every node is
>> > affected (as a backup) by any failed primary.
>> > But it is expected that replicated caches are effectively read-only
>> > (have extremely rare updates), so there is nothing to wait for here.
>> > - Upcoming replicated transactions (with non-failed primaries) can be
>> > started but can't be committed until the switch is finished cluster-wide.
>> > - Upcoming transactions related to the broken cell will wait for cell
>> > recovery (cluster-wide switch finish).
>> >
>> > ... and this means:
>> > In addition to PME-free switch, where we are able to continue already
>> > started operations during or after the switch, now we are also able to
>> > perform most of the upcoming operations during the switch.
>> >
>> > In other words, Cellular switch has little effect on an operation's
>> > latency when the operation is not related to the failed cell.
>> >
>> > According to benchmark [4], which checks "how fast upcoming transactions
>> > (started after switch start) can be committed when we have thousands of
>> > prepared transactions (prepared before switch start)", we have 5326 ms [5]
>> > operation latency on master and 65 ms [6] with the proposed fix, which is
>> > about 80 times faster.
>> >
>> > Fix [7] (as a part of IEP-45 [8]) is ready to be reviewed.
>> > Waiting for your review!
>> >
>> > [1] http://apache-ignite-developers.2346864.n4.nabble.com/Non-blocking-PME-Phase-One-Node-fail-tp43531p44586.html
>> > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up#IEP-45:CrashRecoverySpeed-Up-Cellularswitch
>> > [3] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5#file-bench-java-L417
>> > [4] https://gist.github.com/anton-vinogradov/c50f9d0ce3e3e2997646f84ba7eba5f5
>> > [5] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-master-txt-L15
>> > [6] https://gist.github.com/anton-vinogradov/a35a3a8151b7494aa84b83f58cb75889#file-fix-txt-L15
>> > [7] https://issues.apache.org/jira/browse/IGNITE-12617
>> > [8] https://cwiki.apache.org/confluence/display/IGNITE/IEP-45%3A+Crash+Recovery+Speed-Up
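
For anyone skimming the thread, the three assumptions in the original proposal can be read as a small decision rule for upcoming operations while the switch is still in progress. Below is a toy sketch of that rule in plain Java. It is not Ignite code and not the IGNITE-12617 implementation: the class name, the `decide` method, the `Decision` enum, and the node-to-cell map are all hypothetical names made up for illustration, and the sketch assumes each node belongs to exactly one cell.

```java
import java.util.Map;
import java.util.Set;

/** Toy model of the cellular switch rules from the thread; hypothetical, not Ignite code. */
final class CellularSwitchSketch {
    /** What an upcoming operation may do while the switch is still in progress. */
    enum Decision { PROCEED, START_BUT_DELAY_COMMIT, WAIT_FOR_SWITCH }

    /**
     * @param cellByNode hypothetical mapping from node id to cell id
     * @param failedNode id of the node that left the cluster
     * @param primaries  ids of the primary nodes the operation maps to
     * @param replicated whether the operation touches a replicated cache
     */
    static Decision decide(Map<String, String> cellByNode, String failedNode,
                           Set<String> primaries, boolean replicated) {
        String brokenCell = cellByNode.get(failedNode);
        if (brokenCell == null)
            throw new IllegalArgumentException("Unknown node: " + failedNode);

        if (replicated) {
            // Every node backs up a replicated cache, so the commit has to wait
            // for the cluster-wide switch; with a failed primary the start waits too.
            return primaries.contains(failedNode)
                ? Decision.WAIT_FOR_SWITCH
                : Decision.START_BUT_DELAY_COMMIT;
        }

        // Partitioned case: only operations mapped to the broken cell wait.
        for (String node : primaries) {
            if (brokenCell.equals(cellByNode.get(node)))
                return Decision.WAIT_FOR_SWITCH;
        }
        return Decision.PROCEED;
    }

    public static void main(String[] args) {
        // Two cells of two nodes each; n1 (cell A) has failed.
        Map<String, String> cells = Map.of("n1", "A", "n2", "A", "n3", "B", "n4", "B");

        System.out.println(decide(cells, "n1", Set.of("n3"), false)); // PROCEED
        System.out.println(decide(cells, "n1", Set.of("n2"), false)); // WAIT_FOR_SWITCH
        System.out.println(decide(cells, "n1", Set.of("n3"), true));  // START_BUT_DELAY_COMMIT
    }
}
```

The point of the sketch is only that, with cells, an operation mapped entirely to a healthy cell no longer depends on the cluster-wide switch finishing, which is where the latency difference in the benchmark would come from.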