I would enable this option only after we confirm it's stable. Until then, it should be treated as an experimental feature that is turned on manually with a parameter, as Ilya suggests.
Ilya, the name sounds good to me.

--
Denis

On Mon, Apr 9, 2018 at 9:16 AM, Anton Vinogradov <a...@apache.org> wrote:

> Ilya,
>
> WAL should be automatically disabled at initial rebalancing.
> It can't be disabled at regular rebalancing, since you are never ready to
> lose data you protected by WAL.
> Is there any need to have a special param in that case?
>
> 2018-04-09 16:41 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
>
> > Igniters,
> >
> > I am currently at the finish line of
> > https://issues.apache.org/jira/browse/IGNITE-8017 ("Disable WAL during
> > initial preloading") implementation, and I need such behavior to be
> > configurable. In my intermediate implementation I have a parameter
> > called "disableWalDuringRebalancing" in IgniteConfiguration. Do you
> > think such a name is meaningful and self-explanatory? Do we need to
> > ensure that it has the same value on every node? Should I make it
> > configurable per cache rather than globally?
> >
> > Please share your thoughts.
> >
> > On Mon, Apr 9, 2018 at 4:32 PM, Ilya Lantukh <ilant...@gridgain.com>
> > wrote:
> >
> > > Denis,
> > >
> > > Those tickets are rather complex, so I don't know when I'll be able
> > > to start working on them.
> > >
> > > On Fri, Mar 30, 2018 at 11:45 PM, Denis Magda <dma...@apache.org>
> > > wrote:
> > >
> > >> Ilya,
> > >>
> > >> Just came across the IEP put together by you:
> > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-16%3A+Optimization+of+rebalancing
> > >>
> > >> Excellent explanation, thanks for aggregating everything there.
> > >>
> > >> Two tickets below don't have a fixed version assigned:
> > >> https://issues.apache.org/jira/browse/IGNITE-8020
> > >> https://issues.apache.org/jira/browse/IGNITE-7935
> > >>
> > >> Do you plan to work on them in the 2.6 time frame?
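As a sketch of how the proposed switch might surface to users: the property name below is only the working name from this thread (the thread is still debating global vs. per-cache scope), not a released Ignite API.

```xml
<!-- Hypothetical Spring configuration for the flag proposed in IGNITE-8017.
     "disableWalDuringRebalancing" is the working name from the discussion
     and may change (or move to CacheConfiguration) before the ticket lands. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="disableWalDuringRebalancing" value="true"/>
</bean>
```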
> > >>
> > >> --
> > >> Denis
> > >>
> > >> On Tue, Mar 27, 2018 at 9:29 AM, Denis Magda <dma...@apache.org>
> > >> wrote:
> > >>
> > >> > Ilya, granted you all the required permissions. Please let me know
> > >> > if you still have trouble with the wiki.
> > >> >
> > >> > --
> > >> > Denis
> > >> >
> > >> > On Tue, Mar 27, 2018 at 8:56 AM, Ilya Lantukh
> > >> > <ilant...@gridgain.com> wrote:
> > >> >
> > >> >> Unfortunately, I don't have permission to create a page for the
> > >> >> IEP on the wiki. Denis, can you grant it? My username is ilantukh.
> > >> >>
> > >> >> On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov <a...@apache.org>
> > >> >> wrote:
> > >> >>
> > >> >> > >> It is impossible to disable WAL only for certain partitions
> > >> >> > >> without completely overhauling the design of the Ignite
> > >> >> > >> storage mechanism. Right now we can afford only to change
> > >> >> > >> WAL mode per cache group.
> > >> >> >
> > >> >> > Cache group rebalancing is a single-cache rebalancing, and this
> > >> >> > cache ("cache group") can then be presented as a set of virtual
> > >> >> > caches. So there is no issue for initial rebalancing.
> > >> >> > Let's disable WAL on initial rebalancing.
> > >> >> >
> > >> >> > 2018-03-26 16:46 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
> > >> >> >
> > >> >> > > Dmitry,
> > >> >> > > It is impossible to disable WAL only for certain partitions
> > >> >> > > without completely overhauling the design of the Ignite
> > >> >> > > storage mechanism. Right now we can afford only to change WAL
> > >> >> > > mode per cache group.
> > >> >> > >
> > >> >> > > The idea is to disable WAL when a node doesn't have any
> > >> >> > > partition in OWNING state, which means it doesn't have any
> > >> >> > > consistent data and won't be able to restore from WAL anyway.
> > >> >> > > I don't see any potential use for WAL on such a node, but we
> > >> >> > > can keep a configurable parameter indicating whether we may
> > >> >> > > automatically disable WAL in such a case or not.
> > >> >> > >
> > >> >> > > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov
> > >> >> > > <dpavlov....@gmail.com> wrote:
> > >> >> > >
> > >> >> > > > Denis, as I understood, there is an idea to exclude only the
> > >> >> > > > rebalanced partition(s) data. All other data will go to the
> > >> >> > > > WAL.
> > >> >> > > >
> > >> >> > > > Ilya, please correct me if I'm wrong.
> > >> >> > > >
> > >> >> > > > On Fri, Mar 23, 2018 at 22:15, Denis Magda
> > >> >> > > > <dma...@apache.org> wrote:
> > >> >> > > >
> > >> >> > > > > Ilya,
> > >> >> > > > >
> > >> >> > > > > That's a decent boost (5-20%) even with WAL enabled. Not
> > >> >> > > > > sure that we should stake on the WAL "off" mode here,
> > >> >> > > > > because if the whole cluster goes down, then the data
> > >> >> > > > > consistency is questionable. As an architect, I wouldn't
> > >> >> > > > > disable WAL for the sake of rebalancing; it's too risky.
> > >> >> > > > >
> > >> >> > > > > If you agree, then let's create the IEP. This way it will
> > >> >> > > > > be easier to track this endeavor. BTW, are you ready to
> > >> >> > > > > release any of the optimizations in 2.5, which is being
> > >> >> > > > > discussed in a separate thread?
> > >> >> > > > >
> > >> >> > > > > --
> > >> >> > > > > Denis
> > >> >> > > > >
> > >> >> > > > > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh
> > >> >> > > > > <ilant...@gridgain.com> wrote:
> > >> >> > > > >
> > >> >> > > > > > Denis,
> > >> >> > > > > >
> > >> >> > > > > > > - Don't you want to aggregate the tickets under an
> > >> >> > > > > > > IEP?
> > >> >> > > > > > Yes, I think so.
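The rule Ilya describes above (WAL is only safe to skip while the node owns no partition, since without an OWNING partition there is no consistent local data that a WAL replay could restore) can be sketched as a small predicate. `PartState` and `canDisableWal` are illustrative stand-ins, not Ignite's actual internal API.

```java
import java.util.Collection;
import java.util.List;

public class WalPolicy {
    // Simplified stand-in for Ignite's partition states; illustrative only.
    enum PartState { MOVING, OWNING, RENTING }

    /**
     * WAL may be disabled only while the node owns no partition: without
     * an OWNING partition there is no consistent local data that a WAL
     * replay could restore after a crash, so logging buys nothing.
     */
    static boolean canDisableWal(Collection<PartState> parts) {
        return parts.stream().noneMatch(s -> s == PartState.OWNING);
    }

    public static void main(String[] args) {
        // Fresh node joining the cluster: everything is still MOVING.
        System.out.println(canDisableWal(List.of(PartState.MOVING, PartState.MOVING)));
        // Regular rebalancing: at least one OWNING partition -> keep WAL on.
        System.out.println(canDisableWal(List.of(PartState.OWNING, PartState.MOVING)));
    }
}
```

This also shows why the distinction between initial and regular rebalancing matters in the thread: during regular rebalancing some partitions are already OWNING, so the predicate stays false.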
> > >> >> > > > > >
> > >> >> > > > > > > - Does it mean we're going to update our B+Tree
> > >> >> > > > > > > implementation? Any ideas how risky it is?
> > >> >> > > > > > One of the tickets that I created
> > >> >> > > > > > (https://issues.apache.org/jira/browse/IGNITE-7935)
> > >> >> > > > > > involves B+Tree modification, but I am not planning to
> > >> >> > > > > > do it in the nearest future. It shouldn't affect
> > >> >> > > > > > existing tree operations, only introduce new ones
> > >> >> > > > > > (putAll, invokeAll, removeAll).
> > >> >> > > > > >
> > >> >> > > > > > > - Any chance you had a prototype that shows
> > >> >> > > > > > > performance optimizations of the approach you are
> > >> >> > > > > > > suggesting to take?
> > >> >> > > > > > I have a prototype for the simplest improvements
> > >> >> > > > > > (https://issues.apache.org/jira/browse/IGNITE-8019 &
> > >> >> > > > > > https://issues.apache.org/jira/browse/IGNITE-8018) -
> > >> >> > > > > > together they increase throughput by 5-20%, depending
> > >> >> > > > > > on configuration and environment. Also, I've tested
> > >> >> > > > > > different WAL modes - switching from LOG_ONLY to NONE
> > >> >> > > > > > gives over a 100% boost - this is what I expect from
> > >> >> > > > > > https://issues.apache.org/jira/browse/IGNITE-8017.
> > >> >> > > > > >
> > >> >> > > > > > On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda
> > >> >> > > > > > <dma...@apache.org> wrote:
> > >> >> > > > > >
> > >> >> > > > > > > Ilya,
> > >> >> > > > > > >
> > >> >> > > > > > > That's outstanding research and summary. Thanks for
> > >> >> > > > > > > spending your time on this.
> > >> >> > > > > > >
> > >> >> > > > > > > Not sure I have enough expertise to challenge your
> > >> >> > > > > > > approach, but it sounds 100% reasonable to me. As side
> > >> >> > > > > > > notes:
> > >> >> > > > > > >
> > >> >> > > > > > > - Don't you want to aggregate the tickets under an
> > >> >> > > > > > >   IEP?
> > >> >> > > > > > > - Does it mean we're going to update our B+Tree
> > >> >> > > > > > >   implementation? Any ideas how risky it is?
> > >> >> > > > > > > - Any chance you had a prototype that shows
> > >> >> > > > > > >   performance optimizations of the approach you are
> > >> >> > > > > > >   suggesting to take?
> > >> >> > > > > > >
> > >> >> > > > > > > --
> > >> >> > > > > > > Denis
> > >> >> > > > > > >
> > >> >> > > > > > > On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh
> > >> >> > > > > > > <ilant...@gridgain.com> wrote:
> > >> >> > > > > > >
> > >> >> > > > > > > > Igniters,
> > >> >> > > > > > > >
> > >> >> > > > > > > > I've spent some time analyzing the performance of
> > >> >> > > > > > > > the rebalancing process. The initial goal was to
> > >> >> > > > > > > > understand what limits its throughput, because it
> > >> >> > > > > > > > is significantly slower than the network and
> > >> >> > > > > > > > storage device can theoretically handle.
> > >> >> > > > > > > >
> > >> >> > > > > > > > Turns out, our current implementation has a number
> > >> >> > > > > > > > of issues caused by a single fundamental problem.
> > >> >> > > > > > > >
> > >> >> > > > > > > > During rebalance, data is sent in batches called
> > >> >> > > > > > > > GridDhtPartitionSupplyMessages.
> > >> >> > > > > > > > Batch size is configurable; the default value is
> > >> >> > > > > > > > 512KB, which could mean thousands of key-value
> > >> >> > > > > > > > pairs. However, we don't take any advantage of this
> > >> >> > > > > > > > fact and process each entry independently:
> > >> >> > > > > > > > - checkpointReadLock is acquired multiple times for
> > >> >> > > > > > > > every entry, leading to unnecessary contention -
> > >> >> > > > > > > > this is clearly a bug;
> > >> >> > > > > > > > - for each entry we write (and fsync, if the
> > >> >> > > > > > > > configuration assumes it) a separate WAL record -
> > >> >> > > > > > > > so, if a batch contains N entries, we might end up
> > >> >> > > > > > > > doing N fsyncs;
> > >> >> > > > > > > > - adding every entry into CacheDataStore also
> > >> >> > > > > > > > happens completely independently. It means we will
> > >> >> > > > > > > > traverse and modify each index tree N times, we
> > >> >> > > > > > > > will allocate space in the FreeList N times, and we
> > >> >> > > > > > > > will have to additionally store O(N*log(N)) page
> > >> >> > > > > > > > delta records in the WAL.
> > >> >> > > > > > > >
> > >> >> > > > > > > > I've created a few tickets in JIRA with very
> > >> >> > > > > > > > different levels of scale and complexity.
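The per-entry costs listed above can be modelled with a toy counter: today each of the N entries in a supply message pays for its own lock acquisition and WAL fsync, while a batched design would pay each cost once per message. The names below are illustrative stand-ins, not Ignite internals.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchCost {
    // Toy counters standing in for checkpointReadLock acquisitions
    // and WAL record fsyncs.
    static final AtomicInteger locks = new AtomicInteger();
    static final AtomicInteger fsyncs = new AtomicInteger();

    /** Current behaviour: every entry pays for the lock and the fsync. */
    static void applyPerEntry(List<String> batch) {
        for (String entry : batch) {
            locks.incrementAndGet();   // acquire checkpointReadLock
            fsyncs.incrementAndGet();  // write + fsync one WAL record
        }
    }

    /** Batched behaviour: one lock and one fsync per supply message. */
    static void applyPerBatch(List<String> batch) {
        locks.incrementAndGet();       // single lock for the whole batch
        // ... apply all entries of the batch under the lock ...
        fsyncs.incrementAndGet();      // single WAL record / fsync
    }

    public static void main(String[] args) {
        List<String> batch = List.of("k1", "k2", "k3", "k4");
        applyPerEntry(batch);
        System.out.println(locks + " locks, " + fsyncs + " fsyncs"); // 4 locks, 4 fsyncs
        locks.set(0);
        fsyncs.set(0);
        applyPerBatch(batch);
        System.out.println(locks + " locks, " + fsyncs + " fsyncs"); // 1 locks, 1 fsyncs
    }
}
```

With a 512KB message holding thousands of entries, the gap between N and 1 for each cost is exactly what the tickets below attack.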
> > >> >> > > > > > > > Ways to reduce the impact of independent
> > >> >> > > > > > > > processing:
> > >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8019
> > >> >> > > > > > > > - the aforementioned bug causing contention on
> > >> >> > > > > > > > checkpointReadLock;
> > >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8018
> > >> >> > > > > > > > - an inefficiency in the GridCacheMapEntry
> > >> >> > > > > > > > implementation;
> > >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8017
> > >> >> > > > > > > > - automatically disable WAL during preloading.
> > >> >> > > > > > > >
> > >> >> > > > > > > > Ways to solve the problem on a more global level:
> > >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-7935
> > >> >> > > > > > > > - a ticket to introduce batch modification;
> > >> >> > > > > > > > - https://issues.apache.org/jira/browse/IGNITE-8020
> > >> >> > > > > > > > - a complete redesign of the rebalancing process
> > >> >> > > > > > > > for persistent caches, based on file transfer.
> > >> >> > > > > > > >
> > >> >> > > > > > > > Everyone is welcome to criticize the above ideas,
> > >> >> > > > > > > > suggest new ones, or participate in the
> > >> >> > > > > > > > implementation.
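The batch-modification idea behind IGNITE-7935 (putAll/invokeAll/removeAll on the tree) can be illustrated with a toy leaf model: if the batch is sorted first, one tree descent can serve every key that lands on the same leaf page, instead of one descent per key. This is a sketch under that assumption, not Ignite's actual BPlusTree.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class BatchPut {
    static final int PAGE_SIZE = 8; // toy leaf capacity, in keys

    /** Current behaviour: one full tree descent per key. */
    static int descentsPerEntry(int[] keys) {
        return keys.length;
    }

    /**
     * Batched put over a sorted run: one descent per distinct leaf.
     * Leaves are modelled as fixed, contiguous ranges of PAGE_SIZE keys.
     */
    static int descentsPerBatch(int[] keys) {
        int[] sorted = keys.clone();
        Arrays.sort(sorted);
        TreeSet<Integer> leaves = new TreeSet<>();
        for (int k : sorted)
            leaves.add(k / PAGE_SIZE); // leaf that would hold this key
        return leaves.size();
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 2, 3, 9, 10, 17, 25};
        System.out.println(descentsPerEntry(keys)); // 8
        System.out.println(descentsPerBatch(keys)); // 4 distinct leaves
    }
}
```

In a real B+Tree the win compounds: fewer descents also mean fewer page modifications and therefore fewer of the O(N*log(N)) page delta records mentioned above.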
> > >> >> > > > > > > >
> > >> >> > > > > > > > --
> > >> >> > > > > > > > Best regards,
> > >> >> > > > > > > > Ilya
> > >> >> > > > > >
> > >> >> > > > > > --
> > >> >> > > > > > Best regards,
> > >> >> > > > > > Ilya
> > >> >> > >
> > >> >> > > --
> > >> >> > > Best regards,
> > >> >> > > Ilya
> > >> >>
> > >> >> --
> > >> >> Best regards,
> > >> >> Ilya
> > >
> > > --
> > > Best regards,
> > > Ilya
> >
> > --
> > Best regards,
> > Ilya