Not sure I agree. I think feature toggle is best practice.

Eventually we can remove related code, but it always good to have migration
stages:
1) old and new provided but new is disabled,
2) both provided and new is enabled
3) new is enabled, old is not available.

It is common practice for enterprize software.

пн, 9 апр. 2018 г. в 20:08, Anton Vinogradov <a...@apache.org>:

> >> I would enable this option only after we confirm it's stable.
> We have to add system property in that case.
>
> It's a bad idea to have such options at API because it can be unstable.
> It's like allowMoreThan1024Node(bool) since we have issues with that.
>
> 2018-04-09 20:03 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
> > How about skipWalOnRebalancing or disableWalOnRebalancing? I like the
> first
> > one better and both are shorter.
> >
> > --Yakov
> >
> > 2018-04-09 19:55 GMT+03:00 Denis Magda <dma...@apache.org>:
> >
> > > I would enable this option only after we confirm it's stable. Until it
> > > happens it should be treated as an experimental feature that is turned
> on
> > > manually with a parameter like Ilya is suggesting.
> > >
> > > Ilya, the name sounds good to me.
> > >
> > > --
> > > Denis
> > >
> > > On Mon, Apr 9, 2018 at 9:16 AM, Anton Vinogradov <a...@apache.org>
> wrote:
> > >
> > > > Ilya,
> > > >
> > > > WAL should be automatically disabled at initial rebalancing.
> > > > It can't be disabled at regilar rebalancing, since you never ready to
> > > lose
> > > > data you protected by WAL.
> > > > Is there any need to have special param in that case?
> > > >
> > > > 2018-04-09 16:41 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I am currently at the finish line of
> > > > > https://issues.apache.org/jira/browse/IGNITE-8017 ("Disable WAL
> > during
> > > > > initial preloading") implementation. And I need that such behavior
> > > should
> > > > > be configurable. In my intermediate implementation I have parameter
> > > > called
> > > > > "disableWalDuringRebalancing" in IgniteConfiguration. Do you thing
> > such
> > > > > name is meaningful and self-explanatory? Do we need to ensure that
> it
> > > has
> > > > > the same value on every node? Should I make it configurable per
> cache
> > > > > rather than globally?
> > > > >
> > > > > Please share your thoughts.
> > > > >
> > > > > On Mon, Apr 9, 2018 at 4:32 PM, Ilya Lantukh <
> ilant...@gridgain.com>
> > > > > wrote:
> > > > >
> > > > > > Denis,
> > > > > >
> > > > > > Those ticket are rather complex, and so I don't know when I'll be
> > > able
> > > > to
> > > > > > start working on them.
> > > > > >
> > > > > > On Fri, Mar 30, 2018 at 11:45 PM, Denis Magda <dma...@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > >> Ilya,
> > > > > >>
> > > > > >> Just came across the IEP put together by you:
> > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-16%
> > > > > >> 3A+Optimization+of+rebalancing
> > > > > >>
> > > > > >> Excellent explanation, thanks for aggregating everything there.
> > > > > >>
> > > > > >> Two tickets below don't have a fixed version assigned:
> > > > > >> https://issues.apache.org/jira/browse/IGNITE-8020
> > > > > >> https://issues.apache.org/jira/browse/IGNITE-7935
> > > > > >>
> > > > > >> Do you plan to work on them in 2.6 time frame, right?
> > > > > >>
> > > > > >> --
> > > > > >> Denis
> > > > > >>
> > > > > >> On Tue, Mar 27, 2018 at 9:29 AM, Denis Magda <dma...@apache.org
> >
> > > > wrote:
> > > > > >>
> > > > > >> > Ilya, granted you all the required permissions. Please let me
> > know
> > > > if
> > > > > >> you
> > > > > >> > still have troubles with the wiki.
> > > > > >> >
> > > > > >> > --
> > > > > >> > Denis
> > > > > >> >
> > > > > >> > On Tue, Mar 27, 2018 at 8:56 AM, Ilya Lantukh <
> > > > ilant...@gridgain.com>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >> Unfortunately, I don't have permission to create page for IEP
> > on
> > > > > wiki.
> > > > > >> >> Denis, can you grant it? My username is ilantukh.
> > > > > >> >>
> > > > > >> >> On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov <
> > a...@apache.org
> > > >
> > > > > >> wrote:
> > > > > >> >>
> > > > > >> >> > >> It is impossible to disable WAL only for certain
> > partitions
> > > > > >> without
> > > > > >> >> > >> completely overhauling design of Ignite storage
> mechanism.
> > > > Right
> > > > > >> now
> > > > > >> >> we
> > > > > >> >> > can
> > > > > >> >> > >> afford only to change WAL mode per cache group.
> > > > > >> >> >
> > > > > >> >> > Cache group rebalancing is a one cache rebalancing, and
> then
> > > this
> > > > > >> cache
> > > > > >> >> > ("cache group") can be presented as a set of virtual
> caches.
> > > > > >> >> > So, there is no issues for initial rebalancing.
> > > > > >> >> > Lets disable WAL on initial rebalancing.
> > > > > >> >> >
> > > > > >> >> > 2018-03-26 16:46 GMT+03:00 Ilya Lantukh <
> > ilant...@gridgain.com
> > > >:
> > > > > >> >> >
> > > > > >> >> > > Dmitry,
> > > > > >> >> > > It is impossible to disable WAL only for certain
> partitions
> > > > > without
> > > > > >> >> > > completely overhauling design of Ignite storage
> mechanism.
> > > > Right
> > > > > >> now
> > > > > >> >> we
> > > > > >> >> > can
> > > > > >> >> > > afford only to change WAL mode per cache group.
> > > > > >> >> > >
> > > > > >> >> > > The idea is to disable WAL when node doesn't have any
> > > partition
> > > > > in
> > > > > >> >> OWNING
> > > > > >> >> > > state, which means it doesn't have any consistent data
> and
> > > > won't
> > > > > be
> > > > > >> >> able
> > > > > >> >> > to
> > > > > >> >> > > restore from WAL anyway. I don't see any potential use
> for
> > > WAL
> > > > on
> > > > > >> such
> > > > > >> >> > > node, but we can keep a configurable parameter indicating
> > can
> > > > we
> > > > > >> >> > > automatically disable WAL in such case or not.
> > > > > >> >> > >
> > > > > >> >> > > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <
> > > > > >> >> dpavlov....@gmail.com>
> > > > > >> >> > > wrote:
> > > > > >> >> > >
> > > > > >> >> > > > Denis, as I understood, there is and idea to exclude
> only
> > > > > >> rebalanced
> > > > > >> >> > > > partition(s) data. All other data will go to the WAL.
> > > > > >> >> > > >
> > > > > >> >> > > > Ilya, please correct me if I'm wrong.
> > > > > >> >> > > >
> > > > > >> >> > > > пт, 23 мар. 2018 г. в 22:15, Denis Magda <
> > > dma...@apache.org
> > > > >:
> > > > > >> >> > > >
> > > > > >> >> > > > > Ilya,
> > > > > >> >> > > > >
> > > > > >> >> > > > > That's a decent boost (5-20%) even having WAL
> enabled.
> > > Not
> > > > > sure
> > > > > >> >> that
> > > > > >> >> > we
> > > > > >> >> > > > > should stake on the WAL "off" mode here because if
> the
> > > > whole
> > > > > >> >> cluster
> > > > > >> >> > > goes
> > > > > >> >> > > > > down, it's then the data consistency is questionable.
> > As
> > > an
> > > > > >> >> > architect,
> > > > > >> >> > > I
> > > > > >> >> > > > > wouldn't disable WAL for the sake of rebalancing;
> it's
> > > too
> > > > > >> risky.
> > > > > >> >> > > > >
> > > > > >> >> > > > > If you agree, then let's create the IEP. This way it
> > will
> > > > be
> > > > > >> >> easier
> > > > > >> >> > to
> > > > > >> >> > > > > track this endeavor. BTW, are you already ready to
> > > release
> > > > > any
> > > > > >> >> > > > > optimizations in 2.5 that is being discussed in a
> > > separate
> > > > > >> thread?
> > > > > >> >> > > > >
> > > > > >> >> > > > > --
> > > > > >> >> > > > > Denis
> > > > > >> >> > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > > > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh <
> > > > > >> >> ilant...@gridgain.com
> > > > > >> >> > >
> > > > > >> >> > > > > wrote:
> > > > > >> >> > > > >
> > > > > >> >> > > > > > Denis,
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > > - Don't you want to aggregate the tickets under
> an
> > > IEP?
> > > > > >> >> > > > > > Yes, I think so.
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > > - Does it mean we're going to update our B+Tree
> > > > > >> >> implementation?
> > > > > >> >> > Any
> > > > > >> >> > > > > ideas
> > > > > >> >> > > > > > how risky it is?
> > > > > >> >> > > > > > One of tickets that I created (
> > > > > >> >> > > > > > https://issues.apache.org/jira/browse/IGNITE-7935)
> > > > > involves
> > > > > >> >> B+Tree
> > > > > >> >> > > > > > modification, but I am not planning to do it in the
> > > > nearest
> > > > > >> >> future.
> > > > > >> >> > > It
> > > > > >> >> > > > > > shouldn't affect existing tree operations, only
> > > introduce
> > > > > new
> > > > > >> >> ones
> > > > > >> >> > > > > (putAll,
> > > > > >> >> > > > > > invokeAll, removeAll).
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > > - Any chance you had a prototype that shows
> > > performance
> > > > > >> >> > > optimizations
> > > > > >> >> > > > > the
> > > > > >> >> > > > > > approach you are suggesting to take?
> > > > > >> >> > > > > > I have a prototype for simplest improvements (
> > > > > >> >> > > > https://issues.apache.org/
> > > > > >> >> > > > > > jira/browse/IGNITE-8019 &
> https://issues.apache.org/
> > > > > >> >> > > > > > jira/browse/IGNITE-8018)
> > > > > >> >> > > > > > - together they increase throughput by 5-20%,
> > depending
> > > > on
> > > > > >> >> > > > configuration
> > > > > >> >> > > > > > and environment. Also, I've tested different WAL
> > modes
> > > -
> > > > > >> >> switching
> > > > > >> >> > > from
> > > > > >> >> > > > > > LOG_ONLY to NONE gives over 100% boost - this is
> > what I
> > > > > >> expect
> > > > > >> >> from
> > > > > >> >> > > > > > https://issues.apache.org/jira/browse/IGNITE-8017.
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <
> > > > > >> dma...@apache.org
> > > > > >> >> >
> > > > > >> >> > > > wrote:
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > > Ilya,
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > That's outstanding research and summary. Thanks
> for
> > > > > >> spending
> > > > > >> >> your
> > > > > >> >> > > > time
> > > > > >> >> > > > > on
> > > > > >> >> > > > > > > this.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > Not sure I have enough expertise to challenge
> your
> > > > > >> approach,
> > > > > >> >> but
> > > > > >> >> > it
> > > > > >> >> > > > > > sounds
> > > > > >> >> > > > > > > 100% reasonable to me. As side notes:
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > >    - Don't you want to aggregate the tickets
> under
> > an
> > > > > IEP?
> > > > > >> >> > > > > > >    - Does it mean we're going to update our
> B+Tree
> > > > > >> >> > implementation?
> > > > > >> >> > > > Any
> > > > > >> >> > > > > > >    ideas how risky it is?
> > > > > >> >> > > > > > >    - Any chance you had a prototype that shows
> > > > > performance
> > > > > >> >> > > > > optimizations
> > > > > >> >> > > > > > of
> > > > > >> >> > > > > > >    the approach you are suggesting to take?
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > --
> > > > > >> >> > > > > > > Denis
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh <
> > > > > >> >> > > ilant...@gridgain.com
> > > > > >> >> > > > >
> > > > > >> >> > > > > > > wrote:
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > > Igniters,
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > I've spent some time analyzing performance of
> > > > > rebalancing
> > > > > >> >> > > process.
> > > > > >> >> > > > > The
> > > > > >> >> > > > > > > > initial goal was to understand, what limits
> it's
> > > > > >> throughput,
> > > > > >> >> > > > because
> > > > > >> >> > > > > it
> > > > > >> >> > > > > > > is
> > > > > >> >> > > > > > > > significantly slower than network and storage
> > > device
> > > > > can
> > > > > >> >> > > > > theoretically
> > > > > >> >> > > > > > > > handle.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Turns out, our current implementation has a
> > number
> > > of
> > > > > >> issues
> > > > > >> >> > > caused
> > > > > >> >> > > > > by
> > > > > >> >> > > > > > a
> > > > > >> >> > > > > > > > single fundamental problem.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > During rebalance data is sent in batches called
> > > > > >> >> > > > > > > > GridDhtPartitionSupplyMessages. Batch size is
> > > > > >> configurable,
> > > > > >> >> > > > default
> > > > > >> >> > > > > > > value
> > > > > >> >> > > > > > > > is 512KB, which could mean thousands of
> key-value
> > > > > pairs.
> > > > > >> >> > However,
> > > > > >> >> > > > we
> > > > > >> >> > > > > > > don't
> > > > > >> >> > > > > > > > take any advantage over this fact and process
> > each
> > > > > entry
> > > > > >> >> > > > > independently:
> > > > > >> >> > > > > > > > - checkpointReadLock is acquired multiple times
> > for
> > > > > every
> > > > > >> >> > entry,
> > > > > >> >> > > > > > leading
> > > > > >> >> > > > > > > to
> > > > > >> >> > > > > > > > unnecessary contention - this is clearly a bug;
> > > > > >> >> > > > > > > > - for each entry we write (and fsync, if
> > > > configuration
> > > > > >> >> assumes
> > > > > >> >> > > it)
> > > > > >> >> > > > a
> > > > > >> >> > > > > > > > separate WAL record - so, if batch contains N
> > > > entries,
> > > > > we
> > > > > >> >> might
> > > > > >> >> > > end
> > > > > >> >> > > > > up
> > > > > >> >> > > > > > > > doing N fsyncs;
> > > > > >> >> > > > > > > > - adding every entry into CacheDataStore also
> > > happens
> > > > > >> >> > completely
> > > > > >> >> > > > > > > > independently. It means, we will traverse and
> > > modify
> > > > > each
> > > > > >> >> index
> > > > > >> >> > > > tree
> > > > > >> >> > > > > N
> > > > > >> >> > > > > > > > times, we will allocate space in FreeList N
> times
> > > and
> > > > > we
> > > > > >> >> will
> > > > > >> >> > > have
> > > > > >> >> > > > to
> > > > > >> >> > > > > > > > additionally store in WAL O(N*log(N)) page
> delta
> > > > > records.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > I've created a few tickets in JIRA with very
> > > > different
> > > > > >> >> levels
> > > > > >> >> > of
> > > > > >> >> > > > > scale
> > > > > >> >> > > > > > > and
> > > > > >> >> > > > > > > > complexity.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Ways to reduce impact of independent
> processing:
> > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > jira/browse/IGNITE-8019
> > > > -
> > > > > >> >> > > > aforementioned
> > > > > >> >> > > > > > > bug,
> > > > > >> >> > > > > > > > causing contention on checkpointReadLock;
> > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > jira/browse/IGNITE-8018
> > > > -
> > > > > >> >> > > inefficiency
> > > > > >> >> > > > > in
> > > > > >> >> > > > > > > > GridCacheMapEntry implementation;
> > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > jira/browse/IGNITE-8017
> > > > -
> > > > > >> >> > > > automatically
> > > > > >> >> > > > > > > > disable
> > > > > >> >> > > > > > > > WAL during preloading.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Ways to solve problem on more global level:
> > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > jira/browse/IGNITE-7935
> > > > -
> > > > > a
> > > > > >> >> ticket
> > > > > >> >> > > to
> > > > > >> >> > > > > > > > introduce
> > > > > >> >> > > > > > > > batch modification;
> > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > jira/browse/IGNITE-8020
> > > > -
> > > > > >> >> complete
> > > > > >> >> > > > > > redesign
> > > > > >> >> > > > > > > of
> > > > > >> >> > > > > > > > rebalancing process for persistent caches,
> based
> > on
> > > > > file
> > > > > >> >> > > transfer.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Everyone is welcome to criticize above ideas,
> > > suggest
> > > > > new
> > > > > >> >> ones
> > > > > >> >> > or
> > > > > >> >> > > > > > > > participate in implementation.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > --
> > > > > >> >> > > > > > > > Best regards,
> > > > > >> >> > > > > > > > Ilya
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > --
> > > > > >> >> > > > > > Best regards,
> > > > > >> >> > > > > > Ilya
> > > > > >> >> > > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > --
> > > > > >> >> > > Best regards,
> > > > > >> >> > > Ilya
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> --
> > > > > >> >> Best regards,
> > > > > >> >> Ilya
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Ilya
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ilya
> > > > >
> > > >
> > >
> >
>

Reply via email to