The system property is a good choice, especially for experimental features.
Anton, thanks for reminding about that.

Once the feature will be about to graduate from the experimental state we
might not need any API parameter at all if the functionality is enabled by
default.

--
Denis

On Mon, Apr 9, 2018 at 12:28 PM, Anton Vinogradov <a...@apache.org> wrote:

> Dmitry, I see no problems to use system property for cases you specified.
> Migration possibility is not a reason to change the API.
>
> пн, 9 апр. 2018 г., 20:18 Dmitry Pavlov <dpavlov....@gmail.com>:
>
> > Not sure I agree. I think feature toggle is best practice.
> >
> > Eventually we can remove related code, but it always good to have
> migration
> > stages:
> > 1) old and new provided but new is disabled,
> > 2) both provided and new is enabled
> > 3) new is enabled, old is not available.
> >
> > It is common practice for enterprize software.
> >
> > пн, 9 апр. 2018 г. в 20:08, Anton Vinogradov <a...@apache.org>:
> >
> > > >> I would enable this option only after we confirm it's stable.
> > > We have to add system property in that case.
> > >
> > > It's a bad idea to have such options at API because it can be unstable.
> > > It's like allowMoreThan1024Node(bool) since we have issues with that.
> > >
> > > 2018-04-09 20:03 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
> > >
> > > > How about skipWalOnRebalancing or disableWalOnRebalancing? I like the
> > > first
> > > > one better and both are shorter.
> > > >
> > > > --Yakov
> > > >
> > > > 2018-04-09 19:55 GMT+03:00 Denis Magda <dma...@apache.org>:
> > > >
> > > > > I would enable this option only after we confirm it's stable. Until
> > it
> > > > > happens it should be treated as an experimental feature that is
> > turned
> > > on
> > > > > manually with a parameter like Ilya is suggesting.
> > > > >
> > > > > Ilya, the name sounds good to me.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On Mon, Apr 9, 2018 at 9:16 AM, Anton Vinogradov <a...@apache.org>
> > > wrote:
> > > > >
> > > > > > Ilya,
> > > > > >
> > > > > > WAL should be automatically disabled at initial rebalancing.
> > > > > > It can't be disabled at regilar rebalancing, since you never
> ready
> > to
> > > > > lose
> > > > > > data you protected by WAL.
> > > > > > Is there any need to have special param in that case?
> > > > > >
> > > > > > 2018-04-09 16:41 GMT+03:00 Ilya Lantukh <ilant...@gridgain.com>:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > I am currently at the finish line of
> > > > > > > https://issues.apache.org/jira/browse/IGNITE-8017 ("Disable
> WAL
> > > > during
> > > > > > > initial preloading") implementation. And I need that such
> > behavior
> > > > > should
> > > > > > > be configurable. In my intermediate implementation I have
> > parameter
> > > > > > called
> > > > > > > "disableWalDuringRebalancing" in IgniteConfiguration. Do you
> > thing
> > > > such
> > > > > > > name is meaningful and self-explanatory? Do we need to ensure
> > that
> > > it
> > > > > has
> > > > > > > the same value on every node? Should I make it configurable per
> > > cache
> > > > > > > rather than globally?
> > > > > > >
> > > > > > > Please share your thoughts.
> > > > > > >
> > > > > > > On Mon, Apr 9, 2018 at 4:32 PM, Ilya Lantukh <
> > > ilant...@gridgain.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Denis,
> > > > > > > >
> > > > > > > > Those ticket are rather complex, and so I don't know when
> I'll
> > be
> > > > > able
> > > > > > to
> > > > > > > > start working on them.
> > > > > > > >
> > > > > > > > On Fri, Mar 30, 2018 at 11:45 PM, Denis Magda <
> > dma...@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Ilya,
> > > > > > > >>
> > > > > > > >> Just came across the IEP put together by you:
> > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-16%
> > > > > > > >> 3A+Optimization+of+rebalancing
> > > > > > > >>
> > > > > > > >> Excellent explanation, thanks for aggregating everything
> > there.
> > > > > > > >>
> > > > > > > >> Two tickets below don't have a fixed version assigned:
> > > > > > > >> https://issues.apache.org/jira/browse/IGNITE-8020
> > > > > > > >> https://issues.apache.org/jira/browse/IGNITE-7935
> > > > > > > >>
> > > > > > > >> Do you plan to work on them in 2.6 time frame, right?
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Denis
> > > > > > > >>
> > > > > > > >> On Tue, Mar 27, 2018 at 9:29 AM, Denis Magda <
> > dma...@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >>
> > > > > > > >> > Ilya, granted you all the required permissions. Please let
> > me
> > > > know
> > > > > > if
> > > > > > > >> you
> > > > > > > >> > still have troubles with the wiki.
> > > > > > > >> >
> > > > > > > >> > --
> > > > > > > >> > Denis
> > > > > > > >> >
> > > > > > > >> > On Tue, Mar 27, 2018 at 8:56 AM, Ilya Lantukh <
> > > > > > ilant...@gridgain.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> >> Unfortunately, I don't have permission to create page for
> > IEP
> > > > on
> > > > > > > wiki.
> > > > > > > >> >> Denis, can you grant it? My username is ilantukh.
> > > > > > > >> >>
> > > > > > > >> >> On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov <
> > > > a...@apache.org
> > > > > >
> > > > > > > >> wrote:
> > > > > > > >> >>
> > > > > > > >> >> > >> It is impossible to disable WAL only for certain
> > > > partitions
> > > > > > > >> without
> > > > > > > >> >> > >> completely overhauling design of Ignite storage
> > > mechanism.
> > > > > > Right
> > > > > > > >> now
> > > > > > > >> >> we
> > > > > > > >> >> > can
> > > > > > > >> >> > >> afford only to change WAL mode per cache group.
> > > > > > > >> >> >
> > > > > > > >> >> > Cache group rebalancing is a one cache rebalancing, and
> > > then
> > > > > this
> > > > > > > >> cache
> > > > > > > >> >> > ("cache group") can be presented as a set of virtual
> > > caches.
> > > > > > > >> >> > So, there is no issues for initial rebalancing.
> > > > > > > >> >> > Lets disable WAL on initial rebalancing.
> > > > > > > >> >> >
> > > > > > > >> >> > 2018-03-26 16:46 GMT+03:00 Ilya Lantukh <
> > > > ilant...@gridgain.com
> > > > > >:
> > > > > > > >> >> >
> > > > > > > >> >> > > Dmitry,
> > > > > > > >> >> > > It is impossible to disable WAL only for certain
> > > partitions
> > > > > > > without
> > > > > > > >> >> > > completely overhauling design of Ignite storage
> > > mechanism.
> > > > > > Right
> > > > > > > >> now
> > > > > > > >> >> we
> > > > > > > >> >> > can
> > > > > > > >> >> > > afford only to change WAL mode per cache group.
> > > > > > > >> >> > >
> > > > > > > >> >> > > The idea is to disable WAL when node doesn't have any
> > > > > partition
> > > > > > > in
> > > > > > > >> >> OWNING
> > > > > > > >> >> > > state, which means it doesn't have any consistent
> data
> > > and
> > > > > > won't
> > > > > > > be
> > > > > > > >> >> able
> > > > > > > >> >> > to
> > > > > > > >> >> > > restore from WAL anyway. I don't see any potential
> use
> > > for
> > > > > WAL
> > > > > > on
> > > > > > > >> such
> > > > > > > >> >> > > node, but we can keep a configurable parameter
> > indicating
> > > > can
> > > > > > we
> > > > > > > >> >> > > automatically disable WAL in such case or not.
> > > > > > > >> >> > >
> > > > > > > >> >> > > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <
> > > > > > > >> >> dpavlov....@gmail.com>
> > > > > > > >> >> > > wrote:
> > > > > > > >> >> > >
> > > > > > > >> >> > > > Denis, as I understood, there is and idea to
> exclude
> > > only
> > > > > > > >> rebalanced
> > > > > > > >> >> > > > partition(s) data. All other data will go to the
> WAL.
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > Ilya, please correct me if I'm wrong.
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > пт, 23 мар. 2018 г. в 22:15, Denis Magda <
> > > > > dma...@apache.org
> > > > > > >:
> > > > > > > >> >> > > >
> > > > > > > >> >> > > > > Ilya,
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > That's a decent boost (5-20%) even having WAL
> > > enabled.
> > > > > Not
> > > > > > > sure
> > > > > > > >> >> that
> > > > > > > >> >> > we
> > > > > > > >> >> > > > > should stake on the WAL "off" mode here because
> if
> > > the
> > > > > > whole
> > > > > > > >> >> cluster
> > > > > > > >> >> > > goes
> > > > > > > >> >> > > > > down, it's then the data consistency is
> > questionable.
> > > > As
> > > > > an
> > > > > > > >> >> > architect,
> > > > > > > >> >> > > I
> > > > > > > >> >> > > > > wouldn't disable WAL for the sake of rebalancing;
> > > it's
> > > > > too
> > > > > > > >> risky.
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > If you agree, then let's create the IEP. This way
> > it
> > > > will
> > > > > > be
> > > > > > > >> >> easier
> > > > > > > >> >> > to
> > > > > > > >> >> > > > > track this endeavor. BTW, are you already ready
> to
> > > > > release
> > > > > > > any
> > > > > > > >> >> > > > > optimizations in 2.5 that is being discussed in a
> > > > > separate
> > > > > > > >> thread?
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > --
> > > > > > > >> >> > > > > Denis
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh <
> > > > > > > >> >> ilant...@gridgain.com
> > > > > > > >> >> > >
> > > > > > > >> >> > > > > wrote:
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > > Denis,
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > > - Don't you want to aggregate the tickets
> under
> > > an
> > > > > IEP?
> > > > > > > >> >> > > > > > Yes, I think so.
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > > - Does it mean we're going to update our
> B+Tree
> > > > > > > >> >> implementation?
> > > > > > > >> >> > Any
> > > > > > > >> >> > > > > ideas
> > > > > > > >> >> > > > > > how risky it is?
> > > > > > > >> >> > > > > > One of tickets that I created (
> > > > > > > >> >> > > > > >
> > https://issues.apache.org/jira/browse/IGNITE-7935)
> > > > > > > involves
> > > > > > > >> >> B+Tree
> > > > > > > >> >> > > > > > modification, but I am not planning to do it in
> > the
> > > > > > nearest
> > > > > > > >> >> future.
> > > > > > > >> >> > > It
> > > > > > > >> >> > > > > > shouldn't affect existing tree operations, only
> > > > > introduce
> > > > > > > new
> > > > > > > >> >> ones
> > > > > > > >> >> > > > > (putAll,
> > > > > > > >> >> > > > > > invokeAll, removeAll).
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > > - Any chance you had a prototype that shows
> > > > > performance
> > > > > > > >> >> > > optimizations
> > > > > > > >> >> > > > > the
> > > > > > > >> >> > > > > > approach you are suggesting to take?
> > > > > > > >> >> > > > > > I have a prototype for simplest improvements (
> > > > > > > >> >> > > > https://issues.apache.org/
> > > > > > > >> >> > > > > > jira/browse/IGNITE-8019 &
> > > https://issues.apache.org/
> > > > > > > >> >> > > > > > jira/browse/IGNITE-8018)
> > > > > > > >> >> > > > > > - together they increase throughput by 5-20%,
> > > > depending
> > > > > > on
> > > > > > > >> >> > > > configuration
> > > > > > > >> >> > > > > > and environment. Also, I've tested different
> WAL
> > > > modes
> > > > > -
> > > > > > > >> >> switching
> > > > > > > >> >> > > from
> > > > > > > >> >> > > > > > LOG_ONLY to NONE gives over 100% boost - this
> is
> > > > what I
> > > > > > > >> expect
> > > > > > > >> >> from
> > > > > > > >> >> > > > > >
> > https://issues.apache.org/jira/browse/IGNITE-8017.
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <
> > > > > > > >> dma...@apache.org
> > > > > > > >> >> >
> > > > > > > >> >> > > > wrote:
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > > Ilya,
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > That's outstanding research and summary.
> Thanks
> > > for
> > > > > > > >> spending
> > > > > > > >> >> your
> > > > > > > >> >> > > > time
> > > > > > > >> >> > > > > on
> > > > > > > >> >> > > > > > > this.
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > Not sure I have enough expertise to challenge
> > > your
> > > > > > > >> approach,
> > > > > > > >> >> but
> > > > > > > >> >> > it
> > > > > > > >> >> > > > > > sounds
> > > > > > > >> >> > > > > > > 100% reasonable to me. As side notes:
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > >    - Don't you want to aggregate the tickets
> > > under
> > > > an
> > > > > > > IEP?
> > > > > > > >> >> > > > > > >    - Does it mean we're going to update our
> > > B+Tree
> > > > > > > >> >> > implementation?
> > > > > > > >> >> > > > Any
> > > > > > > >> >> > > > > > >    ideas how risky it is?
> > > > > > > >> >> > > > > > >    - Any chance you had a prototype that
> shows
> > > > > > > performance
> > > > > > > >> >> > > > > optimizations
> > > > > > > >> >> > > > > > of
> > > > > > > >> >> > > > > > >    the approach you are suggesting to take?
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > --
> > > > > > > >> >> > > > > > > Denis
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > On Thu, Mar 22, 2018 at 8:38 AM, Ilya
> Lantukh <
> > > > > > > >> >> > > ilant...@gridgain.com
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > > > > > wrote:
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > > > > Igniters,
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > I've spent some time analyzing performance
> of
> > > > > > > rebalancing
> > > > > > > >> >> > > process.
> > > > > > > >> >> > > > > The
> > > > > > > >> >> > > > > > > > initial goal was to understand, what limits
> > > it's
> > > > > > > >> throughput,
> > > > > > > >> >> > > > because
> > > > > > > >> >> > > > > it
> > > > > > > >> >> > > > > > > is
> > > > > > > >> >> > > > > > > > significantly slower than network and
> storage
> > > > > device
> > > > > > > can
> > > > > > > >> >> > > > > theoretically
> > > > > > > >> >> > > > > > > > handle.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > Turns out, our current implementation has a
> > > > number
> > > > > of
> > > > > > > >> issues
> > > > > > > >> >> > > caused
> > > > > > > >> >> > > > > by
> > > > > > > >> >> > > > > > a
> > > > > > > >> >> > > > > > > > single fundamental problem.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > During rebalance data is sent in batches
> > called
> > > > > > > >> >> > > > > > > > GridDhtPartitionSupplyMessages. Batch
> size is
> > > > > > > >> configurable,
> > > > > > > >> >> > > > default
> > > > > > > >> >> > > > > > > value
> > > > > > > >> >> > > > > > > > is 512KB, which could mean thousands of
> > > key-value
> > > > > > > pairs.
> > > > > > > >> >> > However,
> > > > > > > >> >> > > > we
> > > > > > > >> >> > > > > > > don't
> > > > > > > >> >> > > > > > > > take any advantage over this fact and
> process
> > > > each
> > > > > > > entry
> > > > > > > >> >> > > > > independently:
> > > > > > > >> >> > > > > > > > - checkpointReadLock is acquired multiple
> > times
> > > > for
> > > > > > > every
> > > > > > > >> >> > entry,
> > > > > > > >> >> > > > > > leading
> > > > > > > >> >> > > > > > > to
> > > > > > > >> >> > > > > > > > unnecessary contention - this is clearly a
> > bug;
> > > > > > > >> >> > > > > > > > - for each entry we write (and fsync, if
> > > > > > configuration
> > > > > > > >> >> assumes
> > > > > > > >> >> > > it)
> > > > > > > >> >> > > > a
> > > > > > > >> >> > > > > > > > separate WAL record - so, if batch
> contains N
> > > > > > entries,
> > > > > > > we
> > > > > > > >> >> might
> > > > > > > >> >> > > end
> > > > > > > >> >> > > > > up
> > > > > > > >> >> > > > > > > > doing N fsyncs;
> > > > > > > >> >> > > > > > > > - adding every entry into CacheDataStore
> also
> > > > > happens
> > > > > > > >> >> > completely
> > > > > > > >> >> > > > > > > > independently. It means, we will traverse
> and
> > > > > modify
> > > > > > > each
> > > > > > > >> >> index
> > > > > > > >> >> > > > tree
> > > > > > > >> >> > > > > N
> > > > > > > >> >> > > > > > > > times, we will allocate space in FreeList N
> > > times
> > > > > and
> > > > > > > we
> > > > > > > >> >> will
> > > > > > > >> >> > > have
> > > > > > > >> >> > > > to
> > > > > > > >> >> > > > > > > > additionally store in WAL O(N*log(N)) page
> > > delta
> > > > > > > records.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > I've created a few tickets in JIRA with
> very
> > > > > > different
> > > > > > > >> >> levels
> > > > > > > >> >> > of
> > > > > > > >> >> > > > > scale
> > > > > > > >> >> > > > > > > and
> > > > > > > >> >> > > > > > > > complexity.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > Ways to reduce impact of independent
> > > processing:
> > > > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > > > jira/browse/IGNITE-8019
> > > > > > -
> > > > > > > >> >> > > > aforementioned
> > > > > > > >> >> > > > > > > bug,
> > > > > > > >> >> > > > > > > > causing contention on checkpointReadLock;
> > > > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > > > jira/browse/IGNITE-8018
> > > > > > -
> > > > > > > >> >> > > inefficiency
> > > > > > > >> >> > > > > in
> > > > > > > >> >> > > > > > > > GridCacheMapEntry implementation;
> > > > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > > > jira/browse/IGNITE-8017
> > > > > > -
> > > > > > > >> >> > > > automatically
> > > > > > > >> >> > > > > > > > disable
> > > > > > > >> >> > > > > > > > WAL during preloading.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > Ways to solve problem on more global level:
> > > > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > > > jira/browse/IGNITE-7935
> > > > > > -
> > > > > > > a
> > > > > > > >> >> ticket
> > > > > > > >> >> > > to
> > > > > > > >> >> > > > > > > > introduce
> > > > > > > >> >> > > > > > > > batch modification;
> > > > > > > >> >> > > > > > > > - https://issues.apache.org/
> > > > > jira/browse/IGNITE-8020
> > > > > > -
> > > > > > > >> >> complete
> > > > > > > >> >> > > > > > redesign
> > > > > > > >> >> > > > > > > of
> > > > > > > >> >> > > > > > > > rebalancing process for persistent caches,
> > > based
> > > > on
> > > > > > > file
> > > > > > > >> >> > > transfer.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > Everyone is welcome to criticize above
> ideas,
> > > > > suggest
> > > > > > > new
> > > > > > > >> >> ones
> > > > > > > >> >> > or
> > > > > > > >> >> > > > > > > > participate in implementation.
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > > > --
> > > > > > > >> >> > > > > > > > Best regards,
> > > > > > > >> >> > > > > > > > Ilya
> > > > > > > >> >> > > > > > > >
> > > > > > > >> >> > > > > > >
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > > > --
> > > > > > > >> >> > > > > > Best regards,
> > > > > > > >> >> > > > > > Ilya
> > > > > > > >> >> > > > > >
> > > > > > > >> >> > > > >
> > > > > > > >> >> > > >
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> > > --
> > > > > > > >> >> > > Best regards,
> > > > > > > >> >> > > Ilya
> > > > > > > >> >> > >
> > > > > > > >> >> >
> > > > > > > >> >>
> > > > > > > >> >>
> > > > > > > >> >>
> > > > > > > >> >> --
> > > > > > > >> >> Best regards,
> > > > > > > >> >> Ilya
> > > > > > > >> >>
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Ilya
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Ilya
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to