Dmitry,
It is impossible to disable WAL only for certain partitions without
completely overhauling the design of the Ignite storage mechanism. Right now
we can only afford to change the WAL mode per cache group.

The idea is to disable WAL when a node doesn't have any partition in the
OWNING state, which means it doesn't have any consistent data and won't be
able to restore from WAL anyway. I don't see any potential use for WAL on
such a node, but we can keep a configurable parameter indicating whether WAL
may be disabled automatically in that case or not.
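
For reference, the manual analogue already exists in the public API - WAL can
be switched per cache (and thus per cache group) from IgniteCluster; the
ticket is about doing the equivalent automatically. A rough sketch (the cache
name and config path below are just placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class WalSwitchExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("ignite-config.xml")) {
            // Turn WAL off for the cache's group: no crash recovery for it
            // until WAL is enabled again, so this is only acceptable while
            // the data can be re-fetched (e.g. during preloading).
            ignite.cluster().disableWal("myCache");

            // ... preload data into "myCache" ...

            // Re-enable WAL once the node's data is consistent again.
            ignite.cluster().enableWal("myCache");
        }
    }
}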

On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <dpavlov....@gmail.com>
wrote:

> Denis, as I understood, there is an idea to exclude only the rebalanced
> partition(s) data. All other data will go to the WAL.
>
> Ilya, please correct me if I'm wrong.
>
> Fri, Mar 23, 2018 at 22:15, Denis Magda <dma...@apache.org>:
>
> > Ilya,
> >
> > That's a decent boost (5-20%) even with WAL enabled. Not sure that we
> > should stake on the WAL "off" mode here, because if the whole cluster goes
> > down, the data consistency is questionable. As an architect, I
> > wouldn't disable WAL for the sake of rebalancing; it's too risky.
> >
> > If you agree, then let's create the IEP. This way it will be easier to
> > track this endeavor. BTW, are you ready to release any of the
> > optimizations in 2.5, which is being discussed in a separate thread?
> >
> > --
> > Denis
> >
> >
> >
> > On Fri, Mar 23, 2018 at 6:37 AM, Ilya Lantukh <ilant...@gridgain.com>
> > wrote:
> >
> > > Denis,
> > >
> > > > - Don't you want to aggregate the tickets under an IEP?
> > > Yes, I think so.
> > >
> > > > - Does it mean we're going to update our B+Tree implementation? Any
> > > > ideas how risky it is?
> > > One of the tickets that I created
> > > (https://issues.apache.org/jira/browse/IGNITE-7935) involves B+Tree
> > > modification, but I am not planning to do it in the near future. It
> > > shouldn't affect existing tree operations, only introduce new ones
> > > (putAll, invokeAll, removeAll).
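> > >
> > > Roughly, the batch operations I have in mind would look like this (a
> > > sketch only - the names below are illustrative, not the actual tree
> > > internals):
> > >
> > > // Hypothetical batch counterparts to the existing per-row operations:
> > > // rows are pre-sorted, so each affected leaf page is visited once per
> > > // batch instead of once per row.
> > > interface BatchTreeOps<R> {
> > >     void putAll(java.util.SortedSet<R> rows);
> > >     void invokeAll(java.util.SortedSet<R> rows,
> > >         java.util.function.UnaryOperator<R> clo);
> > >     void removeAll(java.util.SortedSet<R> rows);
> > > }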
> > >
> > > > - Any chance you had a prototype that shows performance optimizations
> > > > of the approach you are suggesting to take?
> > > I have a prototype for the simplest improvements
> > > (https://issues.apache.org/jira/browse/IGNITE-8019 &
> > > https://issues.apache.org/jira/browse/IGNITE-8018) - together they
> > > increase throughput by 5-20%, depending on configuration and
> > > environment. Also, I've tested different WAL modes - switching from
> > > LOG_ONLY to NONE gives a more than 100% boost - this is what I expect
> > > from https://issues.apache.org/jira/browse/IGNITE-8017.
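> > >
> > > For context, the WAL mode I compared is the node-wide setting in
> > > DataStorageConfiguration. A minimal sketch (assumes a persistence-enabled
> > > default data region; NONE gives no recovery guarantees at all, which is
> > > why IGNITE-8017 only targets nodes that have nothing to recover):
> > >
> > > import org.apache.ignite.configuration.DataRegionConfiguration;
> > > import org.apache.ignite.configuration.DataStorageConfiguration;
> > > import org.apache.ignite.configuration.IgniteConfiguration;
> > > import org.apache.ignite.configuration.WALMode;
> > >
> > > public class WalModeConfig {
> > >     public static IgniteConfiguration config(WALMode mode) {
> > >         return new IgniteConfiguration()
> > >             .setDataStorageConfiguration(new DataStorageConfiguration()
> > >                 .setWalMode(mode) // LOG_ONLY vs. NONE in the comparison
> > >                 .setDefaultDataRegionConfiguration(
> > >                     new DataRegionConfiguration()
> > >                         .setPersistenceEnabled(true)));
> > >     }
> > > }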
> > >
> > > On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <dma...@apache.org>
> > > wrote:
> > >
> > > > Ilya,
> > > >
> > > > That's outstanding research and summary. Thanks for spending your
> > > > time on this.
> > > >
> > > > Not sure I have enough expertise to challenge your approach, but it
> > > > sounds 100% reasonable to me. As side notes:
> > > >
> > > >    - Don't you want to aggregate the tickets under an IEP?
> > > >    - Does it mean we're going to update our B+Tree implementation?
> > > >    Any ideas how risky it is?
> > > >    - Any chance you had a prototype that shows performance
> > > >    optimizations of the approach you are suggesting to take?
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh <ilant...@gridgain.com>
> > > > wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I've spent some time analyzing the performance of the rebalancing
> > > > > process. The initial goal was to understand what limits its
> > > > > throughput, because it is significantly slower than the network and
> > > > > storage device can theoretically handle.
> > > > >
> > > > > It turns out that our current implementation has a number of issues
> > > > > caused by a single fundamental problem.
> > > > > During rebalance, data is sent in batches called
> > > > > GridDhtPartitionSupplyMessages. The batch size is configurable; the
> > > > > default value is 512KB, which could mean thousands of key-value
> > > > > pairs. However, we don't take any advantage of this fact and process
> > > > > each entry independently:
> > > > > - checkpointReadLock is acquired multiple times for every entry,
> > > > > leading to unnecessary contention - this is clearly a bug;
> > > > > - for each entry we write (and fsync, if the configuration requires
> > > > > it) a separate WAL record - so, if a batch contains N entries, we
> > > > > might end up doing N fsyncs;
> > > > > - adding every entry into CacheDataStore also happens completely
> > > > > independently. It means we will traverse and modify each index tree
> > > > > N times, we will allocate space in the FreeList N times, and we will
> > > > > have to additionally store O(N*log(N)) page delta records in WAL.
> > > > >
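> > > > > To make the contrast concrete, here is a rough sketch of the
> > > > > per-batch shape I'm aiming for (illustrative pseudo-API only - the
> > > > > types below stand in for the real database, WAL and CacheDataStore
> > > > > components):
> > > > >
> > > > > interface Db { void checkpointReadLock(); void checkpointReadUnlock(); }
> > > > > interface Wal { void log(Object record); }
> > > > > interface Store<E> { void invokeAll(java.util.List<E> batch); }
> > > > >
> > > > > class BatchedPreloader {
> > > > >     static <E> void applySupplyBatch(Db db, Wal wal, Store<E> store,
> > > > >         java.util.List<E> batch) {
> > > > >         db.checkpointReadLock();    // once per batch, not per entry (IGNITE-8019)
> > > > >         try {
> > > > >             wal.log(batch);         // one logical WAL record per batch
> > > > >             store.invokeAll(batch); // one index tree / FreeList pass (IGNITE-7935)
> > > > >         }
> > > > >         finally {
> > > > >             db.checkpointReadUnlock();
> > > > >         }
> > > > >     }
> > > > > }
> > > > >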
> > > > > I've created a few tickets in JIRA with very different levels of
> > > > > scale and complexity.
> > > > >
> > > > > Ways to reduce the impact of independent processing:
> > > > > - https://issues.apache.org/jira/browse/IGNITE-8019 - the
> > > > > aforementioned bug causing contention on checkpointReadLock;
> > > > > - https://issues.apache.org/jira/browse/IGNITE-8018 - an
> > > > > inefficiency in the GridCacheMapEntry implementation;
> > > > > - https://issues.apache.org/jira/browse/IGNITE-8017 - automatically
> > > > > disable WAL during preloading.
> > > > >
> > > > > Ways to solve the problem on a more global level:
> > > > > - https://issues.apache.org/jira/browse/IGNITE-7935 - a ticket to
> > > > > introduce batch modification;
> > > > > - https://issues.apache.org/jira/browse/IGNITE-8020 - a complete
> > > > > redesign of the rebalancing process for persistent caches, based on
> > > > > file transfer.
> > > > >
> > > > > Everyone is welcome to criticize the above ideas, suggest new ones,
> > > > > or participate in the implementation.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ilya
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ilya
> > >
> >
>



-- 
Best regards,
Ilya
