Denis,

> - Don't you want to aggregate the tickets under an IEP?
Yes, I think so.

> - Does it mean we're going to update our B+Tree implementation? Any ideas
how risky it is?
One of the tickets that I created (
https://issues.apache.org/jira/browse/IGNITE-7935) involves B+Tree
modification, but I am not planning to do it in the near future. It
shouldn't affect existing tree operations, only introduce new ones (putAll,
invokeAll, removeAll).
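
To give an idea of the shape (just a sketch with hypothetical names, not
the final tree API): the batch variants would take keys pre-sorted in tree
order, so runs of keys that land on the same leaf page can be applied under
a single traversal and page lock instead of one per key:

    import java.util.SortedMap;

    // Hypothetical interface for illustration only - not the actual
    // BPlusTree API in Ignite.
    interface BatchTree<K, V> {
        /** Existing single-key operation, shown for comparison. */
        void invoke(K key, InvokeClosure<V> clo);

        /**
         * Proposed batch operation: keys must be sorted in tree order, so
         * consecutive keys that fall on the same leaf page can be applied
         * under one page write lock and produce fewer page delta records.
         */
        void invokeAll(SortedMap<K, InvokeClosure<V>> batch);

        /** Closure applied to the current value of a key. */
        interface InvokeClosure<T> {
            T apply(T oldVal);
        }
    }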

> - Any chance you had a prototype that shows performance optimizations of
the approach you are suggesting to take?
I have a prototype for the simplest improvements (https://issues.apache.org/
jira/browse/IGNITE-8019 & https://issues.apache.org/jira/browse/IGNITE-8018)
- together they increase throughput by 5-20%, depending on configuration
and environment. Also, I've tested different WAL modes - switching from
LOG_ONLY to NONE gives an over 100% boost, which is what I expect from
https://issues.apache.org/jira/browse/IGNITE-8017.
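
For reference, the effect targeted by IGNITE-8017 can already be reproduced
by hand (rough sketch; it assumes the per-cache WAL switch available in
recent Ignite versions, and the cache name / config path are placeholders):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class WalOffDuringPreload {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start("ignite-config.xml");

            // Stop writing logical WAL records for the cache while it is
            // being preloaded - the data can be re-fetched on failure anyway.
            ignite.cluster().disableWal("myCache");

            // ... wait for rebalancing of the cache to complete ...

            // Re-enabling WAL checkpoints the preloaded data first, so it
            // becomes durable before normal logging resumes.
            ignite.cluster().enableWal("myCache");
        }
    }

IGNITE-8017 would make the node perform the same switch automatically
around preloading, so users don't have to script it themselves.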

On Thu, Mar 22, 2018 at 9:48 PM, Denis Magda <dma...@apache.org> wrote:

> Ilya,
>
> That's outstanding research and summary. Thanks for spending your time on
> this.
>
> Not sure I have enough expertise to challenge your approach, but it sounds
> 100% reasonable to me. As side notes:
>
>    - Don't you want to aggregate the tickets under an IEP?
>    - Does it mean we're going to update our B+Tree implementation? Any
>    ideas how risky it is?
>    - Any chance you had a prototype that shows performance optimizations of
>    the approach you are suggesting to take?
>
> --
> Denis
>
> On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh <ilant...@gridgain.com>
> wrote:
>
> > Igniters,
> >
> > I've spent some time analyzing the performance of the rebalancing
> > process. The initial goal was to understand what limits its throughput,
> > because it is significantly slower than the network and storage devices
> > can theoretically handle.
> >
> > Turns out, our current implementation has a number of issues caused by a
> > single fundamental problem.
> >
> > During rebalance, data is sent in batches called
> > GridDhtPartitionSupplyMessages. The batch size is configurable; the
> > default value is 512KB, which can mean thousands of key-value pairs.
> > However, we don't take any advantage of this fact and process each entry
> > independently:
> > - checkpointReadLock is acquired multiple times for every entry, leading
> > to unnecessary contention - this is clearly a bug;
> > - for each entry we write (and fsync, if the configuration requires it) a
> > separate WAL record - so, if a batch contains N entries, we might end up
> > doing N fsyncs;
> > - adding every entry into CacheDataStore also happens completely
> > independently. It means we will traverse and modify each index tree N
> > times, allocate space in the FreeList N times, and additionally store
> > O(N*log(N)) page delta records in the WAL.
> >
> > I've created a few tickets in JIRA with very different levels of scale
> > and complexity.
> >
> > Ways to reduce the impact of independent processing:
> > - https://issues.apache.org/jira/browse/IGNITE-8019 - the aforementioned
> > bug causing contention on checkpointReadLock;
> > - https://issues.apache.org/jira/browse/IGNITE-8018 - an inefficiency in
> > the GridCacheMapEntry implementation;
> > - https://issues.apache.org/jira/browse/IGNITE-8017 - automatically
> > disable WAL during preloading.
> >
> > Ways to solve the problem on a more global level:
> > - https://issues.apache.org/jira/browse/IGNITE-7935 - a ticket to
> > introduce batch modification;
> > - https://issues.apache.org/jira/browse/IGNITE-8020 - a complete redesign
> > of the rebalancing process for persistent caches, based on file transfer.
> >
> > Everyone is welcome to criticize the above ideas, suggest new ones, or
> > participate in the implementation.
> >
> > --
> > Best regards,
> > Ilya
> >
>



-- 
Best regards,
Ilya
