Hi Igor,

1) Yes, I'm talking about splitting the entry set into per-partition (or per-node) batches. Having entries that are stored on different nodes in the same batch doesn't make much sense, of course.
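For illustration, here is a rough sketch of that grouping step on the Ignite side (the cache name and toy data are placeholders; this is not the store implementation, just the idea of grouping keys by owning node):

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.affinity.Affinity;
    import org.apache.ignite.cluster.ClusterNode;

    public class PerNodeBatchSketch {
        public static void main(String[] args) {
            // Placeholder cache name and toy data, just for illustration.
            Ignite ignite = Ignition.start();
            ignite.getOrCreateCache("myCache");

            Map<Long, String> entries = new HashMap<>();
            for (long i = 0; i < 100; i++)
                entries.put(i, "value-" + i);

            // Group the keys of the batch by the node that owns them.
            Affinity<Long> aff = ignite.affinity("myCache");
            Map<ClusterNode, Collection<Long>> keysByNode = aff.mapKeysToNodes(entries.keySet());

            for (Map.Entry<ClusterNode, Collection<Long>> group : keysByNode.entrySet()) {
                Map<Long, String> perNodeBatch = new HashMap<>();

                for (Long key : group.getValue())
                    perNodeBatch.put(key, entries.get(key));

                // Here the store could write perNodeBatch to Cassandra in a single
                // round trip instead of entry by entry.
                System.out.println(group.getKey().id() + " -> " + perNodeBatch.size() + " entries");
            }
        }
    }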
2) RAMP looks interesting, but it seems to be a pretty complicated task. How about adding support for built-in logged batches (this should be fairly easy to implement) and then improving atomicity as a second phase?
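To make that option concrete, here is a minimal sketch of a logged batch with the DataStax Java driver (the contact point, keyspace and table are placeholders; this is not the module's code):

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class LoggedBatchSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Hypothetical keyspace/table, just for illustration.
            PreparedStatement ins =
                session.prepare("INSERT INTO my_ks.my_table (key, value) VALUES (?, ?)");

            // LOGGED batch: Cassandra guarantees the batch will eventually be applied as
            // a whole, but readers may still observe it partially applied (no isolation).
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);

            for (long i = 0; i < 10; i++)
                batch.add(ins.bind(i, "value-" + i));

            session.execute(batch);

            cluster.close();
        }
    }

-Val

On Fri, Jul 29, 2016 at 5:19 PM, Igor Rudyak <irud...@gmail.com> wrote:

> Hi Valentin,
>
> 1) Regarding unlogged batches, I don't think it makes sense to support them, because:
> - They are deprecated starting from Cassandra 3.0 (which we are currently using in the Cassandra module).
> - According to the Cassandra documentation (http://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html), "Batches are often mistakenly used in an attempt to optimize performance". The Cassandra folks say that avoiding batches (https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.rxkmfe209) is the fastest way to load data. I checked this with batches containing records with different partition keys, and it's definitely true. For a small batch of records that all have the same partition key (affinity in Ignite) batches could provide better performance, but I didn't investigate this case deeply (what the optimal batch size is, how significant the performance benefit is, etc.). I can try to do some load tests to get a better understanding of this.
>
> 2) Regarding logged batches, I think it makes sense to support them in the Cassandra module for transactional caches. The bad thing is that they don't provide isolation; the good thing is that they guarantee that all your changes will eventually be committed and visible to clients. Thus it's still better than nothing... However, there is a better approach for this. We can implement a transactional protocol on top of Cassandra, which will give us atomic read isolation - you'll either see all the changes made by a transaction or none of them. For example, we can implement RAMP transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf), because they provide rather low overhead.
>
> Igor Rudyak
>
> On Thu, Jul 28, 2016 at 11:00 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>
>> Hi Igor,
>>
>> I'm not a big Cassandra expert, but here are my thoughts.
>>
>> 1. Sending updates in a batch is always better than sending them one by one. For example, if you do putAll in Ignite with 100 entries, and these entries are split across 5 nodes, the client will send 5 requests instead of 100. This provides a significant performance improvement. Is there a way to use a similar approach in Cassandra?
>> 2. As for logged batches, I can easily believe that this is a rarely used feature, but since it exists in Cassandra, I can't find a single reason not to support it in our store as an option. Users that come across those rare cases will only say thank you to us :)
>>
>> What do you think?
>>
>> -Val
>>
>> On Thu, Jul 28, 2016 at 10:41 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>
>>> There are actually some cases when atomic read isolation in Cassandra could be important. Let's assume a batch was persisted in Cassandra but not finalized yet - a read operation from Cassandra returns only partially committed data of the batch. In such a situation we have problems when:
>>>
>>> 1) Some of the batch records have already expired from the Ignite cache and we read them from the persistent store (Cassandra in our case).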
>>> 2) All Ignite nodes storing the batch records (or a subset of them) died (or, for example, became unavailable for 10 seconds because of a network problem). While reading such records from the Ignite cache we will be redirected to the persistent store.
>>>
>>> 3) A network separation occurred in such a way that we now have two Ignite clusters, but all the replicas of the batch data are located in only one of them. Again, while reading such records from the Ignite cache on the second cluster we will be redirected to the persistent store.
>>>
>>> In all the mentioned cases, if the Cassandra batch isn't finalized yet, we will read partially committed transaction data.
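Cases 1 and 2 above boil down to read-through: as soon as an entry is missing from the cache, Ignite loads it from the store and can surface a half-applied batch. A minimal configuration sketch of that situation (cache name, types and the 60-second expiry are made up; the data source and persistence settings of the Cassandra store factory are omitted):

    import java.util.concurrent.TimeUnit;

    import javax.cache.expiry.CreatedExpiryPolicy;
    import javax.cache.expiry.Duration;

    import org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class ReadThroughSketch {
        public static CacheConfiguration<Long, String> cacheConfig() {
            CacheConfiguration<Long, String> ccfg = new CacheConfiguration<>("myCache");

            // Any key that is no longer in the cache (expired, lost with a node, or on
            // the wrong side of a network split) is loaded from Cassandra on read.
            ccfg.setReadThrough(true);
            ccfg.setWriteThrough(true);

            // Entries expire after 60 seconds (an arbitrary value), reproducing case 1 above.
            ccfg.setExpiryPolicyFactory(
                CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 60)));

            // The Cassandra store; its data source and persistence settings still have
            // to be configured as described in the module documentation.
            ccfg.setCacheStoreFactory(new CassandraCacheStoreFactory<Long, String>());

            return ccfg;
        }
    }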
>>> On Thu, Jul 28, 2016 at 6:52 AM, Luiz Felipe Trevisan <luizfelipe.trevi...@gmail.com> wrote:
>>>
>>> > I totally agree with you regarding the guarantees we have with logged batches, and I'm also pretty much aware of the performance penalty involved in using this solution.
>>> >
>>> > But since all read operations are executed via Ignite, isolation at the Cassandra level is not really important. I think the only guarantee really needed is that we don't end up with a partial insert in Cassandra in case we have a failure in Ignite and we lose the node that was responsible for this write operation.
>>> >
>>> > My other assumption is that the write operation needs to finish before an eviction happens for this entry and we lose the data in the cache (since a batch doesn't guarantee isolation). However, if we cannot achieve this, I don't see why we'd use Ignite as a cache store.
>>> >
>>> > Luiz
>>> >
>>> > --
>>> > Luiz Felipe Trevisan
>>> >
>>> > On Wed, Jul 27, 2016 at 4:55 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>> >
>>> >> Hi Luiz,
>>> >>
>>> >> Logged batches are not the solution for achieving an atomic view of your Ignite transaction changes in Cassandra.
>>> >>
>>> >> The problem with logged batches (aka atomic) is that they only guarantee that if any part of the batch succeeds, all of it will; no other transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are able to read the first updated rows from the batch while other rows are still being updated on the server (in RDBMS terminology this means the *READ-UNCOMMITTED* isolation level). Thus Cassandra batches are "atomic" only in the database sense that if any part of the batch succeeds, all of it will.
>>> >>
>>> >> Probably the best way to achieve read atomic isolation for an Ignite transaction persisting data into Cassandra is to implement RAMP transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf) on top of Cassandra.
>>> >>
>>> >> I may create a ticket for this if the community would like it.
>>> >>
>>> >> Igor Rudyak
>>> >>
>>> >> On Wed, Jul 27, 2016 at 12:55 PM, Luiz Felipe Trevisan <luizfelipe.trevi...@gmail.com> wrote:
>>> >>
>>> >>> Hi Igor,
>>> >>>
>>> >>> Does it make sense to you to use logged batches to guarantee atomicity in Cassandra in cases where we are doing a cross-cache transaction operation?
>>> >>>
>>> >>> Luiz
>>> >>>
>>> >>> --
>>> >>> Luiz Felipe Trevisan
>>> >>>
>>> >>> On Wed, Jul 27, 2016 at 2:05 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>> >>>
>>> >>>> I am still very confused. Ilya, can you please explain what happens in Cassandra if a user calls the IgniteCache.putAll(...) method?
>>> >>>>
>>> >>>> In Ignite, if putAll(...) is called, Ignite will make its best effort to execute the update as a batch, in which case the performance is better. What is the analogy in Cassandra?
>>> >>>>
>>> >>>> D.
>>> >>>>
>>> >>>> On Tue, Jul 26, 2016 at 9:16 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>> >>>>
>>> >>>> > Dmitriy,
>>> >>>> >
>>> >>>> > There is exactly the same approach for all async read/write/delete operations - the Cassandra session just provides an executeAsync(statement) function for all types of operations.
>>> >>>> >
>>> >>>> > To be more detailed about Cassandra batches, there are actually two types of batches:
>>> >>>> >
>>> >>>> > 1) *Logged batch* (aka atomic) - the main purpose of such batches is to keep duplicated data in sync while updating multiple tables, but at the cost of performance.
>>> >>>> >
>>> >>>> > 2) *Unlogged batch* - the only specific case for such a batch is when all updates are addressed to only *one* partition key and the batch has a "*reasonable size*". In such a situation there *could be* performance benefits if you are using the Cassandra *TokenAware* load balancing policy. In this particular case all the updates go directly, without any additional coordination, to the primary node which is responsible for storing data for this partition key.
>>> >>>> >
>>> >>>> > The *generic rule* is that *individual updates in async mode* provide the best performance (https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html). That's because they spread all updates across the whole cluster. In contrast, when you are using batches, what this actually does is put a huge amount of pressure on a single coordinator node, because the coordinator needs to forward each individual insert/update/delete to the correct replicas. In general you're just losing all the benefit of the Cassandra TokenAware load balancing policy when you're updating different partitions in a single round trip to the database.
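As a rough sketch of what such individual async updates look like with the DataStax Java driver (contact point, keyspace and table are placeholders; real code would also handle per-statement failures and retries):

    import java.util.ArrayList;
    import java.util.List;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    public class AsyncWritesSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Placeholder keyspace/table, just for illustration.
            PreparedStatement ins =
                session.prepare("INSERT INTO my_ks.my_table (key, value) VALUES (?, ?)");

            // Fire each mutation individually; with a token-aware load balancing policy
            // every write goes straight to a replica, spreading load across the cluster.
            List<ResultSetFuture> futures = new ArrayList<>();

            for (long i = 0; i < 100; i++)
                futures.add(session.executeAsync(ins.bind(i, "value-" + i)));

            // Wait for all writes to complete.
            for (ResultSetFuture f : futures)
                f.getUninterruptibly();

            cluster.close();
        }
    }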
>>> >>>> > Probably the only enhancement which could be done is to split our batch into smaller batches, each of which updates records having the same partition key. In this case it could provide some performance benefits when used in combination with the Cassandra TokenAware policy. But there are several concerns:
>>> >>>> >
>>> >>>> > 1) It looks like a rather rare case.
>>> >>>> > 2) It makes error handling more complex - you just don't know which operations in a batch succeeded and which failed, and you need to retry the whole batch.
>>> >>>> > 3) Retry logic could produce more load on the cluster - with individual updates you only need to retry the mutations that failed, while with batches you need to retry the whole batch.
>>> >>>> > 4) *Unlogged batches are deprecated in Cassandra 3.0* (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html), which is the version we are currently using for the Ignite Cassandra module.
>>> >>>> >
>>> >>>> > Igor Rudyak
>>> >>>> >
>>> >>>> > On Tue, Jul 26, 2016 at 4:45 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>> >>>> >
>>> >>>> > > On Tue, Jul 26, 2016 at 5:53 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>> >>>> > >
>>> >>>> > >> Hi Valentin,
>>> >>>> > >>
>>> >>>> > >> For writeAll/readAll the Cassandra cache store implementation uses async operations (http://www.datastax.com/dev/blog/java-driver-async-queries) and futures, which have the best characteristics in terms of performance.
>>> >>>> > >>
>>> >>>> > > Thanks, Igor. This link describes the query operations, but I could not find any mention of writes.
>>> >>>> > >
>>> >>>> > >> The Cassandra BATCH statement is actually quite often an anti-pattern for those who come from the relational world. The BATCH statement concept in Cassandra is totally different from the relational one and is not meant for optimizing batch/bulk operations. The main purpose of a Cassandra BATCH is to keep denormalized data in sync, for example when you duplicate the same data into several tables. All other cases are not recommended for Cassandra batches:
>>> >>>> > >> - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
>>> >>>> > >> - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
>>> >>>> > >> - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>>> >>>> > >>
>>> >>>> > >> It's also good to mention that in the CassandraCacheStore implementation (actually in CassandraSessionImpl) every operation with Cassandra is wrapped in a loop. The reason is that in case of a failure, up to 20 attempts are made to retry the operation, with incrementally increasing timeouts starting from 100ms and with specific exception handling logic (Cassandra host unavailability, etc.). Thus it provides a quite reliable persistence mechanism. According to load tests, even on a heavily overloaded Cassandra cluster (CPU load > 10 per core) there were no lost writes/reads/deletes, and at most 6 attempts were needed to perform one operation.
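A simplified sketch of that kind of retry loop (this is not the actual CassandraSessionImpl code, which also handles specific driver exceptions and host availability):

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;

    public class RetrySketch {
        // Simplified illustration: retries a statement up to 20 times with an
        // incrementally increasing pause between attempts.
        static ResultSet executeWithRetry(Session session, Statement statement) {
            final int maxAttempts = 20;
            long sleepMs = 100;

            for (int attempt = 1; ; attempt++) {
                try {
                    return session.execute(statement);
                }
                catch (RuntimeException e) {
                    if (attempt >= maxAttempts)
                        throw e;

                    try {
                        Thread.sleep(sleepMs);
                    }
                    catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }

                    sleepMs += 100; // Wait a bit longer before the next attempt.
                }
            }
        }
    }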
>>> >>>> > > I think that the main point about Cassandra batch operations is not reliability, but performance. If a user batches up 100s of updates in one Cassandra batch, it will be a lot faster than doing them one by one in Ignite. Wrapping them into an Ignite "putAll(...)" call just seems more logical to me, no?
>>> >>>> > >
>>> >>>> > >> Igor Rudyak
>>> >>>> > >>
>>> >>>> > >> On Tue, Jul 26, 2016 at 1:58 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>> >>>> > >>
>>> >>>> > >> > Hi Igor,
>>> >>>> > >> >
>>> >>>> > >> > I noticed that the current Cassandra store implementation doesn't support batching for the writeAll and deleteAll methods; it simply executes all updates one by one (asynchronously in parallel).
>>> >>>> > >> >
>>> >>>> > >> > I think it can be useful to provide such support, so I created a ticket [1]. Can you please give your input on this? Does it make sense in your opinion?
>>> >>>> > >> >
>>> >>>> > >> > [1] https://issues.apache.org/jira/browse/IGNITE-3588
>>> >>>> > >> >
>>> >>>> > >> > -Val