Re: Batch support in Cassandra store

Luiz Felipe Trevisan Wed, 27 Jul 2016 12:56:29 -0700

Hi Igor,

Does it make sense to you to use logged batches to guarantee atomicity in
Cassandra in cases where we are doing a cross-cache transactional operation?


Luiz

--
Luiz Felipe Trevisan

On Wed, Jul 27, 2016 at 2:05 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> I am very confused still. Ilya, can you please explain what happens in
> Cassandra if user calls IgniteCache.putAll(...) method?
>
> In Ignite, if putAll(...) is called, Ignite will make the best effort to
> execute the update as a batch, in which case the performance is better.
> What is the analogy in Cassandra?
>
> D.
>
> On Tue, Jul 26, 2016 at 9:16 PM, Igor Rudyak <irud...@gmail.com> wrote:
>
> > Dmitriy,
> >
> > The approach is exactly the same for all async read/write/delete
> > operations - the Cassandra session just provides an
> > executeAsync(statement) function for all types of operations.
> >
> > To be more detailed about Cassandra batches, there are actually two types
> > of batches:
> >
> > 1) *Logged batch* (aka atomic) - the main purpose of such batches is to
> > keep duplicated data in sync while updating multiple tables, but at the
> > cost of performance.
> >
> > 2) *Unlogged batch* - the only specific case for such a batch is when all
> > updates are addressed to only *one* partition key and the batch has a
> > "*reasonable size*". In such a situation there *could be* performance
> > benefits if you are using the Cassandra *TokenAware* load balancing
> > policy. In this particular case all the updates will go directly, without
> > any additional coordination, to the primary node, which is responsible
> > for storing data for this partition key.
> >
> > The *generic rule* is that *individual updates using async mode* provide
> > the best performance (
> > https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html). That's
> > because it spreads all updates across the whole cluster. In contrast,
> > when you use batches, you are actually putting a huge amount of pressure
> > on a single coordinator node. This is because the coordinator needs to
> > forward each individual insert/update/delete to the correct replicas. In
> > general you're just losing all the benefit of the Cassandra TokenAware
> > load balancing policy when you're updating different partitions in a
> > single round trip to the database.
> >
> > Probably the only enhancement which could be done is to split our batch
> > into smaller batches, each of which updates records having the same
> > partition key. In this case it could provide some performance benefits
> > when used in combination with the Cassandra TokenAware policy. But there
> > are several concerns:
> >
> > 1) It looks like a rather rare case
> > 2) It makes error handling more complex - you just don't know which
> > operations in a batch succeeded and which failed, and need to retry the
> > whole batch
> > 3) Retry logic could produce more load on the cluster - in case of
> > individual updates you just need to retry only the mutations which
> > failed, in case of batches you need to retry the whole batch
> > 4) *Unlogged batch is deprecated in Cassandra 3.0* (
> > https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html),
> > which we are currently using for the Ignite Cassandra module.
> >
> >
> > Igor Rudyak
> >
> >
> >
> > On Tue, Jul 26, 2016 at 4:45 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > wrote:
> >
> > >
> > >
> > > On Tue, Jul 26, 2016 at 5:53 PM, Igor Rudyak <irud...@gmail.com> wrote:
> > >
> > >> Hi Valentin,
> > >>
> > >> For writeAll/readAll the Cassandra cache store implementation uses async
> > >> operations (http://www.datastax.com/dev/blog/java-driver-async-queries)
> > >> and futures, which have the best characteristics in terms of performance.
> > >>
> > >>
> > > Thanks, Igor. This link describes the query operations, but I could not
> > > find the mention of writes.
> > >
> > >
> > >> The Cassandra BATCH statement is actually quite often an anti-pattern for
> > >> those who come from the relational world. The BATCH statement concept in
> > >> Cassandra is totally different from the relational world and is not for
> > >> optimizing batch/bulk operations. The main purpose of Cassandra BATCH is
> > >> to keep denormalized data in sync, for example when you are duplicating
> > >> the same data into several tables. All other cases are not recommended
> > >> for Cassandra batches:
> > >>  - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
> > >>  - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
> > >>  - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
> > >>
> > >> It's also good to mention that in the CassandraCacheStore implementation
> > >> (actually in CassandraSessionImpl) all operations with Cassandra are
> > >> wrapped in a loop. The reason is that in case of failure, up to 20
> > >> attempts will be made to retry the operation, with incrementally
> > >> increasing timeouts starting from 100ms and specific exception handling
> > >> logic (Cassandra host unavailability, etc.). Thus it provides quite a
> > >> reliable persistence mechanism. According to load tests, even on a
> > >> heavily overloaded Cassandra cluster (CPU LOAD > 10 per core) there were
> > >> no lost writes/reads/deletes and a maximum of 6 attempts to perform one
> > >> operation.
> > >>
> > >
> > > I think that the main point about Cassandra batch operations is not about
> > > reliability, but about performance. If a user batches up 100s of updates
> > > in 1 Cassandra batch, then it will be a lot faster than doing them 1-by-1
> > > in Ignite. Wrapping them into an Ignite "putAll(...)" call just seems more
> > > logical to me, no?
> > >
> > >
> > >>
> > >> Igor Rudyak
> > >>
> > >> On Tue, Jul 26, 2016 at 1:58 PM, Valentin Kulichenko <
> > >> valentin.kuliche...@gmail.com> wrote:
> > >>
> > >> > Hi Igor,
> > >> >
> > >> > I noticed that the current Cassandra store implementation doesn't
> > >> > support batching for writeAll and deleteAll methods; it simply executes
> > >> > all updates one by one (asynchronously in parallel).
> > >> >
> > >> > I think it can be useful to provide such support and created a ticket
> > >> > [1]. Can you please give your input on this? Does it make sense in your
> > >> > opinion?
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/IGNITE-3588
> > >> >
> > >> > -Val
> > >> >
> > >>
> > >
> > >
> >
>
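
For reference, the enhancement discussed in the quoted thread (splitting one
large batch into smaller groups whose records all share a partition key, each
of a "reasonable size") might look roughly like the sketch below. This is only
an illustration under assumptions: PartitionGrouping, splitByPartition, the
partitionKeyOf function and the maxGroupSize cap are hypothetical names, not
part of the Ignite Cassandra module or the Cassandra driver, and the sketch
only groups the items; it does not talk to Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper; not part of the Ignite Cassandra module. */
final class PartitionGrouping {
    private PartitionGrouping() {}

    /**
     * Splits items into groups so that every item in a group has the same
     * partition key and no group holds more than maxGroupSize items.
     * Groups are returned in the order in which they were opened.
     */
    static <T, P> List<List<T>> splitByPartition(
            Collection<T> items,
            Function<? super T, ? extends P> partitionKeyOf, // assumption: caller knows how to derive the key
            int maxGroupSize) {
        if (maxGroupSize < 1) {
            throw new IllegalArgumentException("maxGroupSize must be >= 1");
        }
        // The group currently being filled for each partition key.
        Map<P, List<T>> open = new HashMap<>();
        List<List<T>> result = new ArrayList<>();
        for (T item : items) {
            P key = partitionKeyOf.apply(item);
            List<T> group = open.get(key);
            if (group == null || group.size() >= maxGroupSize) {
                // Start a new group when none exists yet or the current one is full.
                group = new ArrayList<>();
                open.put(key, group);
                result.add(group);
            }
            group.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy keys of the form "<partition>:<row>"; the prefix plays the partition key.
        List<String> keys = Arrays.asList("a:1", "b:2", "a:3", "a:4");
        List<List<String>> groups =
            splitByPartition(keys, k -> k.substring(0, k.indexOf(':')), 2);
        System.out.println(groups); // prints [[a:1, a:3], [b:2], [a:4]]
    }
}
```

Each resulting group would be a candidate for a single-partition batch, while
groups of size one could still go through the existing individual async path.
The concerns listed in the thread (error handling and whole-batch retries)
apply to every group produced this way.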
