Hi Igor,

Would it make sense to use logged batches to guarantee atomicity in Cassandra in cases where we are doing a cross-cache transactional operation?
Luiz

--
Luiz Felipe Trevisan

On Wed, Jul 27, 2016 at 2:05 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> I am still very confused. Ilya, can you please explain what happens in
> Cassandra if the user calls the IgniteCache.putAll(...) method?
>
> In Ignite, if putAll(...) is called, Ignite will make the best effort to
> execute the update as a batch, in which case the performance is better.
> What is the analogy in Cassandra?
>
> D.
>
> On Tue, Jul 26, 2016 at 9:16 PM, Igor Rudyak <irud...@gmail.com> wrote:
>
> > Dmitriy,
> >
> > The approach is exactly the same for all async read/write/delete
> > operations: the Cassandra session just provides the
> > executeAsync(statement) function for all types of operations.
> >
> > To be more detailed about Cassandra batches, there are actually two
> > types of batches:
> >
> > 1) *Logged batch* (aka atomic) - the main purpose of such batches is to
> > keep duplicated data in sync while updating multiple tables, but at the
> > cost of performance.
> >
> > 2) *Unlogged batch* - the only specific case for such a batch is when
> > all updates are addressed to only *one* partition key and the batch has
> > a "*reasonable size*". In such a situation there *could be* performance
> > benefits if you are using the Cassandra *TokenAware* load balancing
> > policy. In this particular case all the updates will go directly,
> > without any additional coordination, to the primary node, which is
> > responsible for storing data for this partition key.
> >
> > The *generic rule* is that *individual updates using async mode*
> > provide the best performance (
> > https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html).
> > That's because it spreads all updates across the whole cluster. In
> > contrast to this, when you are using batches, what this is actually
> > doing is putting a huge amount of pressure on a single coordinator node.
> > This is because the coordinator needs to forward each individual
> > insert/update/delete to the correct replicas. In general, you're just
> > losing all the benefit of the Cassandra TokenAware load balancing policy
> > when you're updating different partitions in a single round trip to the
> > database.
> >
> > Probably the only enhancement which could be made is to split our batch
> > into smaller batches, each of which updates records having the same
> > partition key. In this case it could provide some performance benefits
> > when used in combination with the Cassandra TokenAware policy. But there
> > are several concerns:
> >
> > 1) It looks like a rather rare case
> > 2) It makes error handling more complex - you just don't know which
> > operations in a batch succeeded and which failed, and need to retry the
> > whole batch
> > 3) Retry logic could produce more load on the cluster - in the case of
> > individual updates you just need to retry only the mutations which
> > failed, while in the case of batches you need to retry the whole batch
> > 4) *Unlogged batch is deprecated in Cassandra 3.0* (
> > https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html),
> > which we are currently using for the Ignite Cassandra module.
> >
> >
> > Igor Rudyak
> >
> > On Tue, Jul 26, 2016 at 4:45 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >
> > > On Tue, Jul 26, 2016 at 5:53 PM, Igor Rudyak <irud...@gmail.com> wrote:
> > >
> > >> Hi Valentin,
> > >>
> > >> For writeAll/readAll, the Cassandra cache store implementation uses
> > >> async operations (
> > >> http://www.datastax.com/dev/blog/java-driver-async-queries) and
> > >> futures, which have the best characteristics in terms of performance.
> > >>
> > >
> > > Thanks, Igor. This link describes the query operations, but I could
> > > not find any mention of writes.
> > >
> > >> The Cassandra BATCH statement is actually quite often an anti-pattern
> > >> for those who come from the relational world.
> > >> The BATCH statement concept in Cassandra is totally different from
> > >> the relational world and is not for optimizing batch/bulk operations.
> > >> The main purpose of a Cassandra BATCH is to keep denormalized data in
> > >> sync, for example when you are duplicating the same data into several
> > >> tables. All other cases are not recommended for Cassandra batches:
> > >> - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
> > >> - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
> > >> - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
> > >>
> > >> It's also good to mention that in the CassandraCacheStore
> > >> implementation (actually in CassandraSessionImpl) every operation with
> > >> Cassandra is wrapped in a loop. The reason is that in case of failure,
> > >> up to 20 attempts are made to retry the operation, with incrementally
> > >> increasing timeouts starting from 100ms and specific exception
> > >> handling logic (Cassandra host unavailability, etc.). Thus it provides
> > >> quite a reliable persistence mechanism. According to load tests, even
> > >> on a heavily overloaded Cassandra cluster (CPU LOAD > 10 per core)
> > >> there were no lost writes/reads/deletes, and at most 6 attempts were
> > >> needed to perform one operation.
> > >>
> > >
> > > I think that the main point about Cassandra batch operations is not
> > > about reliability, but about performance. If a user batches up 100s of
> > > updates in 1 Cassandra batch, then it will be a lot faster than doing
> > > them 1-by-1 in Ignite. Wrapping them into an Ignite "putAll(...)" call
> > > just seems more logical to me, no?
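For reference, the retry behaviour Igor describes above (up to 20 attempts, with timeouts that grow from 100ms) could be sketched roughly as follows. This is only an illustration, not the actual CassandraSessionImpl code: the class and method names are hypothetical, the linear growth of the delay is an assumption (the thread only says "incrementally increasing"), and the Cassandra-specific exception handling is left out.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a retry loop; not taken from the Ignite code base.
final class RetrySketch {
    static final int MAX_ATTEMPTS = 20;     // attempt limit quoted in the thread
    static final long BASE_DELAY_MS = 100;  // starting timeout quoted in the thread

    // Assumption: the delay grows linearly with the number of failed attempts.
    static long delayMs(long baseDelayMs, int failedAttempts) {
        return baseDelayMs * failedAttempts;
    }

    // Runs the operation until it succeeds or MAX_ATTEMPTS is reached.
    static <T> T executeWithRetry(Supplier<T> operation, long baseDelayMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == MAX_ATTEMPTS) {
                    break; // no point in sleeping after the final attempt
                }
                try {
                    Thread.sleep(delayMs(baseDelayMs, attempt));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last; // every attempt failed; rethrow the most recent failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation that fails twice before succeeding.
        String result = executeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated host unavailability");
            }
            return "written";
        }, BASE_DELAY_MS);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With individual async updates, only the mutation that failed goes through this loop again, which is the contrast with batches that Igor draws in his list of concerns.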
> > >
> > >>
> > >> Igor Rudyak
> > >>
> > >> On Tue, Jul 26, 2016 at 1:58 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> > >>
> > >> > Hi Igor,
> > >> >
> > >> > I noticed that the current Cassandra store implementation doesn't
> > >> > support batching for the writeAll and deleteAll methods; it simply
> > >> > executes all updates one by one (asynchronously in parallel).
> > >> >
> > >> > I think it can be useful to provide such support, and I created a
> > >> > ticket [1]. Can you please give your input on this? Does it make
> > >> > sense in your opinion?
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/IGNITE-3588
> > >> >
> > >> > -Val
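
As a footnote to the idea raised in the thread of splitting one large batch into smaller batches that each touch a single partition key, the grouping step could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the class and method names are hypothetical, the partitionOf function stands in for however the partition key would be derived from a cache key, the size cap is a placeholder for the "reasonable size" mentioned above, and nothing here calls the Cassandra driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: groups keys per partition and caps each group's size.
final class PartitionGroupingSketch {

    // Groups keys by the partition they map to, preserving insertion order.
    static <K, P> Map<P, List<K>> groupByPartition(Iterable<K> keys, Function<K, P> partitionOf) {
        Map<P, List<K>> groups = new LinkedHashMap<>();
        for (K key : keys) {
            groups.computeIfAbsent(partitionOf.apply(key), p -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    // Splits one partition's keys into chunks of at most maxSize elements.
    static <K> List<List<K>> chunk(List<K> items, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<K>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxSize) {
            chunks.add(new ArrayList<>(items.subList(i, Math.min(i + maxSize, items.size()))));
        }
        return chunks;
    }
}
```

Each resulting chunk would then be a candidate for one single-partition batch. Igor's concerns still apply to every such batch: a failure does not tell you which mutation failed, so the whole chunk has to be retried.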