Hi Igor,

1) Yes, I'm talking about splitting the entry set into per-partition (or per-node) batches. Having entries that are stored on different nodes in the same batch doesn't make much sense, of course.
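For illustration, here is a rough sketch of that grouping step on the Ignite side (the cache name and toy data are placeholders; this is not the store implementation, just the idea of grouping keys by owning node):

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.affinity.Affinity;
    import org.apache.ignite.cluster.ClusterNode;

    public class PerNodeBatchSketch {
        public static void main(String[] args) {
            // Placeholder cache name and toy data, just for illustration.
            Ignite ignite = Ignition.start();
            ignite.getOrCreateCache("myCache");

            Map<Long, String> entries = new HashMap<>();
            for (long i = 0; i < 100; i++)
                entries.put(i, "value-" + i);

            // Group the keys of the batch by the node that owns them.
            Affinity<Long> aff = ignite.affinity("myCache");
            Map<ClusterNode, Collection<Long>> keysByNode = aff.mapKeysToNodes(entries.keySet());

            for (Map.Entry<ClusterNode, Collection<Long>> group : keysByNode.entrySet()) {
                Map<Long, String> perNodeBatch = new HashMap<>();

                for (Long key : group.getValue())
                    perNodeBatch.put(key, entries.get(key));

                // Here the store could write perNodeBatch to Cassandra in a single
                // round trip instead of entry by entry.
                System.out.println(group.getKey().id() + " -> " + perNodeBatch.size() + " entries");
            }
        }
    }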
2) RAMP looks interesting, but it seems to be a pretty complicated task. How about adding support for built-in logged batches (this should be fairly easy to implement) and then improving atomicity as a second phase?
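To make that option concrete, here is a minimal sketch of a logged batch with the DataStax Java driver (the contact point, keyspace and table are placeholders; this is not the module's code):

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class LoggedBatchSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Hypothetical keyspace/table, just for illustration.
            PreparedStatement ins =
                session.prepare("INSERT INTO my_ks.my_table (key, value) VALUES (?, ?)");

            // LOGGED batch: Cassandra guarantees the batch will eventually be applied as
            // a whole, but readers may still observe it partially applied (no isolation).
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);

            for (long i = 0; i < 10; i++)
                batch.add(ins.bind(i, "value-" + i));

            session.execute(batch);

            cluster.close();
        }
    }

-Val

On Fri, Jul 29, 2016 at 5:19 PM, Igor Rudyak <irud...@gmail.com> wrote:

> Hi Valentin,
>
> 1) Regarding unlogged batches, I don't think it makes sense to support them, because:
> - They are deprecated starting from Cassandra 3.0 (which we are currently using in the Cassandra module).
> - According to the Cassandra documentation (http://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html), "Batches are often mistakenly used in an attempt to optimize performance". The Cassandra folks say that avoiding batches (https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.rxkmfe209) is the fastest way to load data. I checked this with batches containing records with different partition keys, and it's definitely true. For a small batch of records that all have the same partition key (affinity in Ignite) batches could provide better performance, but I didn't investigate this case deeply (what the optimal batch size is, how significant the performance benefit is, etc.). I can try to do some load tests to get a better understanding of this.
>
> 2) Regarding logged batches, I think it makes sense to support them in the Cassandra module for transactional caches. The bad thing is that they don't provide isolation; the good thing is that they guarantee that all your changes will eventually be committed and visible to clients. Thus it's still better than nothing... However, there is a better approach for this. We can implement a transactional protocol on top of Cassandra, which will give us atomic read isolation - you'll either see all the changes made by a transaction or none of them. For example, we can implement RAMP transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf), because they provide rather low overhead.
>
> Igor Rudyak
>
> On Thu, Jul 28, 2016 at 11:00 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>
>> Hi Igor,
>>
>> I'm not a big Cassandra expert, but here are my thoughts.
>>
>> 1. Sending updates in a batch is always better than sending them one by one. For example, if you do putAll in Ignite with 100 entries, and these entries are split across 5 nodes, the client will send 5 requests instead of 100. This provides a significant performance improvement. Is there a way to use a similar approach in Cassandra?
>> 2. As for logged batches, I can easily believe that this is a rarely used feature, but since it exists in Cassandra, I can't find a single reason not to support it in our store as an option. Users that come across those rare cases will only say thank you to us :)
>>
>> What do you think?
>>
>> -Val
>>
>> On Thu, Jul 28, 2016 at 10:41 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>
>>> There are actually some cases when atomic read isolation in Cassandra could be important. Let's assume a batch was persisted in Cassandra but not finalized yet - a read operation from Cassandra returns only partially committed data of the batch. In such a situation we have problems when:
>>>
>>> 1) Some of the batch records have already expired from the Ignite cache and we read them from the persistent store (Cassandra in our case).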
>>> 2) All Ignite nodes storing the batch records (or a subset of them) died (or, for example, became unavailable for 10 seconds because of a network problem). While reading such records from the Ignite cache we will be redirected to the persistent store.
>>>
>>> 3) A network separation occurred in such a way that we now have two Ignite clusters, but all the replicas of the batch data are located in only one of them. Again, while reading such records from the Ignite cache on the second cluster we will be redirected to the persistent store.
>>>
>>> In all the mentioned cases, if the Cassandra batch isn't finalized yet, we will read partially committed transaction data.
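Cases 1 and 2 above boil down to read-through: as soon as an entry is missing from the cache, Ignite loads it from the store and can surface a half-applied batch. A minimal configuration sketch of that situation (cache name, types and the 60-second expiry are made up; the data source and persistence settings of the Cassandra store factory are omitted):

    import java.util.concurrent.TimeUnit;

    import javax.cache.expiry.CreatedExpiryPolicy;
    import javax.cache.expiry.Duration;

    import org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class ReadThroughSketch {
        public static CacheConfiguration<Long, String> cacheConfig() {
            CacheConfiguration<Long, String> ccfg = new CacheConfiguration<>("myCache");

            // Any key that is no longer in the cache (expired, lost with a node, or on
            // the wrong side of a network split) is loaded from Cassandra on read.
            ccfg.setReadThrough(true);
            ccfg.setWriteThrough(true);

            // Entries expire after 60 seconds (an arbitrary value), reproducing case 1 above.
            ccfg.setExpiryPolicyFactory(
                CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 60)));

            // The Cassandra store; its data source and persistence settings still have
            // to be configured as described in the module documentation.
            ccfg.setCacheStoreFactory(new CassandraCacheStoreFactory<Long, String>());

            return ccfg;
        }
    }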
>>> On Thu, Jul 28, 2016 at 6:52 AM, Luiz Felipe Trevisan <luizfelipe.trevi...@gmail.com> wrote:
>>>
>>> > I totally agree with you regarding the guarantees we have with logged batches, and I'm also pretty much aware of the performance penalty involved in using this solution.
>>> >
>>> > But since all read operations are executed via Ignite, isolation at the Cassandra level is not really important. I think the only guarantee really needed is that we don't end up with a partial insert in Cassandra in case we have a failure in Ignite and we lose the node that was responsible for this write operation.
>>> >
>>> > My other assumption is that the write operation needs to finish before an eviction happens for this entry and we lose the data in the cache (since a batch doesn't guarantee isolation). However, if we cannot achieve this, I don't see why we'd use Ignite as a cache store.
>>> >
>>> > Luiz
>>> >
>>> > --
>>> > Luiz Felipe Trevisan
>>> >
>>> > On Wed, Jul 27, 2016 at 4:55 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>> >
>>> >> Hi Luiz,
>>> >>
>>> >> Logged batches are not the solution for achieving an atomic view of your Ignite transaction changes in Cassandra.
>>> >>
>>> >> The problem with logged batches (aka atomic) is that they only guarantee that if any part of the batch succeeds, all of it will; no other transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are able to read the first updated rows from the batch while other rows are still being updated on the server (in RDBMS terminology this means the *READ-UNCOMMITTED* isolation level). Thus Cassandra batches are "atomic" only in the database sense that if any part of the batch succeeds, all of it will.
>>> >>
>>> >> Probably the best way to achieve read atomic isolation for an Ignite transaction persisting data into Cassandra is to implement RAMP transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf) on top of Cassandra.
>>> >>
>>> >> I may create a ticket for this if the community would like it.
>>> >>
>>> >> Igor Rudyak
>>> >>
>>> >> On Wed, Jul 27, 2016 at 12:55 PM, Luiz Felipe Trevisan <luizfelipe.trevi...@gmail.com> wrote:
>>> >>
>>> >>> Hi Igor,
>>> >>>
>>> >>> Does it make sense to you to use logged batches to guarantee atomicity in Cassandra in cases where we are doing a cross-cache transaction operation?
>>> >>>
>>> >>> Luiz
>>> >>>
>>> >>> --
>>> >>> Luiz Felipe Trevisan
>>> >>>
>>> >>> On Wed, Jul 27, 2016 at 2:05 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>> >>>
>>> >>>> I am still very confused. Ilya, can you please explain what happens in Cassandra if a user calls the IgniteCache.putAll(...) method?
>>> >>>>
>>> >>>> In Ignite, if putAll(...) is called, Ignite will make its best effort to execute the update as a batch, in which case the performance is better. What is the analogy in Cassandra?
>>> >>>>
>>> >>>> D.
>>> >>>>
>>> >>>> On Tue, Jul 26, 2016 at 9:16 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>> >>>>
>>> >>>> > Dmitriy,
>>> >>>> >
>>> >>>> > There is exactly the same approach for all async read/write/delete operations - the Cassandra session just provides an executeAsync(statement) function for all types of operations.
>>> >>>> >
>>> >>>> > To be more detailed about Cassandra batches, there are actually two types of batches:
>>> >>>> >
>>> >>>> > 1) *Logged batch* (aka atomic) - the main purpose of such batches is to keep duplicated data in sync while updating multiple tables, but at the cost of performance.
>>> >>>> >
>>> >>>> > 2) *Unlogged batch* - the only specific case for such a batch is when all updates are addressed to only *one* partition key and the batch has a "*reasonable size*". In such a situation there *could be* performance benefits if you are using the Cassandra *TokenAware* load balancing policy. In this particular case all the updates go directly, without any additional coordination, to the primary node which is responsible for storing data for this partition key.
>>> >>>> >
>>> >>>> > The *generic rule* is that *individual updates in async mode* provide the best performance (https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html). That's because they spread all updates across the whole cluster. In contrast, when you are using batches, what this actually does is put a huge amount of pressure on a single coordinator node, because the coordinator needs to forward each individual insert/update/delete to the correct replicas. In general you're just losing all the benefit of the Cassandra TokenAware load balancing policy when you're updating different partitions in a single round trip to the database.
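As a rough sketch of what such individual async updates look like with the DataStax Java driver (contact point, keyspace and table are placeholders; real code would also handle per-statement failures and retries):

    import java.util.ArrayList;
    import java.util.List;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    public class AsyncWritesSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Placeholder keyspace/table, just for illustration.
            PreparedStatement ins =
                session.prepare("INSERT INTO my_ks.my_table (key, value) VALUES (?, ?)");

            // Fire each mutation individually; with a token-aware load balancing policy
            // every write goes straight to a replica, spreading load across the cluster.
            List<ResultSetFuture> futures = new ArrayList<>();

            for (long i = 0; i < 100; i++)
                futures.add(session.executeAsync(ins.bind(i, "value-" + i)));

            // Wait for all writes to complete.
            for (ResultSetFuture f : futures)
                f.getUninterruptibly();

            cluster.close();
        }
    }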
>>> >>>> > Probably the only enhancement which could be done is to split our batch into smaller batches, each of which updates records having the same partition key. In this case it could provide some performance benefits when used in combination with the Cassandra TokenAware policy. But there are several concerns:
>>> >>>> >
>>> >>>> > 1) It looks like a rather rare case.
>>> >>>> > 2) It makes error handling more complex - you just don't know which operations in a batch succeeded and which failed, and you need to retry the whole batch.
>>> >>>> > 3) Retry logic could produce more load on the cluster - with individual updates you only need to retry the mutations that failed, while with batches you need to retry the whole batch.
>>> >>>> > 4) *Unlogged batches are deprecated in Cassandra 3.0* (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html), which is the version we are currently using for the Ignite Cassandra module.
>>> >>>> >
>>> >>>> > Igor Rudyak
>>> >>>> >
>>> >>>> > On Tue, Jul 26, 2016 at 4:45 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>> >>>> >
>>> >>>> > > On Tue, Jul 26, 2016 at 5:53 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>> >>>> > >
>>> >>>> > >> Hi Valentin,
>>> >>>> > >>
>>> >>>> > >> For writeAll/readAll the Cassandra cache store implementation uses async operations (http://www.datastax.com/dev/blog/java-driver-async-queries) and futures, which have the best characteristics in terms of performance.
>>> >>>> > >>
>>> >>>> > > Thanks, Igor. This link describes the query operations, but I could not find any mention of writes.
>>> >>>> > >
>>> >>>> > >> The Cassandra BATCH statement is actually quite often an anti-pattern for those who come from the relational world. The BATCH statement concept in Cassandra is totally different from the relational one and is not meant for optimizing batch/bulk operations. The main purpose of a Cassandra BATCH is to keep denormalized data in sync, for example when you duplicate the same data into several tables. All other cases are not recommended for Cassandra batches:
>>> >>>> > >> - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
>>> >>>> > >> - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
>>> >>>> > >> - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>>> >>>> > >>
>>> >>>> > >> It's also good to mention that in the CassandraCacheStore implementation (actually in CassandraSessionImpl) every operation with Cassandra is wrapped in a loop. The reason is that in case of a failure, up to 20 attempts are made to retry the operation, with incrementally increasing timeouts starting from 100ms and with specific exception handling logic (Cassandra host unavailability, etc.). Thus it provides a quite reliable persistence mechanism. According to load tests, even on a heavily overloaded Cassandra cluster (CPU load > 10 per core) there were no lost writes/reads/deletes, and at most 6 attempts were needed to perform one operation.
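A simplified sketch of that kind of retry loop (this is not the actual CassandraSessionImpl code, which also handles specific driver exceptions and host availability):

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;

    public class RetrySketch {
        // Simplified illustration: retries a statement up to 20 times with an
        // incrementally increasing pause between attempts.
        static ResultSet executeWithRetry(Session session, Statement statement) {
            final int maxAttempts = 20;
            long sleepMs = 100;

            for (int attempt = 1; ; attempt++) {
                try {
                    return session.execute(statement);
                }
                catch (RuntimeException e) {
                    if (attempt >= maxAttempts)
                        throw e;

                    try {
                        Thread.sleep(sleepMs);
                    }
                    catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }

                    sleepMs += 100; // Wait a bit longer before the next attempt.
                }
            }
        }
    }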
>>> >>>> > > I think that the main point about Cassandra batch operations is not reliability, but performance. If a user batches up 100s of updates in one Cassandra batch, it will be a lot faster than doing them one by one in Ignite. Wrapping them into an Ignite "putAll(...)" call just seems more logical to me, no?
>>> >>>> > >
>>> >>>> > >> Igor Rudyak
>>> >>>> > >>
>>> >>>> > >> On Tue, Jul 26, 2016 at 1:58 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>> >>>> > >>
>>> >>>> > >> > Hi Igor,
>>> >>>> > >> >
>>> >>>> > >> > I noticed that the current Cassandra store implementation doesn't support batching for the writeAll and deleteAll methods; it simply executes all updates one by one (asynchronously in parallel).
>>> >>>> > >> >
>>> >>>> > >> > I think it can be useful to provide such support, so I created a ticket [1]. Can you please give your input on this? Does it make sense in your opinion?
>>> >>>> > >> >
>>> >>>> > >> > [1] https://issues.apache.org/jira/browse/IGNITE-3588
>>> >>>> > >> >
>>> >>>> > >> > -Val