You are, of course, free to use batches in your application.  Keep in mind,
however, that both my and Ryan's advice comes from debugging issues in
production.  I don't know why your Scala script performs better with
batches than with async queries.  It could be:

1) Network.  Are you running the test script on your laptop and connecting
to the cluster over a WAN?  If so, I would not be shocked if batch was
faster, since your latency is going to be crazy high.

2) Is the system under any other load?  I'd love to see the results of the
tests while cassandra-stress was running.  That's a step closer to
production, where you have to worry about such things.

3) The logic for doing async queries may be incorrect (see the sketch after
this list).
a) Are you just throwing all the queries at the cluster at once?  If so,
I'd love to see what's happening with GC.  Typically in a real workload
you'd be limiting the number of requests in flight.
b) Are you keeping the servers busy?  If you're calling wait() on a group
of futures, you're now blocking requests from being submitted and limiting
the throughput.

4) You're still only using 3 servers.  The horror of using batches
increases linearly as you add servers.

5) What exactly are you summing in the end?  The total real time taken, or
an aggregation of the async query times?  If it's the async query times,
that's going to be pretty misleading (and incorrect).  Again, my Scala is
terrible, so I could be reading it wrong.
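
As a rough sketch of what I mean in 3a/3b (my own illustration, not taken
from your script; the statement source, session, and Guava executor are
assumed), bounding the number of in-flight requests keeps the servers busy
without either flooding the cluster or stalling on a whole group of futures:

  import java.util.concurrent.Semaphore
  import com.google.common.util.concurrent.MoreExecutors
  import com.datastax.driver.core.{Session, Statement}

  // Allow at most `permits` uncompleted requests at any moment, so the
  // client neither throws everything at the cluster at once (3a) nor
  // sits idle blocking on a group of futures (3b).
  def boundedExecute(statements: Iterable[Statement], permits: Int)
                    (implicit session: Session): Unit = {
    val inFlight = new Semaphore(permits)
    statements.foreach { st =>
      inFlight.acquire()                        // blocks only when saturated
      session.executeAsync(st).addListener(new Runnable {
        def run(): Unit = inFlight.release()    // free a slot on completion
      }, MoreExecutors.sameThreadExecutor())
    }
    inFlight.acquire(permits)                   // drain: wait for stragglers
  }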

Sorry I don't have more time to debug the script.  Any of the above ideas
apply?

Jon

On Mon Dec 15 2014 at 1:11:43 PM Eric Stevens <migh...@gmail.com> wrote:

> > Unfortunately my Scala isn't the best so I'm going to have to take a
> little bit to wade through the code.
>
> I think the important thing to take from this code is that:
>
> 1) Execution order is randomized for each run, and new data is randomly
> generated for each run, to eliminate biases.
> 2) We write to five different key layouts, in an attempt to eliminate bias
> from some poorly chosen scheme; we test both clustering and non-clustering
> approaches.
> 3) We can fork *just* on batch-vs-single strategy (see
> https://gist.github.com/MightyE/1c98912fca104f6138fc/a7db68e72f99ac1215fcfb096d69391ee285c080#file-testsuite-L167-L180 )
> thanks to the DS driver having a common executable ancestor between them
> (an extremely nice feature).
> 4) We test three different parallelism strategies to eliminate bias from a
> poorly chosen concurrency model (see
> https://gist.github.com/MightyE/1c98912fca104f6138fc/a7db68e72f99ac1215fcfb096d69391ee285c080#file-testsuite-L181-L203 ).
> 5) The code path is identical wherever possible between strategies.
> 6) Principally this just sets up an Iterable of Statement (sometimes the
> members are batches, sometimes single statements), and times how long they
> take to execute and complete under the different concurrency models
> (roughly sketched below).
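>
> (For reference, the timing core distills to roughly the following --
> hypothetical names; the real thing is in the gist linked above:)
>
>   import com.datastax.driver.core.Statement
>
>   // Build the statements up front, then time only the
>   // execute-and-complete phase (reported in nanoseconds).
>   def timeRun(statements: Iterable[Statement])
>              (execute: Iterable[Statement] => Unit): Long = {
>     val start = System.nanoTime()
>     execute(statements)       // one of the three concurrency strategies
>     System.nanoTime() - start
>   }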
>
> *RE: Cassandra-Stress*
> > It may be useful to run cassandra-stress (it doesn't seem to have a mode
> for batches) to get a baseline on non-batches.  I'm curious to know if you
> get different numbers than the scala profiler.
>
> We always use SSL for everything, and I've struggled to get
> cassandra-stress to talk to our SSL cluster.  Just so I don't keep spinning
> my wheels on a temporary effort, I used CCM to stand up a 2.0.11 cluster
> locally, and ran both tools against it.  I'm dubious about what you can
> infer from such a test, because it's not apples to apples (they write
> different data).
>
> Nevertheless, here is the output of "ccm stress" against my local machine
> - I inserted 113,825 records in 62 seconds, and used this data size to
> drive my tool:
>
> Created keyspaces. Sleeping 3s for propagation.
total   interval_op_rate  interval_key_rate  latency  95th   99.9th  elapsed_time
> 11271   1127              1127               8.9      144.7  401.1   10
> 27998   1672              1672               9.5      140.5  399.4   20
> 42189   1419              1419,              9.3      148.0  494.5   31
> 59335   1714              1714               9.3      147.0  493.2   41
> 84957   2562              2562               6.1      137.1  493.3   51
> 113825  2886              2886               5.1      131.5  493.3   62
>
>
> After a ccm clear && ccm start , here's my tool against this same local
> cluster (note that I'm actually writing a total of 5x the records, because
> I write the same data to each of 5 tables).  My little local cluster just
> about brought down my machine under this test (especially the second one).
>
> ==== Execution Results for 1 runs of 113825 records =============
> 1 runs of 113,825 records (3 protos, 5 agents, ~15 per bucket) as single
> statements
> Total Run Time (ns)
> traverse test2 ((aid, bckt), end)                             = 25,488,179,000
> traverse test4 ((aid, bckt), proto, end) no explicit ordering = 25,497,183,000
> traverse test5 ((aid, bckt, end))                             = 25,529,444,000
> traverse test3 ((aid, bckt), end, proto) reverse order        = 31,495,348,000
> traverse test1 ((aid, bckt), proto, end) reverse order        = 33,686,013,000
>
> ==== Execution Results for 1 runs of 113825 records =============
> 1 runs of 113,825 records (3 protos, 5 agents, ~15 per bucket) in batches
> of 10
> Total Run Time (ns)
> traverse test3 ((aid, bckt), end, proto) reverse order        = 11,030,788,000
> traverse test1 ((aid, bckt), proto, end) reverse order        = 13,345,962,000
> traverse test2 ((aid, bckt), end)                             = 15,110,208,000
> traverse test4 ((aid, bckt), proto, end) no explicit ordering = 16,398,982,000
> traverse test5 ((aid, bckt, end))                             = 22,166,119,000
>
> For giggles I added token-aware batching (grouping statements within a
> single batch by meta.getReplicas(statement.getKeyspace,
> statement.getRoutingKey).iterator().next - see
> https://gist.github.com/MightyE/1c98912fca104f6138fc#file-testsuite-L176-L189 ).
> Here's that run; the results are comparable to before, and easily inside
> one sigma of non-token-aware batching, so not a statistically significant
> difference.
>
> ==== Execution Results for 1 runs of 113825 records =============
> 1 runs of 113,825 records (3 protos, 5 agents, ~15 per bucket) in batches
> of 10
> Total Run Time (ns)
> traverse test2 ((aid, bckt), end)                             = 11,429,008,000
> traverse test1 ((aid, bckt), proto, end) reverse order        = 12,593,034,000
> traverse test4 ((aid, bckt), proto, end) no explicit ordering = 13,111,244,000
> traverse test3 ((aid, bckt), end, proto) reverse order        = 25,163,064,000
> traverse test5 ((aid, bckt, end))                             = 30,233,744,000
>
>
>
> On Sat, Dec 13, 2014 at 11:07 AM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>>
>>
>> On Sat Dec 13 2014 at 10:00:16 AM Eric Stevens <migh...@gmail.com> wrote:
>>
>>> Isn't the net effect of coordination overhead incurred by batches
>>> basically the same as the overhead incurred by RoundRobin or other
>>> non-token-aware request routing?  As the cluster size increases, each
>>> node would coordinate the same percentage of writes in batches under
>>> token awareness as it would under a more naive single-statement routing
>>> strategy.  If write volume per unit time is the same in both approaches,
>>> each node ends up coordinating the majority of writes under either
>>> strategy as the cluster grows.
>>>
>>
>> If you're not token aware, there's extra coordinator overhead, yes.  If
>> you are token aware, not the case.  I'm operating under the assumption that
>> you'd want to be token aware, since I don't see a point in not doing so :)
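>>
>> (To put rough numbers on that -- my back-of-envelope sketch, not from the
>> benchmark: a coordinator chosen without token awareness is a replica for
>> only about RF/N of writes, so the proxied fraction grows with the cluster:)
>>
>>   // Fraction of single-statement writes that take an extra proxy hop
>>   // under non-token-aware routing, assuming a randomly chosen
>>   // coordinator and RF replicas among N nodes.
>>   def proxiedFraction(nodes: Int, rf: Int): Double =
>>     (nodes - rf).toDouble / nodes
>>
>>   proxiedFraction(3, 3)    // 0.0  -- every node is a replica (this test)
>>   proxiedFraction(20, 3)   // 0.85
>>   proxiedFraction(100, 3)  // 0.97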
>>
>> Unfortunately my Scala isn't the best so I'm going to have to take a
>> little bit to wade through the code.
>>
>> It may be useful to run cassandra-stress (it doesn't seem to have a mode
>> for batches) to get a baseline on non-batches.  I'm curious to know if you
>> get different numbers than the scala profiler.
>>
>>
>>
>>>
>>> GC pressure in the cluster is a concern of course, as you observe.  But
>>> the performance delta is *substantial* from what I can see.  As in the
>>> case where you're bumping up against retries, this will cause you to fall
>>> over much more rapidly as you approach your tipping point, but in a
>>> healthy cluster, it's the same write volume, just a longer tenancy in
>>> eden.  If reasonably sized batches are causing survivors, you're not far
>>> off from falling over anyway.
>>>
>>> On Sat, Dec 13, 2014 at 10:04 AM, Jonathan Haddad <j...@jonhaddad.com>
>>> wrote:
>>>
>>>> One thing to keep in mind is that the overhead of a batch goes up as the
>>>> number of servers increases.  Talking to 3 is going to have a much
>>>> different performance profile than talking to 20.  Keep in mind that the
>>>> coordinator is going to be talking to every server in the cluster with a
>>>> big batch.  The number of local writes will decrease as it owns a smaller
>>>> portion of the ring.  All you've done is add an extra network hop between
>>>> your client and where the data should actually be.  You also start to
>>>> have an impact on GC in a very negative way.
>>>>
>>>> Your point is valid about topology changes, but that's a relatively
>>>> rare occurrence, and the driver is notified pretty quickly, so I wouldn't
>>>> optimize for that case.
>>>>
>>>> Can you post your test code in a gist or something?  I can't really
>>>> talk about your benchmark without seeing it, and you're basing your
>>>> stance on the premise that it is correct, which it may not be.
>>>>
>>>>
>>>>
>>>> On Sat Dec 13 2014 at 8:45:21 AM Eric Stevens <migh...@gmail.com>
>>>> wrote:
>>>>
>>>>> You can see what the partition key strategies are for each of the
>>>>> tables; test5 shows the least improvement.  The set (aid, end) should be
>>>>> unique, and bckt is derived from end.  Some of these layouts result in
>>>>> clustering on the same partition keys; that's actually tunable with the
>>>>> "~15 per bucket" reported (the exact number of entries per bucket will
>>>>> vary, but should have a mean of 15 in that run - it's an input parameter
>>>>> to my tests).  "test5" obviously ends up being exclusively unique
>>>>> partitions for each record.
>>>>>
>>>>> Your points about:
>>>>> 1) failed batches having a higher cost than failed single statements, and
>>>>> 2) every node in my test being a replica for all data.
>>>>>
>>>>> These are both very good points.
>>>>>
>>>>> For #1, since the worst-case scenario is nearly twice as fast in batches
>>>>> as its single-statement equivalent, in terms of impact on the client
>>>>> you'd have to be retrying half your batches before you broke even there
>>>>> (but of course those retries are not free to the cluster, so you probably
>>>>> approach the performance tipping point a lot faster).  This alone may be
>>>>> cause to justify avoiding batches, or at least severely limiting their
>>>>> size (hey, that's what this discussion is about!).
>>>>>
>>>>> For #2, that's certainly a good point; for this test cluster, I should
>>>>> at least re-run with RF=1 so that proxying times start to matter.  If
>>>>> you're not using a token-aware client, or not using a token-aware policy
>>>>> for whatever reason, this should even out though, no?  Each node will
>>>>> end up coordinating 1/(nodecount-rf+1) of the mutations, regardless of
>>>>> whether they are batched or single statements.  The DS driver is very
>>>>> careful to caution that the topology map it maintains makes no
>>>>> guarantees on freshness, so you may see a significant performance
>>>>> penalty in your client when the topology changes if you're depending on
>>>>> token-aware routing as part of your performance requirements.
>>>>>
>>>>>
>>>>> I'm curious what your thoughts are on grouping statements by primary
>>>>> replica according to the routing policy, and executing unlogged batches
>>>>> that way (so that under token-aware routing all statements are executed
>>>>> on a replica; for other policies it'd make no difference).  Retries are
>>>>> still more expensive, but you still get the proxy avoidance that token
>>>>> awareness provides.  It's pretty easy to do in Scala:
>>>>>
>>>>>   import scala.concurrent.Future
>>>>>   import scala.concurrent.ExecutionContext.Implicits.global
>>>>>   import com.datastax.driver.core.{Host, Session, Statement}
>>>>>
>>>>>   def groupByFirstReplica(statements: Iterable[Statement])(implicit
>>>>>       session: Session): Map[Host, Iterable[Statement]] = {
>>>>>     val meta = session.getCluster.getMetadata
>>>>>     statements.groupBy { st =>
>>>>>       // key each statement by the first replica for its routing key
>>>>>       meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
>>>>>     }
>>>>>   }
>>>>>   // executeAsync returns a Guava future; our implicit conversions turn
>>>>>   // it into a Scala Future so the groups can be traversed
>>>>>   val result = Future.traverse(groupByFirstReplica(statements).values) {
>>>>>     sts => newBatch(sts).executeAsync()
>>>>>   }
>>>>>
>>>>>
>>>>> Let me get my test code together; it depends on some existing
>>>>> utilities we use elsewhere, such as implicit conversions between Google
>>>>> and Scala native futures.  I'll try to put this together in a format
>>>>> that's runnable for you in a Scala REPL console without having to
>>>>> resolve our internal dependencies.  This may not be today, though.
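>>>>>
>>>>> (For context, that bridge is commonly written along these lines -- a
>>>>> sketch of the kind of utility I mean, not our actual code:)
>>>>>
>>>>>   import com.google.common.util.concurrent.{FutureCallback, Futures,
>>>>>     ListenableFuture}
>>>>>   import scala.concurrent.{Future, Promise}
>>>>>
>>>>>   // Adapt a Guava ListenableFuture (what executeAsync returns) into a
>>>>>   // Scala Future by completing a Promise from a callback.
>>>>>   implicit def listenableToScala[T](lf: ListenableFuture[T]): Future[T] = {
>>>>>     val p = Promise[T]()
>>>>>     Futures.addCallback(lf, new FutureCallback[T] {
>>>>>       def onSuccess(result: T): Unit = p.success(result)
>>>>>       def onFailure(t: Throwable): Unit = p.failure(t)
>>>>>     })
>>>>>     p.future
>>>>>   }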
>>>>>
>>>>> Also, @Ryan, I don't think that shuffling would make a difference for
>>>>> my above tests since as Jon observed, all my nodes were already replicas
>>>>> there.
>>>>>
>>>>>
>>>>> On Sat, Dec 13, 2014 at 7:37 AM, Ryan Svihla <rsvi...@datastax.com>
>>>>> wrote:
>>>>>
>>>>>> Also... what happens when you turn on shuffle with token awareness?
>>>>>> http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/policies/TokenAwarePolicy.html
>>>>>>
>>>>>> On Sat, Dec 13, 2014 at 8:21 AM, Jonathan Haddad <j...@jonhaddad.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> To add to Ryan's (extremely valid!) point, your test works because
>>>>>>> the coordinator is always a replica.  Try again using 20 (or 50) nodes.
>>>>>>> Batching works great at RF=N=3 because the coordinator always gets to
>>>>>>> write locally and talks to exactly 2 other servers on every request.
>>>>>>> Consider what happens when the coordinator needs to talk to 100
>>>>>>> servers.  It's unnecessary overhead on the server side.
>>>>>>>
>>>>>>> To save network overhead, Cassandra 2.1 added support for response
>>>>>>> grouping (see
>>>>>>> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster ),
>>>>>>> which massively helps performance.  It provides the benefit of batches
>>>>>>> but without the coordinator overhead.
>>>>>>>
>>>>>>> Can you post your benchmark code?
>>>>>>>
>>>>>>> On Sat Dec 13 2014 at 6:10:36 AM Jonathan Haddad <j...@jonhaddad.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> There are cases where it can.  For instance, if you batch multiple
>>>>>>>> mutations to the same partition (and talk to a replica for that
>>>>>>>> partition), they can reduce network overhead because they're
>>>>>>>> effectively a single mutation in the eyes of the cluster.  However,
>>>>>>>> if you're not doing that (and most people aren't!) you end up putting
>>>>>>>> additional pressure on the coordinator, because now it has to talk to
>>>>>>>> several other servers.  If you have 100 servers and perform a
>>>>>>>> mutation on 100 partitions, you could have a coordinator that's
>>>>>>>>
>>>>>>>> 1) talking to every machine in the cluster, and
>>>>>>>> 2) waiting on a response from a significant portion of them,
>>>>>>>>
>>>>>>>> before it can respond success or fail.  Any delay, from GC to a bad
>>>>>>>> disk, can affect the performance of the entire batch.
>>>>>>>>
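>>>>>>>> (A hypothetical illustration -- insertStmt, aid, bckt, and
>>>>>>>> randomAid are made up here: the first batch is effectively one
>>>>>>>> mutation to the cluster, while the second fans the coordinator out
>>>>>>>> to ten partitions:)
>>>>>>>>
>>>>>>>>   import com.datastax.driver.core.BatchStatement
>>>>>>>>
>>>>>>>>   // Same partition key throughout: effectively a single mutation,
>>>>>>>>   // and a token-aware client can hand it straight to a replica.
>>>>>>>>   val samePartition = new BatchStatement(BatchStatement.Type.UNLOGGED)
>>>>>>>>   (1 to 10).foreach(i => samePartition.add(insertStmt.bind(aid, bckt, i)))
>>>>>>>>
>>>>>>>>   // Ten partition keys: the coordinator must contact the replicas
>>>>>>>>   // of all ten partitions and wait on the slowest of them.
>>>>>>>>   val manyPartitions = new BatchStatement(BatchStatement.Type.UNLOGGED)
>>>>>>>>   (1 to 10).foreach(i =>
>>>>>>>>     manyPartitions.add(insertStmt.bind(randomAid(), bckt, i)))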
>>>>>>>>
>>>>>>>> On Sat Dec 13 2014 at 4:17:33 AM Jack Krupansky <
>>>>>>>> j...@basetechnology.com> wrote:
>>>>>>>>
>>>>>>>>>   Jonathan and Ryan,
>>>>>>>>>
>>>>>>>>> Jonathan says “It is absolutely not going to help you if you're
>>>>>>>>> trying to lump queries together to reduce network & server overhead
>>>>>>>>> - in fact it'll do the opposite”, but I would note that the CQL3
>>>>>>>>> spec says “The BATCH statement ... serves several purposes: 1. It
>>>>>>>>> saves network round-trips between the client and the server (and
>>>>>>>>> sometimes between the server coordinator and the replicas) when
>>>>>>>>> batching multiple updates.”  Is the spec inaccurate?  I mean, it
>>>>>>>>> seems in conflict with your statement.
>>>>>>>>>
>>>>>>>>> See:
>>>>>>>>> https://cassandra.apache.org/doc/cql3/CQL.html
>>>>>>>>>
>>>>>>>>> I see the spec as gospel – if it’s not accurate, let’s propose a
>>>>>>>>> change to make it accurate.
>>>>>>>>>
>>>>>>>>> The DataStax CQL doc is more nuanced: “Batching multiple
>>>>>>>>> statements can save network exchanges between the client/server and
>>>>>>>>> server coordinator/replicas. However, because of the distributed
>>>>>>>>> nature of Cassandra, spread requests across nearby nodes as much as
>>>>>>>>> possible to optimize performance. Using batches to optimize
>>>>>>>>> performance is usually not successful, as described in Using and
>>>>>>>>> misusing batches section. For information about the fastest way to
>>>>>>>>> load data, see "Cassandra: Batch loading without the Batch
>>>>>>>>> keyword."”
>>>>>>>>>
>>>>>>>>> Maybe what we really need is a “client/driver-side batch”, which is
>>>>>>>>> simply a way to collect “batches” of operations in the client/driver
>>>>>>>>> and then let the driver determine what degree of batching and
>>>>>>>>> asynchronous operation is appropriate.
>>>>>>>>>
>>>>>>>>> It might also be nice to have an inquiry for the cluster as to what
>>>>>>>>> batch size is most optimal for the cluster, like number of mutations
>>>>>>>>> in a batch and number of simultaneous connections, and to have that
>>>>>>>>> be dynamic based on overall cluster load.
>>>>>>>>>
>>>>>>>>> I would also note that the example in the spec has multiple inserts
>>>>>>>>> with different partition key values, which flies in the face of the
>>>>>>>>> admonition to refrain from using server-side distribution of
>>>>>>>>> requests.
>>>>>>>>>
>>>>>>>>> At a minimum, the CQL spec should make a clearer statement of
>>>>>>>>> intent and non-intent for BATCH.
>>>>>>>>>
>>>>>>>>> -- Jack Krupansky
>>>>>>>>>
>>>>>>>>>  *From:* Jonathan Haddad <j...@jonhaddad.com>
>>>>>>>>> *Sent:* Friday, December 12, 2014 12:58 PM
>>>>>>>>> *To:* user@cassandra.apache.org ; Ryan Svihla
>>>>>>>>> <rsvi...@datastax.com>
>>>>>>>>> *Subject:* Re: batch_size_warn_threshold_in_kb
>>>>>>>>>
>>>>>>>>> The really important thing to take away from Ryan's original post
>>>>>>>>> is that batches are not there for performance.  The only case I
>>>>>>>>> consider batches to be useful for is when you absolutely need to
>>>>>>>>> know that several tables all get a mutation (via logged batches).
>>>>>>>>> The use case for this is when you've got multiple tables that are
>>>>>>>>> serving as different views of the data.  It is absolutely not going
>>>>>>>>> to help you if you're trying to lump queries together to reduce
>>>>>>>>> network & server overhead - in fact it'll do the opposite.  If
>>>>>>>>> you're trying to do that, instead perform many async queries.  The
>>>>>>>>> overhead of batches in Cassandra is significant, and you're going to
>>>>>>>>> hit a lot of problems if you use them excessively (timeouts /
>>>>>>>>> failures).
>>>>>>>>>
>>>>>>>>> tl;dr: you probably don't want batch, you most likely want many
>>>>>>>>> async calls
>>>>>>>>>
>>>>>>>>> On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller <
>>>>>>>>> moham...@glassbeam.com> wrote:
>>>>>>>>>
>>>>>>>>>>  Ryan,
>>>>>>>>>>
>>>>>>>>>> Thanks for the quick response.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I did see that jira before posting my question on this list.
>>>>>>>>>> However, I didn’t see any information about why 5kb+ data will
>>>>>>>>>> cause instability.  5kb or even 50kb seems too small.  For example,
>>>>>>>>>> if each mutation is 1000+ bytes, then with just 5 mutations you
>>>>>>>>>> will hit that threshold.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In addition, Patrick is saying that he does not recommend more
>>>>>>>>>> than 100 mutations per batch. So why not warn users just on the # of
>>>>>>>>>> mutations in a batch?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mohammed
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From:* Ryan Svihla [mailto:rsvi...@datastax.com]
>>>>>>>>>> *Sent:* Thursday, December 11, 2014 12:56 PM
>>>>>>>>>> *To:* user@cassandra.apache.org
>>>>>>>>>> *Subject:* Re: batch_size_warn_threshold_in_kb
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nothing magic, just put in there based on experience. You can
>>>>>>>>>> find the story behind the original recommendation here
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-6487
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Key reasoning for the desire comes from Patrick McFadin:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> "Yes that was in bytes. Just in my own experience, I don't
>>>>>>>>>> recommend more than ~100 mutations per batch. Doing some quick math 
>>>>>>>>>> I came
>>>>>>>>>> up with 5k as 100 x 50 byte mutations.
>>>>>>>>>>
>>>>>>>>>> Totally up for debate."
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It's totally changeable; however, it's there in no small part
>>>>>>>>>> because so many people mistake the BATCH keyword for a performance
>>>>>>>>>> optimization.  This helps flag those cases of misuse.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller <
>>>>>>>>>> moham...@glassbeam.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi –
>>>>>>>>>>
>>>>>>>>>> The cassandra.yaml file has a property called
>>>>>>>>>> *batch_size_warn_threshold_in_kb*.
>>>>>>>>>>
>>>>>>>>>> The default size is 5kb and, according to the comments in the yaml
>>>>>>>>>> file, it is used to log a WARN on any batch size exceeding this
>>>>>>>>>> value in kilobytes.  It says caution should be taken in increasing
>>>>>>>>>> the size of this threshold, as it can lead to node instability.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Does anybody know the significance of this magic number 5kb? Why
>>>>>>>>>> would a higher number (say 10kb) lead to node instability?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mohammed
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ryan Svihla
>>>>>>>>>>
>>>>>>>>>> Solution Architect
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> Ryan Svihla
>>>>>>
>>>>>> Solution Architect
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>
