Jon,

> The really important thing to really take away from Ryan's original post
> is that batches are not there for performance.
> tl;dr: you probably don't want batch, you most likely want many async
> calls

My own rudimentary testing does not bear this out - at least not if you
mean that batches offer no performance advantage at all (rather than it
just being a happy side effect). In my findings, unlogged batches provide
a substantial performance improvement for burst writes.
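
For reference, the batched path was assembled roughly like this - a
minimal sketch against the 2.x DataStax Java Driver, where the keyspace
name, column list, and Record class are invented for illustration:

    import com.datastax.driver.core.{BatchStatement, Cluster, PreparedStatement, Session}

    // Hypothetical stand-in for the generated test data.
    case class Record(aid: String, bckt: Int, end: Long, proto: String)

    val cluster: Cluster = Cluster.builder().addContactPoint("10.0.0.1").build()
    val session: Session = cluster.connect("test_ks") // keyspace name assumed

    // Prepare the insert once; bind it per record.
    val ps: PreparedStatement = session.prepare(
      "INSERT INTO test3 (aid, bckt, end, proto) VALUES (?, ?, ?, ?)")

    // Group bound statements into unlogged batches of 100 and execute.
    def writeBatched(records: Seq[Record]): Unit =
      records.grouped(100).foreach { group =>
        val batch = new BatchStatement(BatchStatement.Type.UNLOGGED)
        group.foreach(r =>
          batch.add(ps.bind(r.aid, Int.box(r.bckt), Long.box(r.end), r.proto)))
        session.execute(batch)
      }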

My test setup:

   - Amazon i2.8xlarge instances in 3 AZs, using EC2Snitch
   - Cluster size of 3, RF=3
   - DataStax Java Driver with token-aware routing, comparing individual
   Prepared Statements vs Unlogged Batches of Prepared Statements
   - Test client on separate machine in same AZ as one of the server nodes
   - Data Size: 50,000 records
   - Test Runs: 25 (unique data generated before each run)
   - Data written to 5 tables, one table at a time (all 50,000 records go
   to each table)
   - Timing begins when the first record is written to a table and ends
   when the last async call completes for that table.  Timing is measured
   independently for each strategy, table, and run.
   - To eliminate bias, order between tables is randomized on each run, and
   order between single vs batched execution is randomized on each run.
   - Asynchronicity is tested using three typical Scala parallelism
   strategies (see the sketch after this list):
      - "traverse" = Future.traverse(statements)(_.executeAsync()) - let
      Future.traverse schedule the parallelism it thinks is appropriate
      - "scatter" = Future.sequence(statements.map(_.executeAsync())) -
      fire off as many async calls as possible at once, then let
      Future.sequence gather the results
      - "parallel" = statements.par.map(_.execute()) - use a parallel
      collection to issue as many blocking calls as possible on the
      default thread pool
   - I kept an eye on compaction throughout, and we never went above 2
   pending compaction tasks
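
Here is roughly what those three strategies looked like in code - a
sketch, not the actual harness; the executeAsync bridge from the
driver's Guava ListenableFuture to a Scala Future is a hypothetical
helper (the real code needs some equivalent):

    import scala.concurrent.{Future, Promise}
    import scala.concurrent.ExecutionContext.Implicits.global
    import com.datastax.driver.core.{ResultSet, Session, Statement}
    import com.google.common.util.concurrent.{FutureCallback, Futures}

    // Assumed bridge from the driver's ListenableFuture to a Scala Future.
    def executeAsync(session: Session, stmt: Statement): Future[ResultSet] = {
      val p = Promise[ResultSet]()
      Futures.addCallback(session.executeAsync(stmt), new FutureCallback[ResultSet] {
        def onSuccess(rs: ResultSet): Unit = p.success(rs)
        def onFailure(t: Throwable): Unit = p.failure(t)
      })
      p.future
    }

    // "traverse": let Future.traverse pace the submissions.
    def traverse(session: Session, stmts: Seq[Statement]): Future[Seq[ResultSet]] =
      Future.traverse(stmts)(executeAsync(session, _))

    // "scatter": submit every call up front, then gather the results.
    def scatter(session: Session, stmts: Seq[Statement]): Future[Seq[ResultSet]] =
      Future.sequence(stmts.map(executeAsync(session, _)))

    // "parallel": blocking calls mapped over a parallel collection.
    def parallel(session: Session, stmts: Seq[Statement]): Seq[ResultSet] =
      stmts.par.map(s => session.execute(s)).seq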

I know this test is fairly contrived, but it's difficult to dismiss
throughput differences of this magnitude over several million data points.
Times below are in nanoseconds.

==== Execution Results for 25 runs of 50,000 records =============
25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) as single
statements using strategy scatter
Total Run Time
        test3 ((aid, bckt), end, proto) reverse order        =  51,391,100,107
        test1 ((aid, bckt), proto, end) reverse order        =  52,206,907,605
        test4 ((aid, bckt), proto, end) no explicit ordering =  53,903,886,095
        test2 ((aid, bckt), end)                             =  54,613,620,320
        test5 ((aid, bckt, end))                             =  55,820,739,557

==== Execution Results for 25 runs of 50,000 records =============
25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) in batches
of 100 using strategy scatter
Total Run Time
        test3 ((aid, bckt), end, proto) reverse order        =   9,199,579,182
        test4 ((aid, bckt), proto, end) no explicit ordering =  11,661,638,491
        test2 ((aid, bckt), end)                             =  12,059,853,548
        test1 ((aid, bckt), proto, end) reverse order        =  12,957,113,345
        test5 ((aid, bckt, end))                             =  31,166,071,275

==== Execution Results for 25 runs of 50,000 records =============
25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) as single
statements using strategy traverse
Total Run Time
        test1 ((aid, bckt), proto, end) reverse order        =  52,368,815,408
        test2 ((aid, bckt), end)                             =  52,676,830,110
        test4 ((aid, bckt), proto, end) no explicit ordering =  54,096,838,258
        test5 ((aid, bckt, end))                             =  54,657,464,976
        test3 ((aid, bckt), end, proto) reverse order        =  55,668,202,827

==== Execution Results for 25 runs of 50,000 records =============
25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) in batches
of 100 using strategy traverse
Total Run Time
        test3 ((aid, bckt), end, proto) reverse order        =   9,633,141,094
        test4 ((aid, bckt), proto, end) no explicit ordering =  12,519,381,544
        test2 ((aid, bckt), end)                             =  12,653,843,637
        test1 ((aid, bckt), proto, end) reverse order        =  17,644,182,274
        test5 ((aid, bckt, end))                             =  27,902,501,534

==== Execution Results for 25 runs of 50,000 records =============
25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) as single
statements using strategy parallel
Total Run Time
        test1 ((aid, bckt), proto, end) reverse order        = 360,523,086,443
        test3 ((aid, bckt), end, proto) reverse order        = 364,375,212,413
        test4 ((aid, bckt), proto, end) no explicit ordering = 370,989,615,452
        test2 ((aid, bckt), end)                             = 378,368,728,469
        test5 ((aid, bckt, end))                             = 380,737,675,612

==== Execution Results for 25 runs of 50,000 records =============
25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) in batches
of 100 using strategy parallel
Total Run Time
        test3 ((aid, bckt), end, proto) reverse order        =  20,971,045,814
        test1 ((aid, bckt), proto, end) reverse order        =  21,379,583,690
        test4 ((aid, bckt), proto, end) no explicit ordering =  21,505,965,087
        test2 ((aid, bckt), end)                             =  24,433,580,144
        test5 ((aid, bckt, end))                             =  37,346,062,553
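
To put the headline numbers in more familiar units: the fastest
single-statement scatter total was 51,391,100,107 ns (~51.4 s), versus
9,199,579,182 ns (~9.2 s) for the batched equivalent - roughly a 5.6x
difference. The gap is even wider for the blocking parallel strategy:
~360.5 s single vs ~21.0 s batched, about 17x.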



On Fri Dec 12 2014 at 11:00:12 AM Jonathan Haddad <j...@jonhaddad.com> wrote:

> The really important thing to really take away from Ryan's original post
> is that batches are not there for performance.  The only case I consider
> batches to be useful for is when you absolutely need to know that several
> tables all get a mutation (via logged batches).  The use case for this is
> when you've got multiple tables that are serving as different views for
> data.  It is absolutely not going to help you if you're trying to lump
> queries together to reduce network & server overhead - in fact it'll do the
> opposite.  If you're trying to do that, instead perform many async
> queries.  The overhead of batches in cassandra is significant and you're
> going to hit a lot of problems if you use them excessively (timeouts /
> failures).
>
> tl;dr: you probably don't want batch, you most likely want many async calls
>
>
> On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller <moham...@glassbeam.com>
> wrote:
>
>>  Ryan,
>>
>> Thanks for the quick response.
>>
>>
>>
>> I did see that jira before posting my question on this list. However, I
>> didn’t see any information about why 5kb+ data will cause instability. 5kb
>> or even 50kb seems too small. For example, if each mutation is 1000+ bytes,
>> then with just 5 mutations, you will hit that threshold.
>>
>>
>>
>> In addition, Patrick is saying that he does not recommend more than 100
>> mutations per batch. So why not warn users just on the # of mutations in a
>> batch?
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Ryan Svihla [mailto:rsvi...@datastax.com]
>> *Sent:* Thursday, December 11, 2014 12:56 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: batch_size_warn_threshold_in_kb
>>
>>
>>
>> Nothing magic, just put in there based on experience. You can find the
>> story behind the original recommendation here
>>
>>
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6487
>>
>>
>>
>> Key reasoning for the desire comes from Patrick McFadden:
>>
>>
>> "Yes that was in bytes. Just in my own experience, I don't recommend more
>> than ~100 mutations per batch. Doing some quick math I came up with 5k as
>> 100 x 50 byte mutations.
>>
>> Totally up for debate."
>>
>>
>>
>> It's totally changeable, however, it's there in no small part because so
>> many people confuse the BATCH keyword as a performance optimization, this
>> helps flag those cases of misuse.
>>
>>
>>
>> On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller <moham...@glassbeam.com>
>> wrote:
>>
>> Hi –
>>
>> The cassandra.yaml file has a property called
>> *batch_size_warn_threshold_in_kb*.
>>
>> The default size is 5kb and according to the comments in the yaml file,
>> it is used to log WARN on any batch size exceeding this value in kilobytes.
>> It says caution should be taken on increasing the size of this threshold as
>> it can lead to node instability.
>>
>>
>>
>> Does anybody know the significance of this magic number 5kb? Why would a
>> higher number (say 10kb) lead to node instability?
>>
>>
>>
>> Mohammed
>>
>>
>>
>>
>> --
>>
>> Ryan Svihla
>>
>> Solution Architect
>>
>> DataStax <http://www.datastax.com/>
>>
>>
>>
>
