Also, to piggyback on what Jon is saying: what is the cost of retrying a
failed batch versus retrying a single failed record in your logical batch?
I'd suggest that in the example you provide, retrying a failed batch costs
roughly 100x as much as retrying a single record.
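To make that concrete, here's a rough sketch of the two retry paths, assuming
the DataStax Java driver from Scala (the session, the retry-once logic, and
the batch size of 100 are illustrative only, not a recommendation):

    import com.datastax.driver.core.{BatchStatement, BoundStatement, Session}
    import scala.util.Try

    // Retry path for individual statements: a failure re-sends one mutation.
    def writeSingles(session: Session, stmts: Seq[BoundStatement]): Unit =
      stmts.foreach { s =>
        if (Try(session.execute(s)).isFailure) session.execute(s)
      }

    // Retry path for an unlogged batch: a failure re-sends the whole batch,
    // all 100 mutations, even if most of them already applied on the replicas.
    def writeBatch(session: Session, stmts: Seq[BoundStatement]): Unit = {
      val batch = new BatchStatement(BatchStatement.Type.UNLOGGED)
      stmts.foreach(batch.add)
      if (Try(session.execute(batch)).isFailure) session.execute(batch)
    }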

Regardless, I'm glad you're testing this out, always good to push back on
theory discussions with numbers.

On Sat, Dec 13, 2014 at 8:12 AM, Ryan Svihla <rsvi...@datastax.com> wrote:
>
> Are the batches to the same partition key (which results in a single
> mutation, and obviously eliminates the primary problem)? Is your client
> network and/or CPU bound?
>
> Remember, the coordinator node is _just_ doing what your client is doing
> with executeAsync, only now it's dealing with the heap pressure of
> compaction and flush writers, while your client is busy writing code.
>
> Not trying to be argumentative, but I talk to the driver writers almost
> daily, and I've moved a lot of customers off batches; every single one
> of them sped things up substantially. That experience plus the theory leads
> me to believe there is a bottleneck on your client. Final point: the more
> you grow your cluster, the more the cost of losing token awareness on all
> writes in the batch grows.
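> To illustrate, a minimal sketch of token-aware async writes with the Java
> driver from Scala (contact point, keyspace, table, and column names below
> are placeholders):
>
>     import com.datastax.driver.core.Cluster
>     import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, TokenAwarePolicy}
>
>     // With TokenAwarePolicy each bound statement is routed to a replica that
>     // owns its partition key; a multi-partition batch instead lands on one
>     // coordinator, which then forwards every other partition's mutations.
>     val cluster = Cluster.builder()
>       .addContactPoint("10.0.0.1")
>       .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
>       .build()
>     val session = cluster.connect("my_keyspace")
>
>     val insert = session.prepare(
>       "INSERT INTO events (aid, bckt, end) VALUES (?, ?, ?)")
>     val rows: Seq[(String, Long, String)] = Nil // your data here
>     val futures = rows.map { case (aid, bckt, end) =>
>       session.executeAsync(insert.bind(aid, Long.box(bckt), end))
>     }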
>
>
> On Sat, Dec 13, 2014 at 7:32 AM, Eric Stevens <migh...@gmail.com> wrote:
>>
>> Jon,
>>
>> > The really important thing to take away from Ryan's original post is
>> > that batches are not there for performance.
>> > tl;dr: you probably don't want batch, you most likely want many async
>> > calls
>>
>> My own rudimentary testing does not bear this out - at least not if you
>> mean to say that batches don't offer a performance advantage (vs this just
>> being a happy side effect).  In my findings, unlogged batches provide a
>> substantial performance improvement for burst writes.
>>
>> My test setup:
>>
>>    - Amazon i2.8xl instances in 3 AZ's using EC2Snitch
>>    - Cluster size of 3, RF=3
>>    - DataStax Java Driver, with token aware routing, using Prepared
>>    Statements, vs Unlogged Batches of Prepared Statements.
>>    - Test client on separate machine in same AZ as one of the server
>>    nodes
>>    - Data Size: 50,000 records
>>    - Test Runs: 25 (unique data generated before each run)
>>    - Data written to 5 tables, one table at a time (all 50k records go
>>    to each table)
>>    - Timing begins when the first record is written to a table and ends
>>    when the last async call completes for that table.  Timing is measured
>>    independently for each strategy, table, and run.
>>    - To eliminate bias, order between tables is randomized on each run,
>>    and order between single vs batched execution is randomized on each run.
>>    - Asynchronicity is tested using three different typical Scala
>>    parallelism strategies (see the sketch after this list).
>>       - "traverse" = Futures.traverse(statements).map(_.executeAsync()) -
>>       let the Futures system schedule the parallelism it thinks is
>>       appropriate
>>       - "scatter" = Futures.sequence(statements.map(_.executeAsync())) -
>>       create as many async calls as possible at a time, then let the Futures
>>       system gather together the results
>>       - "parallel" = statements.par.map(_.execute()) - using a parallel
>>       collection to initiate as many blocking calls as possible within the
>>       default thread pool.
>>    - I kept an eye on compaction throughout, and we never went above 2
>>    pending compaction tasks
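>> For reference, here is roughly what those three strategies look like in
>> plain Scala (the driver call is abstracted behind an exec function so the
>> sketch stays self-contained; the real test used the driver's execute /
>> executeAsync):
>>
>>     import scala.concurrent.{Await, Future}
>>     import scala.concurrent.duration._
>>     import scala.concurrent.ExecutionContext.Implicits.global
>>
>>     // exec stands in for executing one statement (or one unlogged batch).
>>     def traverseStrategy[S, R](statements: Seq[S])(exec: S => R): Seq[R] =
>>       Await.result(Future.traverse(statements)(s => Future(exec(s))), 10.minutes)
>>
>>     def scatterStrategy[S, R](statements: Seq[S])(exec: S => R): Seq[R] =
>>       Await.result(Future.sequence(statements.map(s => Future(exec(s)))), 10.minutes)
>>
>>     def parallelStrategy[S, R](statements: Seq[S])(exec: S => R): Seq[R] =
>>       statements.par.map(exec).seq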
>>
>> I know this test is fairly contrived, but it's difficult to dismiss
>> throughput differences of this magnitude over several million data points.
>> Times are in nanos.
>>
>> ==== Execution Results for 25 runs of 50000 records =============
>> 25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) as single
>> statements using strategy scatter
>> Total Run Time
>>         test3 ((aid, bckt), end, proto) reverse order        = 51,391,100,107
>>         test1 ((aid, bckt), proto, end) reverse order        = 52,206,907,605
>>         test4 ((aid, bckt), proto, end) no explicit ordering = 53,903,886,095
>>         test2 ((aid, bckt), end)                             = 54,613,620,320
>>         test5 ((aid, bckt, end))                             = 55,820,739,557
>>
>> ==== Execution Results for 25 runs of 50000 records =============
>> 25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) in batches
>> of 100 using strategy scatter
>> Total Run Time
>>         test3 ((aid, bckt), end, proto) reverse order        = 9,199,579,182
>>         test4 ((aid, bckt), proto, end) no explicit ordering = 11,661,638,491
>>         test2 ((aid, bckt), end)                             = 12,059,853,548
>>         test1 ((aid, bckt), proto, end) reverse order        = 12,957,113,345
>>         test5 ((aid, bckt, end))                             = 31,166,071,275
>>
>> ==== Execution Results for 25 runs of 50000 records =============
>> 25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) as single
>> statements using strategy traverse
>> Total Run Time
>>         test1 ((aid, bckt), proto, end) reverse order        = 52,368,815,408
>>         test2 ((aid, bckt), end)                             = 52,676,830,110
>>         test4 ((aid, bckt), proto, end) no explicit ordering = 54,096,838,258
>>         test5 ((aid, bckt, end))                             = 54,657,464,976
>>         test3 ((aid, bckt), end, proto) reverse order        = 55,668,202,827
>>
>> ==== Execution Results for 25 runs of 50000 records =============
>> 25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) in batches
>> of 100 using strategy traverse
>> Total Run Time
>>         test3 ((aid, bckt), end, proto) reverse order        = 9,633,141,094
>>         test4 ((aid, bckt), proto, end) no explicit ordering = 12,519,381,544
>>         test2 ((aid, bckt), end)                             = 12,653,843,637
>>         test1 ((aid, bckt), proto, end) reverse order        = 17,644,182,274
>>         test5 ((aid, bckt, end))                             = 27,902,501,534
>>
>> ==== Execution Results for 25 runs of 50000 records =============
>> 25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) as single
>> statements using strategy parallel
>> Total Run Time
>>         test1 ((aid, bckt), proto, end) reverse order        = 360,523,086,443
>>         test3 ((aid, bckt), end, proto) reverse order        = 364,375,212,413
>>         test4 ((aid, bckt), proto, end) no explicit ordering = 370,989,615,452
>>         test2 ((aid, bckt), end)                             = 378,368,728,469
>>         test5 ((aid, bckt, end))                             = 380,737,675,612
>>
>> ==== Execution Results for 25 runs of 50000 records =============
>> 25 runs of 50,000 records (3 protos, 5 agents, ~15 per bucket) in batches
>> of 100 using strategy parallel
>> Total Run Time
>>         test3 ((aid, bckt), end, proto) reverse order        = 20,971,045,814
>>         test1 ((aid, bckt), proto, end) reverse order        = 21,379,583,690
>>         test4 ((aid, bckt), proto, end) no explicit ordering = 21,505,965,087
>>         test2 ((aid, bckt), end)                             = 24,433,580,144
>>         test5 ((aid, bckt, end))                             = 37,346,062,553
>>
>>
>>
>> On Fri Dec 12 2014 at 11:00:12 AM Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> The really important thing to take away from Ryan's original post is
>>> that batches are not there for performance.  The only case I consider
>>> batches to be useful for is when you absolutely need to know that several
>>> tables all get a mutation (via logged batches).  The use case for this is
>>> when you've got multiple tables that are serving as different views for
>>> data.  It is absolutely not going to help you if you're trying to lump
>>> queries together to reduce network & server overhead - in fact it'll do the
>>> opposite.  If you're trying to do that, instead perform many async
>>> queries.  The overhead of batches in Cassandra is significant, and you're
>>> going to hit a lot of problems if you use them excessively (timeouts /
>>> failures).
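>>> As a minimal sketch of the "many async queries" approach, with a crude
>>> cap on in-flight requests (the Semaphore and the limit of 1024 are
>>> illustrative, not driver settings):
>>>
>>>     import java.util.concurrent.Semaphore
>>>     import com.datastax.driver.core.{ResultSet, Session, Statement}
>>>     import com.google.common.util.concurrent.{FutureCallback, Futures}
>>>
>>>     def writeAsync(session: Session, statements: Seq[Statement]): Unit = {
>>>       val inFlight = new Semaphore(1024) // illustrative limit
>>>       statements.foreach { stmt =>
>>>         inFlight.acquire()
>>>         Futures.addCallback(session.executeAsync(stmt),
>>>           new FutureCallback[ResultSet] {
>>>             def onSuccess(rs: ResultSet): Unit = inFlight.release()
>>>             def onFailure(t: Throwable): Unit = inFlight.release() // log/retry here
>>>           })
>>>       }
>>>       inFlight.acquire(1024) // drain: wait for the tail of in-flight requests
>>>       inFlight.release(1024)
>>>     }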
>>>
>>> tl;dr: you probably don't want batch, you most likely want many async
>>> calls
>>>
>>>
>>> On Thu Dec 11 2014 at 11:15:00 PM Mohammed Guller <
>>> moham...@glassbeam.com> wrote:
>>>
>>>>  Ryan,
>>>>
>>>> Thanks for the quick response.
>>>>
>>>>
>>>>
>>>> I did see that jira before posting my question on this list. However, I
>>>> didn’t see any information about why batches larger than 5kb will cause
>>>> instability. 5kb, or even 50kb, seems too small. For example, if each
>>>> mutation is 1000+ bytes, then with just 5 mutations you will hit that
>>>> threshold.
>>>>
>>>>
>>>>
>>>> In addition, Patrick is saying that he does not recommend more than 100
>>>> mutations per batch. So why not warn users just on the # of mutations in a
>>>> batch?
>>>>
>>>>
>>>>
>>>> Mohammed
>>>>
>>>>
>>>>
>>>> *From:* Ryan Svihla [mailto:rsvi...@datastax.com]
>>>> *Sent:* Thursday, December 11, 2014 12:56 PM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: batch_size_warn_threshold_in_kb
>>>>
>>>>
>>>>
>>>> Nothing magic, just put in there based on experience. You can find the
>>>> story behind the original recommendation here
>>>>
>>>>
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-6487
>>>>
>>>>
>>>>
>>>> The key reasoning for the threshold comes from Patrick McFadin:
>>>>
>>>>
>>>> "Yes that was in bytes. Just in my own experience, I don't recommend
>>>> more than ~100 mutations per batch. Doing some quick math I came up with 5k
>>>> as 100 x 50 byte mutations.
>>>>
>>>> Totally up for debate."
>>>>
>>>>
>>>>
>>>> It's totally changeable; however, it's there in no small part because
>>>> so many people mistake the BATCH keyword for a performance optimization,
>>>> and this warning helps flag those cases of misuse.
>>>>
>>>>
>>>>
>>>> On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller <
>>>> moham...@glassbeam.com> wrote:
>>>>
>>>> Hi –
>>>>
>>>> The cassandra.yaml file has a property called
>>>> *batch_size_warn_threshold_in_kb*.
>>>>
>>>> The default size is 5kb and according to the comments in the yaml file,
>>>> it is used to log WARN on any batch size exceeding this value in kilobytes.
>>>> It says caution should be taken on increasing the size of this threshold as
>>>> it can lead to node instability.
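>>>> For reference, the line in question in cassandra.yaml looks roughly like
>>>> this (5 being the shipped default):
>>>>
>>>>     # warn on any batch whose serialized size exceeds this many kilobytes
>>>>     batch_size_warn_threshold_in_kb: 5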
>>>>
>>>>
>>>>
>>>> Does anybody know the significance of this magic number 5kb? Why would
>>>> a higher number (say 10kb) lead to node instability?
>>>>
>>>>
>>>>
>>>> Mohammed
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Svihla
>>>> Solution Architect, DataStax
>>>>
>>>
>
> --
> Ryan Svihla
> Solution Architect, DataStax
>

--
Ryan Svihla
Solution Architect, DataStax