Cassandra 2.2.1

From my throughput testing results, I want to use statement batching
together with session.executeAsync to achieve the best write throughput; see
the results and analysis below the signature. The problem is that I can't
figure out how to estimate whether a set of statements is approaching, or
already exceeds, the size that fits in a batch. At this point I'm sending 10
statements per batch, and if the server throws a Batch too large exception,
I fall back to executing the batch's statements individually. Unfortunately
the log is filling with batch size warnings, and relying on server-side
exceptions wastes bandwidth and time re-transmitting the individual
statements to the server. For my particular use case and data flow, the
gains from batching appear too great not to pursue.

Questions:

   - How can the client reasonably calculate what the batch size will be at
   the server?
   - Can the cassandra.yaml batch_size_warn_threshold_in_kb safely be
   increased to match the batch_size_fail_threshold_in_kb, contrary to the
   config file comment caution? See more explanation below.
   - Can anyone plausibly explain how ~1400 bytes of client data in a two
   statement batch becomes 8400 bytes at the server causing Batch size
   warnings?
   - Can anyone plausibly explain how ~13072 bytes in a one statement
   batch becomes 54600 bytes at the server causing Batch too large errors?

Thanks,

Troy


The cassandra.yaml file contains the batch configuration section below. It
seems that there really should be only one level: fail. My inclination is
to set the warn threshold equal to the fail threshold. Changing a logging
level shouldn't cause instability. Hopefully someone can clarify exactly
how, in my case, small batches consisting of statements targeted at the same
physical node would cause instability. I'm not currently experiencing any
instability in performance testing; however, I'm not at full cluster scale
either.

# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it
# can lead to node instability.
batch_size_warn_threshold_in_kb: 5

# Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
batch_size_fail_threshold_in_kb: 50


Testing indicated that when writing with either execute or executeAsync,
batching substantially improves throughput. It certainly reduces network
round trips and chattiness. My batches are composed only of data with the
same partition key, so the statements in each batch are destined for the
same physical nodes, per best practice recommendations.
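
The grouping described above can be sketched without any driver dependency.
This is a minimal illustration, not the code used in the tests: the
PartitionBatcher and Row names are hypothetical, and the only assumption
taken from this post is that cust is the partition key of wrkFinalRecs.
Each returned chunk would then be turned into one batch by the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups rows by partition key and cuts each group
// into chunks of at most maxPerBatch rows. No Cassandra driver is used here.
final class PartitionBatcher {

    /** One row to upsert; cust is the partition key, user the clustering key. */
    static final class Row {
        final String cust;
        final String user;
        final List<String> assets;

        Row(String cust, String user, List<String> assets) {
            this.cust = cust;
            this.user = user;
            this.assets = assets;
        }
    }

    /** Returns chunks in which every row shares the same cust value. */
    static List<List<Row>> chunkByPartition(List<Row> rows, int maxPerBatch) {
        if (maxPerBatch < 1) {
            throw new IllegalArgumentException("maxPerBatch must be >= 1");
        }
        // LinkedHashMap keeps partitions in first-seen order.
        Map<String, List<Row>> byCust = new LinkedHashMap<>();
        for (Row r : rows) {
            byCust.computeIfAbsent(r.cust, k -> new ArrayList<>()).add(r);
        }
        List<List<Row>> chunks = new ArrayList<>();
        for (List<Row> group : byCust.values()) {
            for (int i = 0; i < group.size(); i += maxPerBatch) {
                int end = Math.min(i + maxPerBatch, group.size());
                chunks.add(new ArrayList<>(group.subList(i, end)));
            }
        }
        return chunks;
    }
}
```

With maxPerBatch set to 10, a partition holding 12 rows yields one chunk of
10 and one chunk of 2; no chunk ever mixes two cust values.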

create table "wrkFinalRecs" (
  cust text
 ,user text
 ,assets list<text> /* 100 max, list reduces individual writes dramatically
improving throughput*/
 ,primary key ((cust), user)
);


Upserting 220K rows, each with an assets list<text> containing 100 6-byte
values, is 64% slower with a batch size of 1 than with a batch size of 10:
87 seconds vs. 53 seconds. For the tests, the cust and user values were each
6 bytes long.
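
For reference, the arithmetic behind the timing comparison can be written
out as below. The ThroughputComparison class is only an illustration; the
inputs (220K rows, 87 and 53 seconds) are the figures quoted above.

```java
// Illustrative arithmetic only; the inputs come from the test run above.
final class ThroughputComparison {

    /** Rows written per second for a run of the given duration. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    /** How much longer the slow run took, as a percentage of the fast run. */
    static double percentSlower(double slowSeconds, double fastSeconds) {
        return (slowSeconds / fastSeconds - 1.0) * 100.0;
    }
}
```

Rounded, percentSlower(87, 53) gives 64, and the two runs work out to about
2529 rows per second unbatched and 4151 rows per second with batches of 10.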

UPDATE "rec0WrkspNo_0"."wrkFinalRecs" set assets=? WHERE cust=? AND user=?;

With only 2 statements in a batch, batch size warnings occur and fill the
logs. A single-statement batch with 1300 6-byte values causes Batch too
large errors, while executing the same statement individually succeeds
without warning; see details at the end.

WARN  [SharedPool-Worker-1] 2015-11-07 11:17:13,648 BatchStatement.java:272
- Batch of prepared statements for [rec0WrkspNo_0.wrkFinalRecs] is of size
8400, exceeding specified threshold of 5120 by 3280.


The upsert data per row, excluding delimiters, totals 612 bytes. The
prepared statement text is 75 bytes. The cassandra.yaml warning limit is
5120 bytes.
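
The client-side figures can be reproduced with a few lines. This is only a
sketch of the estimate used in this post (raw value bytes plus the statement
text); the ClientPayload class and its method names are made up for
illustration.

```java
// Reproduces the client-side estimate: raw value bytes, no delimiters.
final class ClientPayload {
    static final int STATEMENT_TEXT_BYTES = 75;   // prepared statement text
    static final int WARN_THRESHOLD_BYTES = 5 * 1024; // 5120

    /** Raw bytes of one row: the list elements plus the cust and user values. */
    static int rowDataBytes(int elements, int elementBytes, int custBytes, int userBytes) {
        return elements * elementBytes + custBytes + userBytes;
    }

    /** Estimate for a batch, counting the statement text once per row. */
    static int batchEstimateBytes(int rows, int rowDataBytes) {
        return rows * (rowDataBytes + STATEMENT_TEXT_BYTES);
    }
}
```

rowDataBytes(100, 6, 6, 6) is 612, and two such rows come to 1374 bytes by
this estimate, well under the 5120-byte warning threshold.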

So of course I ran tcpdump to try to understand how the data expands from
roughly 700 bytes per row x 2, or 1400 bytes, to a reported batch size of
8400 bytes.

$ sudo tcpdump -X -i lo 'port 9042'

The batch was transmitted in a single packet which was 2121 bytes long. A
far cry from 8400.

Analysis:

   - 64 bytes header
   - 330 bytes common at start of each row
   - 4 byte delimiters between each value
   - 11 bytes of trailer
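
The 2121-byte payload can be reconstructed from the capture below if the
frame is read as a native protocol v4 BATCH message. The sketch rests on
that reading, which is an assumption: a 9-byte frame header, a 3-byte batch
prefix, a 16-byte prepared statement id per statement, 4-byte length
prefixes on every value and list element, and an 11-byte trailer
(consistency, flags, timestamp). On this reading the 64-byte header above
would be those 9 + 3 bytes plus the 52 bytes of IP and TCP headers that
tcpdump prints in the hex dump (IP total length 0x087d = 2173 = 2121 + 52).
The BatchFrameSize class is illustrative only.

```java
// Assumed layout of a v4 BATCH frame of prepared statements, as read
// from the capture in this post; not taken from driver source code.
final class BatchFrameSize {
    static final int FRAME_HEADER = 9;        // version, flags, stream(2), opcode, length(4)
    static final int BATCH_PREFIX = 1 + 2;    // batch type + statement count
    static final int BATCH_TRAILER = 2 + 1 + 8; // consistency + flags + timestamp
    static final int PREPARED_ID_BYTES = 16;  // id length seen in the capture

    /** Serialized list: 4-byte element count, then length-prefixed elements. */
    static int listValueBytes(int elements, int elementBytes) {
        return 4 + elements * (4 + elementBytes);
    }

    /** One statement: kind, id length, id, value count, then three values. */
    static int statementBytes(int elements, int elementBytes, int custBytes, int userBytes) {
        int prefix = 1 + 2 + PREPARED_ID_BYTES + 2;
        int assets = 4 + listValueBytes(elements, elementBytes);
        return prefix + assets + (4 + custBytes) + (4 + userBytes);
    }

    /** TCP payload length of the whole BATCH frame. */
    static int frameBytes(int statements, int elements, int elementBytes,
                          int custBytes, int userBytes) {
        return FRAME_HEADER + BATCH_PREFIX
                + statements * statementBytes(elements, elementBytes, custBytes, userBytes)
                + BATCH_TRAILER;
    }
}
```

frameBytes(2, 100, 6, 6, 6) returns 2121, matching the packet length that
tcpdump reports, so the wire size is fully accounted for by this layout and
does not explain the 8400 bytes reported by the server.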

11:23:19.381318 IP localhost.49801 > localhost.9042: Flags [P.], seq
173:2294, ack 192, win 342, options [nop,nop,TS val 191526341 ecr
191525312], length 2121
0x0000:  4500 087d 4c28 4000 4006 e850 7f00 0001  E..}L(@.@..P....
0x0010:  7f00 0001 c289 2352 94bb 3e44 2907 010f  ......#R..>D)...
0x0020:  8018 0156 0672 0000 0101 080a 0b6a 75c5  ...V.r.......ju.
0x0030:  0b6a 71c0 0400 00c0 0d00 0008 4000 0002  .jq.........@...
0x0040:  0100 1025 b80e 3017 0597 d988 cd5e 1d67  ...%..0......^.g
0x0050:  81e1 1200 0300 0003 ec00 0000 6400 0000  ............d...
0x0060:  0661 3130 3030 3000 0000 0661 3130 3030  .a10000....a1000
0x0070:  3000 0000 0661 3130 3030 3000 0000 0661  0....a10000....a
0x0080:  3130 3030 3000 0000 0661 3130 3030 3000  10000....a10000.
. . .
0x03f0:  3130 3030 3000 0000 0661 3130 3030 3000  10000....a10000.
0x0400:  0000 0661 3130 3030 3000 0000 0661 3130  ...a10000....a10
0x0410:  3030 3000 0000 0661 3130 3030 3000 0000  000....a10000...
0x0420:  0661 3130 3030 3000 0000 0661 3130 3030  .a10000....a1000
0x0430:  3000 0000 0661 3130 3030 3000 0000 0661  0....a10000....a
0x0440:  3130 3030 3000 0000 0663 7573 745f 6300  10000....cust_c.
0x0450:  0000 0631 3030 3030 3001 0010 25b8 0e30  ...100000...%..0
0x0460:  1705 97d9 88cd 5e1d 6781 e112 0003 0000  ......^.g.......
0x0470:  03ec 0000 0064 0000 0006 6131 3030 3030  .....d....a10000
0x0480:  0000 0006 6131 3030 3030 0000 0006 6131  ....a10000....a1
0x0490:  3030 3030 0000 0006 6131 3030 3030 0000  0000....a10000..
0x04a0:  0006 6131 3030 3030 0000 0006 6131 3030  ..a10000....a100
0x04b0:  3030 0000 0006 6131 3030 3030 0000 0006  00....a10000....
. . .
0x0820:  3030 0000 0006 6131 3030 3030 0000 0006  00....a10000....
0x0830:  6131 3030 3030 0000 0006 6131 3030 3030  a10000....a10000
0x0840:  0000 0006 6131 3030 3030 0000 0006 6131  ....a10000....a1
0x0850:  3030 3030 0000 0006 6131 3030 3030 0000  0000....a10000..
0x0860:  0006 6375 7374 5f63 0000 0006 3130 3030  ..cust_c....1000
0x0870:  3031 0004 2000 0523 f5c7 30d0 50         01.....#..0.P
11:23:19.381349 IP localhost.9042 > localhost.49801: Flags [.], ack 2294,
win 1365, options [nop,nop,TS val 191526341 ecr 191526341], length 0
0x0000:  4500 0034 71ea 4000 4006 cad7 7f00 0001  E..4q.@.@.......
0x0010:  7f00 0001 2352 c289 2907 010f 94bb 468d  ....#R..).....F.
0x0020:  8010 0555 fe28 0000 0101 080a 0b6a 75c5  ...U.(.......ju.
0x0030:  0b6a 75c5                                .ju.
11:23:19.383353 IP localhost.9042 > localhost.49801: Flags [P.], seq
192:334, ack 2294, win 1365, options [nop,nop,TS val 191526341 ecr
191526341], length 142
0x0000:  4500 00c2 71eb 4000 4006 ca48 7f00 0001  E...q.@.@..H....
0x0010:  7f00 0001 2352 c289 2907 010f 94bb 468d  ....#R..).....F.
0x0020:  8018 0555 feb6 0000 0101 080a 0b6a 75c5  ...U.........ju.
0x0030:  0b6a 75c5 8408 00c0 0800 0000 8500 0100  .ju.............
0x0040:  7d42 6174 6368 206f 6620 7072 6570 6172  }Batch.of.prepar
0x0050:  6564 2073 7461 7465 6d65 6e74 7320 666f  ed.statements.fo
0x0060:  7220 5b72 6563 3057 726b 7370 4e6f 5f30  r.[rec0WrkspNo_0
0x0070:  2e77 726b 4669 6e61 6c52 6563 735d 2069  .wrkFinalRecs].i
0x0080:  7320 6f66 2073 697a 6520 3834 3030 2c20  s.of.size.8400,.
0x0090:  6578 6365 6564 696e 6720 7370 6563 6966  exceeding.specif
0x00a0:  6965 6420 7468 7265 7368 6f6c 6420 6f66  ied.threshold.of
0x00b0:  2035 3132 3020 6279 2033 3238 302e 0000  .5120.by.3280...
0x00c0:  0001                                     ..


Batches with one statement throw a Batch too large error, even though the
actual data is less than 8000 bytes and tcpdump indicates it was transmitted
in a single 13072-byte packet (see below).

ERROR [SharedPool-Worker-1] 2015-11-07 11:48:10,713 BatchStatement.java:267
- Batch of prepared statements for [rec0WrkspNo_0.wrkFinalRecs] is of size
54600, exceeding specified threshold of 51200 by 3400. (see
batch_size_fail_threshold_in_kb)


The batch was transmitted in a single packet which was 13072 bytes long. A
far cry from the 51200 byte limit.

11:48:10.717483 IP localhost.33390 > 127.0.0.3.9042: Flags [P.], seq
173:13245, ack 192, win 342, options [nop,nop,TS val 191899175 ecr
191898686], length 13072
0x0000:  4500 3344 a91e 4000 4006 6091 7f00 0001  E.3D..@.@.`.....
0x0010:  7f00 0003 826e 2352 3492 8dcd 66bb 70f3  .....n#R4...f.p.
0x0020:  8018 0156 313b 0000 0101 080a 0b70 2627  ...V1;.......p&'
0x0030:  0b70 243e 0400 00c0 0a00 0033 0700 1025  .p$>.......3...%
0x0040:  b80e 3017 0597 d988 cd5e 1d67 81e1 1200  ..0......^.g....
0x0050:  0425 0003 0000 32cc 0000 0514 0000 0006  .%....2.........
0x0060:  6131 3030 3030 0000 0006 6131 3030 3030  a10000....a10000
0x0070:  0000 0006 6131 3030 3030 0000 0006 6131  ....a10000....a1
0x0080:  3030 3030 0000 0006 6131 3030 3030 0000  0000....a10000..
. . .
0x3300:  6131 3030 3030 0000 0006 6131 3030 3030  a10000....a10000
0x3310:  0000 0006 6131 3030 3030 0000 0006 6131  ....a10000....a1
0x3320:  3030 3030 0000 0006 6375 7374 5f63 0000  0000....cust_c..
0x3330:  0006 3130 3030 3030 0000 1388 0005 23f6  ..100000......#.
0x3340:  2014 d360                                ...`
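
One pattern in the two server-reported sizes: both divide evenly by the
number of list elements, 8400 / (2 x 100) = 42 and 54600 / 1300 = 42. The
sketch below turns that into a rough client-side estimate. The 42-byte
figure is purely empirical, derived from these two log lines for this table
and 6-byte values; it is not a documented constant. One unverified
possibility is that the server sizes each list element as a separate cell
(6-byte user + 6-byte column name + 16-byte element id + 6-byte value +
8-byte timestamp = 42). The ServerBatchSizeEstimate class is hypothetical.

```java
// Rough estimate based only on the two log lines in this post.
final class ServerBatchSizeEstimate {
    // Empirical: 8400 / (2 * 100) and 54600 / 1300. Valid, at best, for
    // this schema with 6-byte cust, user and list values.
    static final int OBSERVED_BYTES_PER_ELEMENT = 42;
    static final int WARN_THRESHOLD_BYTES = 5 * 1024;  // batch_size_warn_threshold_in_kb: 5
    static final int FAIL_THRESHOLD_BYTES = 50 * 1024; // batch_size_fail_threshold_in_kb: 50

    /** Estimated size the server would report for the batch. */
    static long estimatedServerBytes(int statements, int elementsPerStatement) {
        return (long) statements * elementsPerStatement * OBSERVED_BYTES_PER_ELEMENT;
    }

    /**
     * Largest statement count whose estimate does not exceed the threshold.
     * Assumes the server only objects when the size is strictly greater.
     */
    static int maxStatementsUnder(int thresholdBytes, int elementsPerStatement) {
        if (elementsPerStatement < 1) {
            throw new IllegalArgumentException("elementsPerStatement must be >= 1");
        }
        long perStatement = estimatedServerBytes(1, elementsPerStatement);
        return (int) (thresholdBytes / perStatement);
    }
}
```

Under this estimate, statements carrying 100 elements each fit 1 per batch
below the 5120-byte warn threshold and 12 per batch below the 51200-byte
fail threshold, which is consistent with the warning seen at 2 statements.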
