I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress tool 
to work for my test scenario. I have followed the example on 
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
to create a yaml file describing my test.

I am collecting events per user id (text, partition key). Events have a session
type (text), an event type (text), and a creation time (timestamp), which are
the clustering keys in that order, plus some more attributes required for
rendering the events in a UI. For testing purposes I ended up with the
following column spec and insert distribution:

columnspec:
  - name: created_at
    cluster: uniform(10..10000)
  - name: event_type
    size: uniform(5..10)
    population: uniform(1..30)
    cluster: uniform(1..30)
  - name: session_type
    size: fixed(5)
    population: uniform(1..4)
    cluster: uniform(1..4)
  - name: user_id
    size: fixed(15)
    population: uniform(1..1000000)
  - name: message
    size: uniform(10..100)
    population: uniform(1..100B)

insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
  select: fixed(1)/1200000
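
The table definition in the profile is roughly the sketch below (column names
as in the column spec; the CQL types are assumed from the description above):

# Sketch only: names/types inferred from the description, not the exact DDL.
table: events
table_definition: |
  CREATE TABLE events (
    user_id text,
    session_type text,
    event_type text,
    created_at timestamp,
    message text,
    PRIMARY KEY ((user_id), session_type, event_type, created_at)
  );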


Running the stress tool for just the insert operation prints

Generating batches with [1..1] partitions and [0..1] rows (of [10..1200000] 
total rows in the partitions)

and then immediately starts flooding me with
"com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large".

I do not understand why I should be exceeding the
"batch_size_fail_threshold_in_kb: 50" setting in cassandra.yaml. My
understanding is that the stress tool should generate one row per batch. The
size of a single row should not exceed 8 (timestamp) + 10*3 (event_type) +
5*3 (session_type) + 15*3 (user_id) + 100*3 (message) = 398 bytes, assuming
the worst case of every text character being a 3-byte Unicode character.

How come I end up with batches that exceed the 50 KB threshold? Am I missing
the point of the "select" attribute?


Thanks!
Ralf
