I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress tool to work for my test scenario. I have followed the example at http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema to create a yaml file describing my test.
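For reference, I invoke the tool roughly like this (a sketch; the profile path, node address, and operation count are placeholders, not my exact values):

    # run only the insert operation defined in the profile
    cassandra-stress user profile=./events.yaml ops(insert=1) n=1000000 -node 127.0.0.1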
I am collecting events per user id (text, partition key). Events have a session type (text), an event type (text), and a creation time (timestamp) as clustering keys, in that order, plus some more attributes required for rendering the events in a UI. For testing purposes I ended up with the following column spec and insert distribution:

    columnspec:
      - name: created_at
        cluster: uniform(10..10000)
      - name: event_type
        size: uniform(5..10)
        population: uniform(1..30)
        cluster: uniform(1..30)
      - name: session_type
        size: fixed(5)
        population: uniform(1..4)
        cluster: uniform(1..4)
      - name: user_id
        size: fixed(15)
        population: uniform(1..1000000)
      - name: message
        size: uniform(10..100)
        population: uniform(1..100B)

    insert:
      partitions: fixed(1)
      batchtype: UNLOGGED
      select: fixed(1)/1200000

Running the stress tool for just the insert prints

    Generating batches with [1..1] partitions and [0..1] rows (of [10..1200000] total rows in the partitions)

and then immediately floods me with "com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large". I do not understand why I should be exceeding the "batch_size_fail_threshold_in_kb: 50" setting in cassandra.yaml. My understanding is that the stress tool should generate one row per batch, and a single row should not exceed 8 + 10*3 + 5*3 + 15*3 + 100*3 = 398 bytes, assuming the worst case that every text character is a 3-byte Unicode character. How do I end up with batches that exceed the 50 KB threshold? Am I missing the point of the "select" attribute?

Thanks!
Ralf