I have a time series table consisting of frame information for media. The table is partitioned on the media ID and uses time and some other frame-level keys as clustering keys, i.e. all frames for one piece of media are really a single column-family "row", even though they are represented in CQL as an ordered series of frame rows. These sets range from 5k to 200k "rows" per media; they are always inserted in one go and are available in memory in ordered form. I'm currently fanning the inserts out via async calls, using a queue to cap the max parallelism (set to 100 right now).
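For context, the insert path currently looks roughly like this (simplified; the real code limits parallelism with a queue rather than a semaphore, and the table and column names here are placeholders):

```java
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

import java.nio.ByteBuffer;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Semaphore;

public class FrameWriter {
    private static final int MAX_IN_FLIGHT = 100;          // current max parallelism

    private final Session session;
    private final PreparedStatement insert;
    private final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

    public FrameWriter(Session session) {
        this.session = session;
        // "frames" and its columns stand in for the real table definition
        this.insert = session.prepare(
                "INSERT INTO frames (media_id, frame_time, frame_no, data) VALUES (?, ?, ?, ?)");
    }

    /** Fans out one media's frames as individual async inserts, at most MAX_IN_FLIGHT at a time. */
    public void writeAll(List<Frame> frames) throws InterruptedException {
        for (Frame f : frames) {
            inFlight.acquire();                             // block while too many inserts are in flight
            ResultSetFuture future = session.executeAsync(
                    insert.bind(f.mediaId, f.time, f.frameNo, f.data));
            Futures.addCallback(future, new FutureCallback<ResultSet>() {
                @Override public void onSuccess(ResultSet rs) { inFlight.release(); }
                @Override public void onFailure(Throwable t)  { inFlight.release(); }  // logged/counted in the real code
            });
        }
    }

    /** Minimal placeholder for the per-frame data. */
    public static final class Frame {
        public final String mediaId;
        public final Date time;
        public final int frameNo;
        public final ByteBuffer data;

        public Frame(String mediaId, Date time, int frameNo, ByteBuffer data) {
            this.mediaId = mediaId;
            this.time = time;
            this.frameNo = frameNo;
            this.data = data;
        }
    }
}
```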
For some of the larger sets (50k and above) I sometimes get the following exception:

    com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:93) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:237) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:402) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        ...

I've tried reducing the max parallelism and increasing the timeout threshold, but once the cluster gets humming from a bunch of inserts, even going as low as 10 in parallel doesn't completely avoid these exceptions. I realize that fanning out means my previously ordered data arrives at random coordinator nodes in random order and has to be routed to the nodes owning the partition key and re-ordered as it arrives, which feels like the wrong way to do it. Still, the parallel approach does increase insert speed almost linearly, apart from those timeouts.

I'm wondering what the best approach would be. The scenarios I can think of are:

1) Retry and back off on timeout exceptions, but keep the fan-out approach (rough sketch below). This seems like a good approach, unless the timeout really is a warning that I'm overloading things.

2) Switch to BATCH inserts (also sketched below). Would this be better, since the data would go to only a single node and be inserted in ordered form? And would it even alleviate the timeouts, now that giant batches need to be acknowledged by the replicas?

3) Go to consistency ANY. The docs seem to imply that the TimeoutException isn't really a failure then, just a heads-up. I don't really care about waiting for all replicas to be up to date on these inserts anyhow, but is that actually safe, or am I looking at replicas drifting out of sync?

4) Figure out how to tune my cluster better and change nothing on the client.

thanks, arne
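To make option 1 concrete, this is roughly the retry-and-back-off I have in mind (sketch only; the attempt count and back-off values are made up, and it falls back to a synchronous execute for the retries):

```java
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public final class RetryingWriter {
    private static final int MAX_ATTEMPTS = 5;
    private static final long BASE_BACKOFF_MS = 100;

    private RetryingWriter() {}

    /** Executes the statement, backing off exponentially on write timeouts. */
    public static ResultSet executeWithBackoff(Session session, Statement statement)
            throws InterruptedException {
        WriteTimeoutException last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return session.execute(statement);
            } catch (WriteTimeoutException e) {
                last = e;
                Thread.sleep(BASE_BACKOFF_MS << attempt);   // 100ms, 200ms, 400ms, ...
            }
        }
        throw last;                                         // give up after MAX_ATTEMPTS
    }
}
```

If I combined this with option 3, I suppose I'd just call statement.setConsistencyLevel(ConsistencyLevel.ANY) on the insert before handing it over.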
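And this is what I picture for option 2, chunking the already-ordered frames of one media (i.e. one partition) into unlogged batches; the chunk size is a guess, since I know oversized batches trip the batch size warning threshold:

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;

import java.util.List;

public final class BatchedWriter {
    private static final int ROWS_PER_BATCH = 100;          // guess; would need tuning

    private BatchedWriter() {}

    /** Inserts all rows for one partition as a series of UNLOGGED batches. */
    public static void writeAll(Session session, List<BoundStatement> rows) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        for (BoundStatement row : rows) {
            batch.add(row);
            if (batch.size() >= ROWS_PER_BATCH) {
                session.execute(batch);                      // flush a full chunk
                batch.clear();
            }
        }
        if (batch.size() > 0) {
            session.execute(batch);                          // flush the remainder
        }
    }
}
```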