I have a time series table consisting of frame information for media. The
table is partitioned on the media ID and uses time and some other
frame-level keys as clustering keys, i.e. all frames for one piece of media
are really one column family "row", even though they are represented in CQL
as an ordered series of frame data. The size of these sets varies from 5k to
200k "rows" per media, and each set is always inserted at one time and is
available in memory in ordered form. I'm currently fanning the inserts out
via async calls, using a queue to cap the max parallelism (set to 100 right
now).
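
A minimal sketch of this kind of bounded fan-out, for illustration only and
not the actual client code. It assumes Java 8, and `BoundedFanOut`,
`insertAll` and `writeAsync` are hypothetical names; `writeAsync` stands in
for whatever wraps the driver's async execute call and is assumed to return
a `CompletableFuture` (the 2.1 driver itself returns a Guava-style future,
so an adapter would be needed):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical helper: caps the number of in-flight async writes.
class BoundedFanOut {

    /**
     * Submits every row through writeAsync, never allowing more than
     * maxInFlight writes to be outstanding at the same time.
     */
    static <T> List<CompletableFuture<Void>> insertAll(
            List<T> rows,
            int maxInFlight,
            Function<T, CompletableFuture<Void>> writeAsync) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> results = new ArrayList<>(rows.size());
        for (T row : rows) {
            // Blocks the producer while maxInFlight writes are still pending.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> write;
            try {
                write = writeAsync.apply(row);
            } catch (RuntimeException e) {
                permits.release(); // submission itself failed, free the slot
                throw e;
            }
            // Release the permit whether the write succeeded or failed.
            results.add(write.whenComplete((ok, err) -> permits.release()));
        }
        return results;
    }
}
```

The caller still has to inspect the returned futures, since a failed write
only frees its slot here and is not retried.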

For some of the larger sets (50k and above) I sometimes get the following
exception:

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
        at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:93) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:237) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:402) ~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
        ....

I've tried reducing the max parallelism and increasing the timeout
threshold, but once the cluster gets humming from a bunch of inserts, even
going as low as 10 in parallel doesn't seem to completely prevent those
exceptions.

I realize that fanning out just means that previously ordered data is now
arriving at random nodes in random order, has to get to the nodes owning the
partition key, and has to be re-ordered as it arrives, which seems like
the wrong way to do it. However, the parallelism approach does increase
insert speed almost linearly, except for those timeouts.

I'm wondering what the best approach would be. The scenarios I can think of
are:

1) Retry and back off on Timeout Exceptions, but keep the fan out approach.

Seems like a good approach, unless the timeout really is just a warning that
I'm overloading things.
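
A minimal sketch of what retry with exponential backoff could look like,
under stated assumptions: `RetryWithBackoff`, `backoffMillis` and `call` are
hypothetical names, the operation is a blocking call supplied by the caller,
and the `retryable` predicate is where a check such as
`e instanceof WriteTimeoutException` would go. It also assumes the writes
are plain inserts that are safe to repeat.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: retries an operation with capped exponential backoff.
class RetryWithBackoff {

    /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs. */
    static long backoffMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        // Cap the shift so realistic base delays cannot overflow a long.
        long delay = baseDelayMs << Math.min(attempt, 20);
        return Math.min(delay, maxDelayMs);
    }

    /**
     * Runs op, retrying up to maxAttempts times in total when it throws an
     * exception accepted by retryable. Any other exception, or the failure
     * of the final attempt, is rethrown to the caller.
     */
    static <T> T call(Supplier<T> op, Predicate<RuntimeException> retryable,
                      int maxAttempts, long baseDelayMs, long maxDelayMs) {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                boolean lastAttempt = attempt + 1 >= maxAttempts;
                if (lastAttempt || !retryable.test(e)) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMillis(attempt, baseDelayMs, maxDelayMs));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw e;
                }
            }
        }
    }
}
```

With a base delay of 50 ms and a cap of 2000 ms, the waits would be 50, 100,
200, 400 ms and so on. Backing off reduces pressure on the cluster, but it
does not answer whether the timeouts indicate overload in the first place.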

2) Switch to BATCH inserts

Would this be better, since the data would go to only a single node and be
inserted in ordered form? And would this even alleviate the timeouts, since
giant batches would now need to be acknowledged by the replicas?
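
One way to avoid a single giant batch would be to split the ordered frames
into fixed-size chunks and send each chunk as its own batch (for example one
`BatchStatement` per chunk with the Java driver). A minimal sketch of the
chunking step only; `BatchChunker`, `chunk` and the batch size are
hypothetical, and a suitable size would have to be found by experiment:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an ordered list into consecutive chunks.
class BatchChunker {

    /** Returns consecutive chunks of at most batchSize rows, preserving order. */
    static <T> List<List<T>> chunk(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }
}
```

Because every frame of one media shares the same partition key, each chunk
would still target a single partition.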

3) Go to consistency ANY.

The docs seem to imply that TimeoutException isn't really a failure, just a
heads up. I don't really care about waiting for all replicas to be up to
date on these inserts anyhow, but is it really safe, or am I looking at
replicas drifting out of sync?

4) Figure out how to tune my cluster better and change nothing on the client.

thanks,
arne
