> How can I handle this kind of situation?

In terms of surviving the problem, a retry on the client side might
help, assuming the problem is temporary.
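A minimal client-side retry could look like the following. This is a sketch under assumptions: `UnavailableError` is a hypothetical stand-in for whatever exception your client library raises when the server returns UnavailableException, and the attempt count and delays are illustrative rather than recommended values. Backoff with jitter keeps many clients from hammering the cluster in lockstep while it recovers.

```python
import random
import time


class UnavailableError(Exception):
    """Hypothetical stand-in for your client's UnavailableException wrapper."""


def with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on UnavailableError with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except UnavailableError:
            if attempt == max_attempts:
                raise  # give up: the problem does not look temporary
            # Double the delay each attempt, cap it, and add full jitter.
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            sleep(random.uniform(0, delay))


# Usage: an operation that fails twice and then succeeds.
calls = {"n": 0}


def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise UnavailableError("not enough replicas")
    return "ok"


print(with_retries(flaky_insert, sleep=lambda s: None))  # prints "ok"
```

If the retries keep running out, that's a sign the problem isn't transient and it's worth finding the root cause instead.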

However, the fact that you're seeing the issue to begin with is
certainly interesting, and the way to avoid it would depend on what
the problem is. My understanding is that UnavailableException
indicates that the node you are talking to was unable to read
from or write to enough nodes to satisfy your consistency level,
presumably either because individual requests failed to return in
time, or because the node considers other nodes to be flat out down.
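To make "enough nodes" concrete, here is a simplified model of how many replicas a request needs per consistency level. It is an assumption-laden sketch: it covers only the single-datacenter levels ONE, QUORUM and ALL (with QUORUM as a majority of the replication factor), and it ignores the timeout path entirely. With a replication factor of 3, QUORUM needs 2 replicas, so one node down is tolerated but two are not.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Replicas that must participate for a single-datacenter consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def can_satisfy(consistency_level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if the coordinator believes enough replicas are up to attempt the request."""
    return live_replicas >= required_replicas(consistency_level, replication_factor)


# RF=3 with one replica considered down.
for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, required_replicas(cl, 3), can_satisfy(cl, 3, live_replicas=2))
# ONE 1 True
# QUORUM 2 True
# ALL 3 False
```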

Can you correlate these issues with server-side activity on the nodes,
such as background compaction, commitlog rotation or memtable
flushing? Do you see your nodes saying that other nodes in the cluster
are "DOWN" and "UP" (flapping)?
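One quick way to check for flapping is to count state transitions per peer in each node's system.log. This is a sketch that assumes the gossip messages look like `InetAddress /10.0.0.2 is now dead.` and `InetAddress /10.0.0.2 is now UP`; check the exact wording your version logs and adjust the pattern. The `threshold` value is an arbitrary illustration.

```python
import re
from collections import defaultdict

# Assumed log wording; adapt to what your version actually writes.
STATE_CHANGE = re.compile(r"InetAddress /(\S+) is now (dead|UP)", re.IGNORECASE)


def count_transitions(lines):
    """Count dead/UP state changes per peer address in the given log lines."""
    transitions = defaultdict(int)
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            transitions[match.group(1)] += 1
    return dict(transitions)


def flapping_peers(lines, threshold=4):
    """Peers whose state changed at least `threshold` times."""
    return sorted(peer for peer, n in count_transitions(lines).items() if n >= threshold)


sample = [
    "INFO InetAddress /10.0.0.2 is now dead.",
    "INFO InetAddress /10.0.0.2 is now UP",
    "INFO InetAddress /10.0.0.2 is now dead.",
    "INFO InetAddress /10.0.0.2 is now UP",
    "INFO InetAddress /10.0.0.3 is now dead.",
    "INFO Compacting [...]",
]
print(flapping_peers(sample))  # ['10.0.0.2']
```

Lining up the timestamps of those transitions with compaction and flush messages in the same log is the correlation I'm asking about.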

How large is the data set in total (in terms of sstable size on disk),
and how much memory do your machines have available for the page
cache?
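A rough way to get both numbers is to sum the SSTable data files under the data directory and compare that with RAM. This sketch assumes SSTable data files end in `-Data.db` and that you point it at your configured data directory (for example `/var/lib/cassandra/data`, which may differ on your setup). The coverage ratio is optimistic, since the JVM heap and other processes also take memory that the page cache can't use.

```python
import os


def sstable_bytes(data_dir):
    """Total size of SSTable data files (names ending in -Data.db) under data_dir."""
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith("-Data.db"):
                total += os.path.getsize(os.path.join(root, name))
    return total


def cache_coverage(data_bytes, ram_bytes):
    """Optimistic fraction of the on-disk data that could fit in memory."""
    if data_bytes == 0:
        return 1.0
    return min(1.0, ram_bytes / data_bytes)


# 200 GiB of sstables on a 16 GiB machine: at most ~8% can be cached.
print(cache_coverage(200 * 2**30, 16 * 2**30))  # 0.08
```

A low ratio means most reads go to disk, which makes the nodes much more sensitive to compaction I/O.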

Have you observed the behavior of your nodes during compaction, in
particular whether compaction is CPU bound or I/O bound? (That tends
to depend on the data; generally, the larger the individual values,
the more disk bound you'd tend to be.)

Just trying to zero in on what the likely root cause is in this case.

-- 
/ Peter Schuller
