Unlogged batches meaningfully outperform parallel execution of individual
statements, especially at scale, and create lower memory pressure on both the
clients and the cluster.

They do outperform parallel individual statements, but at the cost of higher
pressure on the coordinators, which leads to more blocked native transport
requests and dropped mutations. Actually, I think the 10-20% better write
performance plus 20-30% lower CPU usage on the client machines (we don't care
much about client machines compared with cluster machines) that comes from
batch statements with multiple partitions in each batch is not worth it,
because less-busy cluster nodes are needed to serve read queries, compactions,
repairs, etc.

The biggest downside of unlogged batches is that the unit of retry during a
failure is the entire batch. So if you use a retry policy, write timeouts will
tip over your cluster a lot faster than with individual statements. Bounding
your batch sizes helps mitigate this risk.

I assume that in most scenarios the client machines are on the same network as
the Cassandra cluster, so is it still faster?
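The batching advice above (bound batch sizes, and avoid batches that span many partitions) can be sketched in Python. This is a minimal, driver-agnostic sketch under stated assumptions: a write is modeled as a hypothetical `(partition_key, statement)` tuple, and `bounded_batches` / `writes_resent_on_retry` are illustrative helpers, not part of any Cassandra driver. Grouping by partition key is a common practice for unlogged batches, not something this thread prescribes.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical model of one write: (partition_key, statement_text).
# This is a stand-in for illustration, not a driver type.
Write = Tuple[Hashable, str]


def bounded_batches(writes: Iterable[Write], max_batch_size: int) -> List[List[Write]]:
    """Group writes by partition key, then split each group into chunks of
    at most max_batch_size writes. Every returned batch touches exactly one
    partition, so its coordinator only has to reach one replica set."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    by_partition: Dict[Hashable, List[Write]] = defaultdict(list)
    for write in writes:
        by_partition[write[0]].append(write)
    batches: List[List[Write]] = []
    for group in by_partition.values():  # dicts keep insertion order
        for start in range(0, len(group), max_batch_size):
            batches.append(group[start:start + max_batch_size])
    return batches


def writes_resent_on_retry(batch: List[Write]) -> int:
    """The unit of retry is the whole batch: a timeout on any part of it
    means every write in the batch is sent again."""
    return len(batch)


if __name__ == "__main__":
    writes = [("user:1", f"stmt-{i}") for i in range(7)] + [("user:2", "stmt-x")]
    batches = bounded_batches(writes, max_batch_size=3)
    print([len(b) for b in batches])  # [3, 3, 1, 1]
    print(max(writes_resent_on_retry(b) for b in batches))  # 3
```

Each resulting group would then be sent as one unlogged batch through whatever driver you use; the size cap keeps the amount of work that a write-timeout retry resends small.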
Thank you all. Now I understand that whether to use batch or asynchronous
writes really depends on the use case. Until now, batch writes have worked for
me in an 8-node cluster with over 500 million requests per day.

Did you compare the cluster performance, including blocked native transport
requests, dropped mutations, 95th-percentile latencies, cluster CPU usage,
etc., in the two scenarios (batch vs. single)? Although 500M requests per day
is not that much for an 8-node cluster (if the node spec complies with DataStax
recommendations), and async single statements could handle it (they just
demand more CPU on the client machines), the impact of such things
(non-compliant batch statements annoying the cluster) would show up after some
weeks, when suddenly a lot of cluster tasks need to run simultaneously: one or
two big compactions running on most of the nodes, plus some hinted handoffs,
and the cluster can't keep up and starts to become slower and slower. The way
to prevent this early is to keep the error counters as low as possible: blocked
native transport requests, dropped mutations, errors, hinted handoffs,
latencies, etc.
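To put "500M per day is not so much for 8 nodes" into numbers, here is a back-of-the-envelope calculation in Python. It assumes requests are spread evenly over 24 hours and across nodes, and it ignores the replication factor and peak-hour traffic (neither is given in the thread), so real per-node load, especially at peaks and on replicas, will be higher. `per_second_rates` is a hypothetical helper.

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second_rates(requests_per_day: int, nodes: int) -> Tuple[float, float]:
    """Return (cluster-wide, per-node) average requests per second.

    Assumes an even spread over the day and across nodes; ignores the
    replication factor and peak-hour traffic.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    cluster_rate = requests_per_day / SECONDS_PER_DAY
    return cluster_rate, cluster_rate / nodes


if __name__ == "__main__":
    cluster, per_node = per_second_rates(500_000_000, 8)
    # Prints: 5787 req/s cluster-wide, 723 req/s per node
    print(f"{cluster:.0f} req/s cluster-wide, {per_node:.0f} req/s per node")
```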
