Hi, I have a use case wherein I receive a daily batch of data; it's about 50M--100M records (a record is a list of integers, keyed by a UUID). The target is a 12-node cluster.
Using a simple-minded approach (24 batched inserts in parallel, using the Ruby client), while the cluster is serving reads at about 150k/s, I get about 15.5k insertions per second. This in itself is satisfactory, but the concern is that the large volume of writes causes read latency to spike during the insertion, and for a while afterwards.

I tried using sstableloader instead, and the overall throughput is similar (I spend 2/3 of the time preparing the SSTables and 1/3 actually pushing them to the nodes), but I believe this still causes a spike in read latency (after the load is complete).

Is there a set of best practices for this kind of workload? We would like to avoid interfering with reads as much as possible. I can of course post more information about our setup and requirements if that helps.

--
Thanks,
David Haguenauer
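P.S. In case it helps, here is a simplified, self-contained sketch of the insert loop I described (24 workers pulling batches off a queue). The real code uses the DataStax cassandra-driver gem with prepared INSERT statements; here the session is replaced by a stub (`FakeSession`), and the names `WORKERS`, `BATCH_SIZE`, and `execute_batch` are illustrative, not the driver's API:

```ruby
require "securerandom"

WORKERS    = 24   # parallel insert workers, as in our setup
BATCH_SIZE = 100  # rows per batched insert (illustrative)

# Stand-in for a Cassandra session; in the real code this would be a
# cassandra-driver session executing a batch of prepared INSERTs.
class FakeSession
  attr_reader :count

  def initialize
    @count = 0
    @mutex = Mutex.new
  end

  # "Write" a batch of rows; just counts them, thread-safely.
  def execute_batch(rows)
    @mutex.synchronize { @count += rows.size }
  end
end

# Fake daily batch: each record is a UUID key plus a list of integers,
# matching the shape of our real data (but far smaller here).
records = Array.new(10_000) { [SecureRandom.uuid, Array.new(5) { rand(1_000) }] }

session = FakeSession.new
queue   = Queue.new
records.each_slice(BATCH_SIZE) { |batch| queue << batch }
WORKERS.times { queue << nil } # one stop signal per worker

threads = Array.new(WORKERS) do
  Thread.new do
    while (batch = queue.pop)
      session.execute_batch(batch)
    end
  end
end
threads.each(&:join)

puts session.count # all 10,000 records "inserted"
```

This is just to show the shape of the write path; there is no throttling or backoff in it, which may well be part of the problem.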