Hi, I have a use case wherein I receive a daily batch of data; it's about 50M--100M records (a record is a list of integers, keyed by a UUID). The target is a 12-node cluster.
Using a simple-minded approach (24 batched inserts in parallel, using the Ruby client), while the cluster is serving reads at about 150k/s, I get about 15.5k insertions per second. This in itself is satisfactory, but the concern is that the large volume of writes causes read latency to spike during the insertion, and for a while afterwards.

I tried using sstableloader instead, and the overall throughput is similar (I spend 2/3 of the time preparing the SSTables and 1/3 actually pushing them to the nodes), but I believe this still causes a spike in read latency (after the load is complete).

Is there a set of best practices for this kind of workload? We would like to avoid interfering with reads as much as possible. I can of course post more information about our setup and requirements if that helps.

--
Thanks,
David Haguenauer
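P.S. In case it helps, here is a simplified, self-contained sketch of the insert loop I described (24 workers pulling batches off a queue). The real code uses the DataStax cassandra-driver gem with prepared INSERT statements; here the session is replaced by a stub (`FakeSession`), and the names `WORKERS`, `BATCH_SIZE`, and `execute_batch` are illustrative, not the driver's API:

```ruby
require "securerandom"

WORKERS    = 24   # parallel insert workers, as in our setup
BATCH_SIZE = 100  # rows per batched insert (illustrative)

# Stand-in for a Cassandra session; in the real code this would be a
# cassandra-driver session executing a batch of prepared INSERTs.
class FakeSession
  attr_reader :count

  def initialize
    @count = 0
    @mutex = Mutex.new
  end

  # "Write" a batch of rows; just counts them, thread-safely.
  def execute_batch(rows)
    @mutex.synchronize { @count += rows.size }
  end
end

# Fake daily batch: each record is a UUID key plus a list of integers,
# matching the shape of our real data (but far smaller here).
records = Array.new(10_000) { [SecureRandom.uuid, Array.new(5) { rand(1_000) }] }

session = FakeSession.new
queue   = Queue.new
records.each_slice(BATCH_SIZE) { |batch| queue << batch }
WORKERS.times { queue << nil } # one stop signal per worker

threads = Array.new(WORKERS) do
  Thread.new do
    while (batch = queue.pop)
      session.execute_batch(batch)
    end
  end
end
threads.each(&:join)

puts session.count # all 10,000 records "inserted"
```

This is just to show the shape of the write path; there is no throttling or backoff in it, which may well be part of the problem.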