Ironically, in my experience the fastest ways to get data into C* are considered "anti-patterns" by most (though I have no problem saturating multiple gigabit network links when I really want to insert fast).
It's been a while since I tried some of the newer approaches, though (my fast-load code is a few years old).

> On Jul 13, 2015, at 5:31 PM, David Haguenauer <m...@kurokatta.org> wrote:
>
> Hi,
>
> I have a use case wherein I receive a daily batch of data; it's about
> 50M--100M records (a record is a list of integers, keyed by a
> UUID). The target is a 12-node cluster.
>
> Using a simple-minded approach (24 batched inserts in parallel, using
> the Ruby client), while the cluster is being read at a rate of about
> 150k/s, I get about 15.5k insertions per second. This in itself is
> satisfactory, but the concern is that the large amount of writes
> causes the read latency to jump up during the insertion, and for a
> while after.
>
> I tried using sstableloader instead, and the overall throughput is
> similar (I spend 2/3 of the time preparing the SSTables, and 1/3
> actually pushing them to nodes), but I believe this still causes a
> hike in read latency (after the load is complete).
>
> Is there a set of best practices for this kind of workload? We would
> like to avoid interfering with reads as much as possible.
>
> I can of course post more information about our setup and requirements
> if this helps answering.
>
> --
> Thanks, David Haguenauer
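For scale, the figures in the quoted message imply a load window of roughly one to two hours per daily batch. A quick sanity check of that arithmetic (the method name here is just illustrative, and the rate is the observed 15.5k inserts/second from the message, not a benchmark):

```ruby
# Back-of-the-envelope check: how long does a 50M--100M record batch
# take at the observed insertion rate of ~15.5k records/second?
RATE = 15_500.0 # observed insertions per second, from the quoted message

def load_minutes(records, rate = RATE)
  records / rate / 60.0
end

puts format('50M records:  ~%.0f minutes', load_minutes(50_000_000))
puts format('100M records: ~%.0f minutes', load_minutes(100_000_000))
# 50M records come out to roughly 54 minutes, 100M to roughly 108 minutes
```

So the write load overlaps the read workload for an hour or more each day, which is why the impact on read latency, rather than raw throughput, is the real concern here.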