Ironically, in my experience the fastest ways to get data into C* are considered "anti-patterns" by most (though I have no problem saturating multiple gigabit network links when I really want to insert fast).
It's been a while since I tried some of the newer approaches, though (my fast-load code is a few years old).

> On Jul 13, 2015, at 5:31 PM, David Haguenauer <m...@kurokatta.org> wrote:
>
> Hi,
>
> I have a use case wherein I receive a daily batch of data; it's about
> 50M--100M records (a record is a list of integers, keyed by a
> UUID). The target is a 12-node cluster.
>
> Using a simple-minded approach (24 batched inserts in parallel, using
> the Ruby client), while the cluster is being read at a rate of about
> 150k/s, I get about 15.5k insertions per second. This in itself is
> satisfactory, but the concern is that the large amount of writes
> causes the read latency to jump up during the insertion, and for a
> while after.
>
> I tried using sstableloader instead, and the overall throughput is
> similar (I spend 2/3 of the time preparing the SSTables, and 1/3
> actually pushing them to nodes), but I believe this still causes a
> hike in read latency (after the load is complete).
>
> Is there a set of best practices for this kind of workload? We would
> like to avoid interfering with reads as much as possible.
>
> I can of course post more information about our setup and requirements
> if this helps answering.
>
> --
> Thanks, David Haguenauer
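For scale, the figures in the quoted message imply a load window of roughly one to two hours per daily batch. A quick sanity check of that arithmetic (the method name here is just illustrative, and the rate is the observed 15.5k inserts/second from the message, not a benchmark):

```ruby
# Back-of-the-envelope check: how long does a 50M--100M record batch
# take at the observed insertion rate of ~15.5k records/second?
RATE = 15_500.0 # observed insertions per second, from the quoted message

def load_minutes(records, rate = RATE)
  records / rate / 60.0
end

puts format('50M records:  ~%.0f minutes', load_minutes(50_000_000))
puts format('100M records: ~%.0f minutes', load_minutes(100_000_000))
# 50M records come out to roughly 54 minutes, 100M to roughly 108 minutes
```

So the write load overlaps the read workload for an hour or more each day, which is why the impact on read latency, rather than raw throughput, is the real concern here.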