> in the family. There are millions of rows. Each operation consists of
> doing a batch_insert through pycassa, which increments ~17k keys. A
> majority of these keys are new in each batch.
> 
> Each operation is taking up to 15 seconds. For our system this is a
> significant bottleneck.
> 

Try splitting your batch into smaller pieces and launching them in parallel.
This way you may get better performance, because all cores are employed and
there is less copying/rebuilding of large structures inside thrift &
cassandra. I found that 1k rows per batch behaves better than 10k.
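
A rough sketch of what I mean, using a thread pool so each chunk gets its own
connection. The keyspace/column family names, server list, and the `mutations`
dict are made-up placeholders, not from the original post:

    from multiprocessing.dummy import Pool as ThreadPool
    import pycassa

    # Hypothetical data: ~17k rows, each mapping to a dict of columns.
    mutations = dict(('row-%d' % i, {'col': 'val'}) for i in range(17000))

    conn_pool = pycassa.ConnectionPool('MyKS', server_list=['node1:9160'],
                                       pool_size=8)  # one conn per worker
    cf = pycassa.ColumnFamily(conn_pool, 'Counts')

    BATCH_SIZE = 1000  # ~1k rows per batch behaved better than 10k for me

    def send_batch(pairs):
        # One mutator per chunk; everything goes out in a single batch_mutate.
        b = cf.batch(queue_size=BATCH_SIZE)
        for key, columns in pairs:
            b.insert(key, columns)
        b.send()

    items = list(mutations.items())
    batches = [items[i:i + BATCH_SIZE]
               for i in range(0, len(items), BATCH_SIZE)]

    workers = ThreadPool(8)  # threads work here: pycassa blocks on thrift I/O
    workers.map(send_batch, batches)
    workers.close()
    workers.join()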

It is also a good idea to split the batch into slices according to the
replication strategy and send each slice directly to its natural endpoint.
This reduces the inter-node communication that would otherwise be necessary
to forward mutations to their replicas.
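
A rough sketch of that splitting, assuming RandomPartitioner. It uses the
thrift describe_ring() call, which should be reachable through a raw pycassa
connection; the keyspace name and `mutations` are the same illustrative
placeholders as above:

    import hashlib
    import pycassa

    def token_for(key):
        # RandomPartitioner token: abs value of md5(key) read as a signed
        # 128-bit big-endian integer.
        n = int(hashlib.md5(key).hexdigest(), 16)
        return (1 << 128) - n if n >= (1 << 127) else n

    def primary_endpoint(ring, token):
        # Each TokenRange owns (start_token, end_token]; endpoints[0] is
        # the natural (first) replica for that range.
        for r in ring:
            start, end = int(r.start_token), int(r.end_token)
            if start < end:
                if start < token <= end:
                    return r.endpoints[0]
            elif token > start or token <= end:  # range wraps around the ring
                return r.endpoints[0]
        return ring[0].endpoints[0]

    pool = pycassa.ConnectionPool('MyKS', server_list=['node1:9160'])
    conn = pool.get()
    ring = conn.describe_ring('MyKS')  # list of thrift TokenRange structs
    pool.return_conn(conn)

    # Group row keys by their natural endpoint; each group can then be sent
    # through a ConnectionPool pinned to that node, reusing the batching
    # helper from the previous sketch.
    by_node = {}
    for key in mutations:
        node = primary_endpoint(ring, token_for(key))
        by_node.setdefault(node, []).append(key)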
