I like to base my batch sizes off of the total number of columns instead of the number of rows. This effectively means counting the number of Mutation objects in your mutation map and submitting the batch once it reaches a certain size. For my data, batch sizes of about 25,000 columns work best. You'll need to adjust this up or down depending on the size of your column names / values and available memory.
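Something like this rough sketch, if it helps (not my actual code; the batch_mutate signature below is the 0.6-era Thrift one, and names like ColumnCountBatcher, COLUMN_THRESHOLD, flush(), and the choice of QUORUM are just illustrative):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.Mutation;

    public class ColumnCountBatcher {
        // Tune this for your column name/value sizes and available heap.
        private static final int COLUMN_THRESHOLD = 25000;

        private final Cassandra.Client client;
        private final String keyspace;

        // row key -> column family -> mutations
        private final Map<String, Map<String, List<Mutation>>> mutationMap = new HashMap<>();
        private int pendingColumns = 0;

        public ColumnCountBatcher(Cassandra.Client client, String keyspace) {
            this.client = client;
            this.keyspace = keyspace;
        }

        public void add(String rowKey, String columnFamily, Mutation mutation) throws Exception {
            mutationMap
                .computeIfAbsent(rowKey, k -> new HashMap<>())
                .computeIfAbsent(columnFamily, k -> new ArrayList<>())
                .add(mutation);

            // Count Mutation objects, not rows, so bushy rows don't blow the heap.
            if (++pendingColumns >= COLUMN_THRESHOLD) {
                flush();
            }
        }

        public void flush() throws Exception {
            if (pendingColumns == 0) return;
            // 0.6-era Thrift call; adjust for your client/version.
            client.batch_mutate(keyspace, mutationMap, ConsistencyLevel.QUORUM);
            mutationMap.clear();
            pendingColumns = 0;
        }
    }

Call add() for every column you write and flush() once more at the end of the load.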
With this strategy the "bushiness" of your rows shouldn't be a problem.

Ben

On Tue, May 11, 2010 at 7:54 AM, David Boxenhorn <da...@lookin2.com> wrote:
> I am saving a large amount of data to Cassandra using batch mutate. I have
> found that my speed is proportional to the size of the batch. It was very
> slow when I was inserting one row at a time, but when I created batches of
> 100 rows and mutated them together, it went 100 times faster. (OK, I didn't
> measure it, but it was MUCH faster.)
>
> My problem is that my rows are of very varying degrees of bushiness (i.e.
> number of supercolumns and columns per row). I inserted 592,500 rows
> successfully, in a few minutes, and then I hit a batch of exceptionally
> bushy rows and ran out of memory.
>
> Does anyone have any suggestions about how to deal with this problem? I can
> make my algorithm smarter by taking into account the size of the rows and
> not just blindly do 100 at a time, but I want to solve this problem as
> generally as possible, and not depend on trial and error, or on the
> specific configuration of the machine I happen to be working on right now.
> I don't even know if the critical parameter is the total size of the
> values, or the number of columns, or what? Or maybe there's some optimal
> batch size, and that's what I should use always?
>
> Thanks.