Awesome.  In the meantime, I hacked something similar myself.  The
performance difference does not appear to be material.  I think the real
killer is the get_range_slices call.  Relative to that, the cost of getting
the connection appears to be more or less trivial.  What can I do to
alleviate that cost?  CASSANDRA-821 looks interesting -- can I apply that to
0.6.1?
joost.

On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> Great!  Created https://issues.apache.org/jira/browse/CASSANDRA-1017
> to track this.
>
> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson <jo...@oskarsson.nu> wrote:
> > I have written some code to avoid thrift reconnection, it just keeps the
> > connection open between get_range_slices calls.
> > I can extract that and put it up but not until early next week.
> >
> > /Johan
> >
> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
> >
> >> That would be an easy win, sure.
> >>
> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
> >>> MapReducing.  So I've reduced the Range Batch Size to 256 (from 4096) and
> >>> this seems to have fixed my problem, although it has slowed things down a
> >>> bit -- presumably because there are 16x more calls to get_range_slices.
> >>> While I was in that code I noticed that a new client was being created for
> >>> each batch get.  By decreasing the batch size, I've increased this
> >>> overhead.  I'm thinking of re-writing ColumnFamilyRecordReader to do some
> >>> connection pooling.  Anyone have any thoughts on that?
> >>> joost.
> >>>
> >
> >
>
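
For anyone following the thread, here is a rough sketch of the idea discussed
above: keep one client open across get_range_slices batches instead of
creating a new one per batch.  This is not the code from either poster or from
CASSANDRA-1017.  RangeClient, ReusingBatchReader and ConnectionReuseDemo are
hypothetical names made up for illustration, and RangeClient only stands in
for a Thrift client, it is not the real Cassandra API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a Thrift client; NOT the real Cassandra API. */
interface RangeClient extends AutoCloseable {
    List<String> getRangeSlices(String startKey, int batchSize);

    @Override
    void close();
}

/** Opens the client lazily and reuses it for every batch until closed. */
class ReusingBatchReader implements AutoCloseable {
    private final Supplier<RangeClient> factory;
    private final int batchSize;
    private RangeClient client; // kept open between batches

    ReusingBatchReader(Supplier<RangeClient> factory, int batchSize) {
        this.factory = factory;
        this.batchSize = batchSize;
    }

    List<String> nextBatch(String startKey) {
        if (client == null) {
            client = factory.get(); // connect once, on first use
        }
        return client.getRangeSlices(startKey, batchSize);
    }

    @Override
    public void close() {
        if (client != null) {
            client.close();
            client = null;
        }
    }
}

/** Counts how many connections are opened for a given number of batches. */
final class ConnectionReuseDemo {
    static int countOpens(int batches) {
        int[] opens = {0};
        Supplier<RangeClient> factory = () -> {
            opens[0]++; // one increment per simulated connection
            return new RangeClient() {
                @Override
                public List<String> getRangeSlices(String startKey, int batchSize) {
                    return new ArrayList<>(); // fake client: no rows
                }

                @Override
                public void close() {
                    // nothing to release in the fake client
                }
            };
        };
        try (ReusingBatchReader reader = new ReusingBatchReader(factory, 256)) {
            for (int i = 0; i < batches; i++) {
                reader.nextBatch("key" + i);
            }
        }
        return opens[0];
    }
}
```

With this shape, going from a batch size of 4096 to 256 still means 16x more
get_range_slices calls, but the number of connections opened stays at one per
reader.  As noted at the top of the thread, that only removes the connection
overhead; it does nothing for the cost of get_range_slices itself.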
