Awesome. In the meantime, I hacked something similar myself. The performance difference does not appear to be material. I think the real killer is the get_range_slices call. Relative to that, the cost of getting the connection appears to be more or less trivial.

What can I do to alleviate that cost? CASSANDRA-821 looks interesting -- can I apply that to 0.6.1?

joost.

On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Great! Created https://issues.apache.org/jira/browse/CASSANDRA-1017
> to track this.
>
> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson <jo...@oskarsson.nu> wrote:
> > I have written some code to avoid thrift reconnection, it just keeps the
> > connection open between get_range_slices calls.
> > I can extract that and put it up but not until early next week.
> >
> > /Johan
> >
> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
> >
> >> That would be an easy win, sure.
> >>
> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
> >>> MapReducing. So I've reduced the Range Batch Size to 256 (from 4096) and
> >>> this seems to have fixed my problem, although it has slowed things down a
> >>> bit -- presumably because there are 16x more calls to get_range_slices.
> >>> While I was in that code I noticed that a new client was being created for
> >>> each batch get. By decreasing the batch size, I've increased this
> >>> overhead. I'm thinking of re-writing ColumnFamilyRecordReader to do some
> >>> connection pooling. Anyone have any thoughts on that?
> >>> joost.
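
For anyone following the thread, here is a minimal sketch of the idea discussed above: keep one client open across batch reads instead of creating a new one per batch. It is not the code from CASSANDRA-1017 or from ColumnFamilyRecordReader. FakeClient, ClientHolder, fetchBatch and countOpens are hypothetical names made up for illustration, and FakeClient only stands in for a Thrift client and its socket. It counts how many connections get opened for 4096 rows at a batch size of 256, with and without reuse.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: reuse one client across batch calls instead of one per batch. */
public class ReusableClientSketch {

    /** Stand-in for a Thrift client plus its socket; NOT the real Cassandra API. */
    static final class FakeClient implements AutoCloseable {
        boolean open = true;

        /** Pretends to fetch one batch of rows and returns the number of rows "read". */
        int fetchBatch(int start, int count) {
            if (!open) {
                throw new IllegalStateException("client is closed");
            }
            return count;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Opens a client lazily on first use and hands back the same one afterwards. */
    static final class ClientHolder implements AutoCloseable {
        private final Supplier<FakeClient> factory;
        private FakeClient client;
        private int opens;

        ClientHolder(Supplier<FakeClient> factory) {
            this.factory = factory;
        }

        FakeClient get() {
            if (client == null || !client.open) {
                client = factory.get();
                opens++; // count every new connection
            }
            return client;
        }

        /** Drops the current client, e.g. after a timeout, so the next get() reconnects. */
        void invalidate() {
            close();
        }

        int opens() {
            return opens;
        }

        @Override
        public void close() {
            if (client != null) {
                client.close();
                client = null;
            }
        }
    }

    /** Reads totalRows in batches and returns how many connections were opened. */
    static int countOpens(int totalRows, int batchSize, boolean reuse) {
        try (ClientHolder holder = new ClientHolder(FakeClient::new)) {
            for (int start = 0; start < totalRows; start += batchSize) {
                int count = Math.min(batchSize, totalRows - start);
                holder.get().fetchBatch(start, count);
                if (!reuse) {
                    holder.invalidate(); // mimics creating a new client for each batch
                }
            }
            return holder.opens();
        }
    }

    public static void main(String[] args) {
        System.out.println("per-batch clients: " + countOpens(4096, 256, false));
        System.out.println("reused client:     " + countOpens(4096, 256, true));
    }
}
```

With these numbers the per-batch variant opens 16 connections and the reused variant opens 1. As noted at the top of this message, that saving looked small next to the cost of the get_range_slices call itself, so treat this as a cleanup rather than a fix for the timeouts.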