In that case I should probably wait for 0.7. Is there any fundamental performance difference in get_range_slices between the Random and Order-Preserving partitioners? If so, by what factor?
joost.
On Fri, Apr 23, 2010 at 10:47 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> You could look into it, but it's not going to be an easy backport
> since SSTableReader and SSTableScanner got split into two classes in
> trunk.
>
> On Fri, Apr 23, 2010 at 9:39 AM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> > Awesome. In the meantime, I hacked something similar myself. The
> > performance difference does not appear to be material. I think the real
> > killer is the get_range_slices call. Relative to that, the cost of getting
> > the connection appears to be more or less trivial. What can I do to
> > alleviate that cost? CASSANDRA-821 looks interesting -- can I apply that to
> > 0.6.1?
> > joost.
> >
> > On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> Great! Created https://issues.apache.org/jira/browse/CASSANDRA-1017
> >> to track this.
> >>
> >> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson <jo...@oskarsson.nu> wrote:
> >> > I have written some code to avoid thrift reconnection, it just keeps the
> >> > connection open between get_range_slices calls.
> >> > I can extract that and put it up but not until early next week.
> >> >
> >> > /Johan
> >> >
> >> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
> >> >
> >> >> That would be an easy win, sure.
> >> >>
> >> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> >> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
> >> >>> MapReducing. So I've reduced the Range Batch Size to 256 (from 4096) and
> >> >>> this seems to have fixed my problem, although it has slowed things down a
> >> >>> bit -- presumably because there are 16x more calls to get_range_slices.
> >> >>> While I was in that code I noticed that a new client was being created for
> >> >>> each batch get. By decreasing the batch size, I've increased this overhead.
> >> >>> I'm thinking of re-writing ColumnFamilyRecordReader to do some
> >> >>> connection pooling. Anyone have any thoughts on that?
> >> >>> joost.
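
For anyone following the thread, here is a rough sketch of the two things discussed above: how shrinking the batch size from 4096 to 256 multiplies the number of get_range_slices calls by 16, and how keeping one connection open across batches avoids opening a new client per call. It is only an illustration, not the actual ColumnFamilyRecordReader or CASSANDRA-1017 code. `CountingFactory`, `FakeClient`, and both `read_*` functions are hypothetical names, and the fake `get_range_slices` just returns integers rather than talking to Thrift. As noted in the thread, the connection cost looked small next to the cost of the get_range_slices call itself, so treat this as showing the call pattern only.

```python
class FakeClient:
    """Hypothetical stand-in for a Thrift client; no network I/O happens here."""

    def get_range_slices(self, start, count, total_rows):
        # Return up to `count` fake row keys starting at `start`.
        end = min(start + count, total_rows)
        return list(range(start, end))

    def close(self):
        pass


class CountingFactory:
    """Hypothetical connection factory that counts how many clients it opens."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return FakeClient()


def read_with_new_client_per_batch(factory, total_rows, batch_size):
    """Open a fresh client for every batch (the behaviour described in the thread)."""
    rows = []
    start = 0
    while start < total_rows:
        client = factory.connect()
        try:
            batch = client.get_range_slices(start, batch_size, total_rows)
        finally:
            client.close()
        rows.extend(batch)
        start += len(batch)
    return rows


def read_with_reused_client(factory, total_rows, batch_size):
    """Open one client and keep it for all batches."""
    rows = []
    client = factory.connect()
    try:
        start = 0
        while start < total_rows:
            batch = client.get_range_slices(start, batch_size, total_rows)
            rows.extend(batch)
            start += len(batch)
    finally:
        client.close()
    return rows


if __name__ == "__main__":
    per_batch = CountingFactory()
    read_with_new_client_per_batch(per_batch, total_rows=4096, batch_size=256)
    reused = CountingFactory()
    read_with_reused_client(reused, total_rows=4096, batch_size=256)
    print("clients opened, new client per batch:", per_batch.opened)
    print("clients opened, reused client:", reused.opened)
```

With 4096 rows and a batch size of 256, the first variant opens 16 clients and the second opens 1; both return the same rows.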