OPP will be marginally faster. Maybe 10%? I don't think anyone has benchmarked it.
On Fri, Apr 23, 2010 at 10:30 AM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> In that case I should probably wait for 0.7. Is there any fundamental
> performance difference in get_range_slices between the Random and
> Order-Preserving partitioners? If so, by what factor?
> joost.
>
> On Fri, Apr 23, 2010 at 10:47 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> You could look into it, but it's not going to be an easy backport
>> since SSTableReader and SSTableScanner got split into two classes in
>> trunk.
>>
>> On Fri, Apr 23, 2010 at 9:39 AM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> > Awesome. In the meantime, I hacked something similar myself. The
>> > performance difference does not appear to be material. I think the real
>> > killer is the get_range_slices call. Relative to that, the cost of
>> > getting the connection appears to be more or less trivial. What can I
>> > do to alleviate that cost? CASSANDRA-821 looks interesting -- can I
>> > apply that to 0.6.1?
>> > joost.
>> >
>> > On Fri, Apr 23, 2010 at 9:39 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>
>> >> Great! Created https://issues.apache.org/jira/browse/CASSANDRA-1017
>> >> to track this.
>> >>
>> >> On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson <jo...@oskarsson.nu> wrote:
>> >> > I have written some code to avoid Thrift reconnection; it just keeps
>> >> > the connection open between get_range_slices calls.
>> >> > I can extract that and put it up, but not until early next week.
>> >> >
>> >> > /Johan
>> >> >
>> >> > On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
>> >> >
>> >> >> That would be an easy win, sure.
>> >> >>
>> >> >> On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> >> >>> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit()
>> >> >>> when MapReducing. So I've reduced the Range Batch Size to 256 (from
>> >> >>> 4096) and this seems to have fixed my problem, although it has slowed
>> >> >>> things down a bit -- presumably because there are 16x more calls to
>> >> >>> get_range_slices. While I was in that code I noticed that a new client
>> >> >>> was being created for each batch get. By decreasing the batch size,
>> >> >>> I've increased this overhead. I'm thinking of rewriting
>> >> >>> ColumnFamilyRecordReader to do some connection pooling. Anyone have
>> >> >>> any thoughts on that?
>> >> >>> joost.
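A minimal sketch of the batching trade-off discussed in this thread, in Python for brevity. `FakeClient`, `get_range`, and `scan` are hypothetical stand-ins invented for illustration, not Cassandra's Thrift API, and `get_range` simplifies get_range_slices by treating its start key as exclusive. Scanning 100,000 rows, a batch size of 256 needs roughly 16x as many range calls as 4096 (4096 / 256 = 16). Creating a client per batch multiplies connections by the same factor, while reusing one client, as Johan describes, keeps the count at one.

```python
import bisect


class FakeClient:
    """Hypothetical stand-in for a Thrift client; not a real Cassandra API."""

    def __init__(self, keys, stats):
        stats["connections"] += 1  # count every "connection" we open
        self._keys = keys  # sorted row keys held in memory

    def get_range(self, after_key, count):
        # Simplification (assumption): return up to `count` keys strictly
        # after `after_key`, unlike get_range_slices' inclusive start key.
        i = bisect.bisect_right(self._keys, after_key)
        return self._keys[i:i + count]

    def close(self):
        pass  # a real client would tear down its socket here


def scan(keys, batch_size, reuse_client):
    """Read every key in batches; return (keys_read, calls, connections)."""
    stats = {"connections": 0}
    shared = FakeClient(keys, stats) if reuse_client else None
    out, calls, last = [], 0, ""
    while True:
        # Either reuse the one shared client or open a fresh one per batch.
        client = shared if reuse_client else FakeClient(keys, stats)
        batch = client.get_range(last, batch_size)
        calls += 1
        if not reuse_client:
            client.close()  # per-batch client is discarded after one call
        out.extend(batch)
        if len(batch) < batch_size:
            break  # a short batch means the range is exhausted
        last = batch[-1]  # next batch starts after the last key seen
    if shared is not None:
        shared.close()
    return out, calls, stats["connections"]


if __name__ == "__main__":
    keys = [f"key{i:06d}" for i in range(100_000)]
    for size in (4096, 256):
        for reuse in (False, True):
            rows, calls, conns = scan(keys, size, reuse)
            # 4096: 25 calls; 256: 391 calls; connections = calls or 1
            print(f"batch={size} reuse={reuse}: rows={len(rows)} "
                  f"calls={calls} connections={conns}")
```

Reusing the client leaves the number of range calls unchanged, which fits Joost's observation that the get_range_slices call itself, not the connection setup, dominates the cost.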