Re: MapReduce, Timeouts and Range Batch Size

2010-04-26 Thread Jonathan Ellis
OPP will be marginally faster. Maybe 10%? I don't think anyone has benchmarked it.

On Fri, Apr 23, 2010 at 10:30 AM, Joost Ouwerkerk wrote:
> In that case I should probably wait for 0.7.  Is there any fundamental
> performance difference in get_range_slices between Random and
> Order-Preserving

Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Joost Ouwerkerk
In that case I should probably wait for 0.7. Is there any fundamental performance difference in get_range_slices between Random and Order-Preserving partitioners? If so, by what factor?

joost.

On Fri, Apr 23, 2010 at 10:47 AM, Jonathan Ellis wrote:
> You could look into it, but it's not going

Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Jonathan Ellis
You could look into it, but it's not going to be an easy backport since SSTableReader and SSTableScanner got split into two classes in trunk.

On Fri, Apr 23, 2010 at 9:39 AM, Joost Ouwerkerk wrote:
> Awesome.  In the meantime, I hacked something similar myself.  The
> performance difference does

Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Joost Ouwerkerk
Awesome. In the meantime, I hacked something similar myself. The performance difference does not appear to be material. I think the real killer is the get_range_slices call. Relative to that, the cost of getting the connection appears to be more or less trivial. What can I do to alleviate that

Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Jonathan Ellis
Great! Created https://issues.apache.org/jira/browse/CASSANDRA-1017 to track this.

On Fri, Apr 23, 2010 at 4:12 AM, Johan Oskarsson wrote:
> I have written some code to avoid thrift reconnection, it just keeps the
> connection open between get_range_slices calls.
> I can extract that and put it

Re: MapReduce, Timeouts and Range Batch Size

2010-04-23 Thread Johan Oskarsson
I have written some code to avoid Thrift reconnection; it just keeps the connection open between get_range_slices calls. I can extract that and put it up, but not until early next week.

/Johan

On 23 apr 2010, at 05.09, Jonathan Ellis wrote:
> That would be an easy win, sure.
>
> On Thu, Apr 22
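
For readers following along, the idea of keeping one connection open across get_range_slices calls might look roughly like the sketch below. This is not Johan's code and not the Thrift or Cassandra API: FakeConnection, ReusingReader and read_batch are hypothetical stand-ins, and the "connection" only counts how many times it was opened.

```python
class FakeConnection:
    """Hypothetical stand-in for a client connection; counts how often one is opened."""
    opens = 0

    def __init__(self):
        FakeConnection.opens += 1
        self.closed = False

    def get_range_slices(self, start, count):
        # Returns fake row keys; a real call would go over the network.
        return [f"row{start + i}" for i in range(count)]

    def close(self):
        self.closed = True


class ReusingReader:
    """Opens the connection lazily once and reuses it for every batch."""

    def __init__(self):
        self._conn = None

    def _connection(self):
        # Reconnect only if there is no usable connection yet.
        if self._conn is None or self._conn.closed:
            self._conn = FakeConnection()
        return self._conn

    def read_batch(self, start, count):
        return self._connection().get_range_slices(start, count)

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


reader = ReusingReader()
# 16 batches of 256 rows, all served over the same connection.
batches = [reader.read_batch(i * 256, 256) for i in range(16)]
reader.close()
opens_with_reuse = FakeConnection.opens
```

With this shape the connection is opened once for all 16 batches instead of once per batch; whether that saving matters in practice depends on how expensive the range query itself is.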

Re: MapReduce, Timeouts and Range Batch Size

2010-04-22 Thread Jonathan Ellis
That would be an easy win, sure.

On Thu, Apr 22, 2010 at 9:27 PM, Joost Ouwerkerk wrote:
> I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when
> MapReducing.  So I've reduced the Range Batch Size to 256 (from 4096) and
> this seems to have fixed my problem, although it has

MapReduce, Timeouts and Range Batch Size

2010-04-22 Thread Joost Ouwerkerk
I was getting client timeouts in ColumnFamilyRecordReader.maybeInit() when MapReducing. So I've reduced the Range Batch Size to 256 (from 4096) and this seems to have fixed my problem, although it has slowed things down a bit -- presumably because there are 16x more calls to get_range_slices. Whil
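
The 16x figure follows directly from the two batch sizes (4096 / 256 = 16). A small sketch of that arithmetic, assuming a hypothetical split of 65536 rows and ignoring any overlap between consecutive batches:

```python
import math


def range_slice_calls(total_rows: int, batch_size: int) -> int:
    """Number of batched calls needed to page through total_rows rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_rows / batch_size)


# Hypothetical split size, chosen only for illustration.
rows_in_split = 65536

calls_at_4096 = range_slice_calls(rows_in_split, 4096)
calls_at_256 = range_slice_calls(rows_in_split, 256)
ratio = calls_at_256 // calls_at_4096
```

Each call returns fewer rows and so is less likely to hit the client timeout, but the fixed per-call overhead is paid 16 times as often.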