the literature is not quite clear on this point[1].

Also, to Paul's earlier point, yes, technically couchdb's btree is a b+tree as 
the reductions aren't mapped to the keys, though this impacts the ability to 
get as many keys as possible into RAM with each read.

couch_btree is a pretty clever piece of erlang. The order (as defined in 
Knuth's sense[1]) is determined by a chunk size. In some cases you can improve 
performance by imposing a minimum but by and large no one to date has 
demonstrated any major improvements in it. Which is fine as that's probably not 
where the bottlenecks are.

I think storing the reductions with the inner nodes is a place where an 
optimization could be performed. The problem being that they are essential to 
the incremental feature of MR.


[1] http://en.wikipedia.org/wiki/B-tree


On Mar 14, 2012, at 12:37 AM, Paul Davis wrote:

> On Tue, Mar 13, 2012 at 11:33 PM, Randall Leeds <randall.le...@gmail.com> 
> wrote:
>> On Mar 13, 2012 7:11 PM, "Paul Davis" <paul.joseph.da...@gmail.com> wrote:
>>> 
>>> On Tue, Mar 13, 2012 at 5:50 PM, Bob Dionne
>>> <dio...@dionne-associates.com> wrote:
>>>> That's largely correct, though couchdb's btree is neither fish nor fowl.
>>>> 
>>>> It doesn't have an order (B-trees generally do) and it does store
>> values in the inner nodes (B+trees do not)
>>>> 
>>> 
>>> The lack of order is the big thing.
>> 
>> What is meant by order in this context?
> 
> Maximum number of elements in a node. Ie, the whole "every node except
> the root must have between N/2 and N elements at all times" thing.

Reply via email to