the literature is not quite clear on this point[1]. Also, to Paul's earlier point, yes, technically couchdb's btree is a b+tree as the reductions aren't mapped to the keys, though this impacts the ability to get as many keys as possible into RAM with each read.
couch_btree is a pretty clever piece of erlang. The order (as defined in Knuth's sense[1]) is determined by a chunk size. In some cases you can improve performance by imposing a minimum but by and large no one to date has demonstrated any major improvements in it. Which is fine as that's probably not where the bottlenecks are. I think storing the reductions with the inner nodes is a place where an optimization could be performed. The problem being that they are essential to the incremental feature of MR. [1] http://en.wikipedia.org/wiki/B-tree On Mar 14, 2012, at 12:37 AM, Paul Davis wrote: > On Tue, Mar 13, 2012 at 11:33 PM, Randall Leeds <randall.le...@gmail.com> > wrote: >> On Mar 13, 2012 7:11 PM, "Paul Davis" <paul.joseph.da...@gmail.com> wrote: >>> >>> On Tue, Mar 13, 2012 at 5:50 PM, Bob Dionne >>> <dio...@dionne-associates.com> wrote: >>>> That's largely correct, though couchdb's btree is neither fish nor fowl. >>>> >>>> It doesn't have an order (B-trees generally do) and it does store >> values in the inner nodes (B+trees do not) >>>> >>> >>> The lack of order is the big thing. >> >> What is meant by order in this context? > > Maximum number of elements in a node. Ie, the whole "every node except > the root must have between N/2 and N elements at all times" thing.