Is it inefficient to map over a small bucket when you have millions of other buckets?

Daniel Einspanjer Sun, 11 Jul 2010 09:56:35 -0700

I'm thinking about the pros and cons of Riak vs HBase for Mozilla's Weave (now Firefox Sync) 2.0 engine.

https://wiki.mozilla.org/Labs/Weave/Sync/2.0/API

The primary use case is that when a user's client performs a sync, it needs to retrieve all the new items since the last time it synced for each collection (bookmarks, tabs, history, etc.) that the client is configured to sync. If a particular client doesn't sync often, it is possible that there might be thousands of items to retrieve, which means that using links *might* run into issues.

HBase's use of ordered keys pushes for a schema where you'd have the modified timestamp in the key. That would allow for quick and easy scanning of just the new items.
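To make the ordered-key idea concrete, here is a minimal sketch of the kind of schema I have in mind. It does not use the HBase API; `OrderedStore` is a hypothetical in-memory stand-in for any store that keeps keys sorted, and the key layout (`user/collection/timestamp/item_id`) is only an assumption for illustration. It also assumes user and collection names never contain `/`.

```python
import bisect


def row_key(user, collection, modified_ms, item_id):
    # Zero-pad the timestamp so lexicographic order matches numeric order.
    return f"{user}/{collection}/{modified_ms:013d}/{item_id}"


class OrderedStore:
    """Toy stand-in for an ordered key-value store (not the HBase API)."""

    def __init__(self):
        self._keys = []    # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_key, stop_key):
        # Return all (key, value) pairs with start_key <= key < stop_key.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def items_since(store, user, collection, last_sync_ms):
    # Start just after the last sync time; stop at the end of this
    # user's collection ('0' is the character right after '/').
    start = f"{user}/{collection}/{last_sync_ms + 1:013d}"
    stop = f"{user}/{collection}0"
    return store.scan(start, stop)


store = OrderedStore()
store.put(row_key("alice", "bookmarks", 1000, "a"), "old bookmark")
store.put(row_key("alice", "bookmarks", 2000, "b"), "new bookmark")
store.put(row_key("alice", "tabs", 3000, "c"), "a tab")
store.put(row_key("bob", "bookmarks", 4000, "d"), "bob's bookmark")

new_items = items_since(store, "alice", "bookmarks", 1000)
```

Only the range between the start and stop keys is touched, so the cost depends on the number of new items rather than on the total number of users.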

Riak, however, has a few interesting features, such as the on-demand creation of new buckets, that might make it much more flexible... if there is a highly performant mechanism for the client to retrieve new data.

What prompted me to post this message was something I thought I remembered seeing regarding mapping over buckets in Riak. Unfortunately, I can't find the reference now.

Is it true that in order to map over all the keys in a single bucket, the Riak cluster must actually traverse the entire global keyspace of all buckets to find the keys that are part of the desired bucket?

In the case where you have tens of millions of users, and you have either one bucket per user or (if it were feasible) one bucket per user per collection, it seems like it would be impossible to efficiently perform a map reduce on one user's bucket.
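To illustrate the concern, here is a toy model, not a description of how Riak actually works. It assumes the worst case I am asking about: every (bucket, key) pair lives in one flat, unindexed keyspace, so listing one bucket means filtering everything. The function name `list_bucket_keys` and the data layout are hypothetical.

```python
def list_bucket_keys(keyspace, bucket):
    # Assumption: no per-bucket index exists, so every stored
    # (bucket, key) pair has to be examined to find the matches.
    examined = 0
    matches = []
    for b, k in keyspace:
        examined += 1
        if b == bucket:
            matches.append(k)
    return matches, examined


# 1000 hypothetical users, one bucket each, 5 items per bucket.
keyspace = {(f"user{u}", f"item{i}") for u in range(1000) for i in range(5)}

matches, examined = list_bucket_keys(keyspace, "user42")
```

Under that assumption, finding the 5 keys in one user's bucket examines all 5000 keys, and the gap would grow linearly with the number of users.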

That seems like such a common scenario, I must have misinterpreted what I read. I'd really appreciate some clarification there and also would be very interested in any schema proposals or thoughts you might have about this use case.


-Daniel

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
