I'm thinking about the pros and cons of Riak vs HBase for Mozilla's Weave (now Firefox Sync) 2.0 engine.
https://wiki.mozilla.org/Labs/Weave/Sync/2.0/API

The primary use case is that when a user's client performs a sync, it needs to retrieve all the new items since the last time it synced for each collection (bookmarks, tabs, history, etc.) that the client is configured to sync. If a particular client doesn't sync often, there might be thousands of items to retrieve, which means that using links *might* run into issues.

HBase's use of ordered keys pushes for a schema where you'd have the modified timestamp in the key. That would allow for quick and easy scanning of just the new items.
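
To make that concrete, here is a rough Python sketch of the kind of key layout I have in mind. It does not use the HBase client API at all: `SortedStore` is a toy in-memory stand-in for an ordered table, and the key format, `row_key`, and `new_items_since` are names I made up for illustration. It also assumes user ids and collection names never contain "/".

```python
import bisect


def row_key(user_id, collection, modified_ms, item_id):
    # Zero-pad the timestamp so lexicographic order matches numeric order.
    # (Hypothetical key layout; assumes no "/" in user_id or collection.)
    return f"{user_id}/{collection}/{modified_ms:013d}/{item_id}"


class SortedStore:
    """Toy stand-in for a table with ordered keys (not the HBase API)."""

    def __init__(self):
        self._keys = []      # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_key, stop_key):
        # Return all (key, value) pairs with start_key <= key < stop_key.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def new_items_since(store, user_id, collection, last_sync_ms):
    # Start just after the last sync time; "~" sorts after every digit,
    # so the stop key sits past the last timestamp in this collection.
    start = f"{user_id}/{collection}/{last_sync_ms + 1:013d}"
    stop = f"{user_id}/{collection}/~"
    return store.scan(start, stop)


store = SortedStore()
store.put(row_key("alice", "bookmarks", 1000, "b1"), "old bookmark")
store.put(row_key("alice", "bookmarks", 2000, "b2"), "new bookmark")
store.put(row_key("alice", "history", 3000, "h1"), "history entry")
store.put(row_key("alice2", "bookmarks", 5000, "b9"), "someone else's")

result = new_items_since(store, "alice", "bookmarks", 1000)
```

The point is only that a sync touches the contiguous slice of keys newer than the last sync, rather than every item the user has.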

Riak, however, has a few interesting features, such as the on-demand creation of new buckets, that might make it much more flexible... if there is a highly performant mechanism for the client to retrieve new data.

What prompted me to post this message was something I thought I remembered seeing regarding mapping over buckets in Riak. Unfortunately I can't find the reference now.

Is it true that in order to map over all the keys in a single bucket, the Riak cluster must actually traverse the entire global keyspace of all buckets to find the keys that are part of the desired bucket?

In the case where you have tens of millions of users, and you have either one bucket per user or (if it were feasible) one bucket per user per collection, it seems like it would be impossible to efficiently perform a map/reduce over one user's bucket.

That seems like such a common scenario that I must have misinterpreted what I read. I'd really appreciate some clarification there, and I would also be very interested in any schema proposals or thoughts you might have about this use case.

-Daniel

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
