We implemented a scheme for maintaining secondary indexes using MapReduce and KeyFilters. It's written in Python but there's a description of how the design works and a test script for validating other implementations:
https://github.com/williamsjj/txriakidx Nor sure if that will help...it's reasonably performant and much faster than regular Riak MR on the values. -J On Fri, Apr 22, 2011 at 2:45 PM, Bryan O'Sullivan <b...@mailrank.com> wrote: > In my application, I have a frequent need to store collections of data where > I rarely access the items individually, but much more commonly in aggregate > form. In principle, the mapreduce feature is well suited to this, but it > must be fed a set of {bucket,key} pairs to consume, and maintaining these > collections is painful. > The naive approach would be to store each collection in its own bucket, and > use listkeys to fetch the keys that I can then mapreduce over. However, > listkeys has red flags all over it, presumably because it's implemented as a > scan+filter of all buckets and keys. > The slightly less naive approach would be to store the keys for a collection > in a list. Unfortunately, for a collection of even modest size (10,000 > elements), the cost of retrieving and storing this index to make small > changes (e.g. "I'm adding one element to the collection") rapidly becomes > untenable, and that's before such knotty issues as "how do I delete items > from this index?" arise. This is essentially the approach taken by the > riak_link_index project. > A more subtle approach would be to maintain a hierarchical index, with a > "current" index that is modified whenever items are added, and promoted to > "frozen" once it reaches a certain size. Frozen indices would be maintained > in an index of their own, which itself would be frozen, and so on. > Performance would be maintained by limiting the maximum size of each index > object. This is a similar scheme to the traditional filesystem approach of > direct, single indirect, and double indirect block indexing. It's also quite > a lot of work to implement. > Does Basho have any plans to address this? It seems likely to be a very > frequently arising need. > _______________________________________________ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > _______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com