I've been working on an experimental branch to offer some improvements to the functionality and performance of 2i queries in Riak: https://github.com/martinsumner/riak_kv
Explanation: https://github.com/martinsumner/riak_kv/blob/master/docs/index_speedup.md

There are four basic features included:

1. The ability to pin particular 2i indexes into memory (without loss of consistency on restart of a node).
2. The ability to set partition-level static bloom filters for particular 2i indexes, to greatly reduce the disk overheads of exact-term queries with small result sets (e.g. queries by a secondary identifier such as email address).
3. The ability to return index terms, not just keys, as results of a query, so that those terms can be overloaded with additional information which can then be filtered by the application without requiring an M/R stage (note this is already available via Russell Brown's branch: https://github.com/basho/riak_kv/tree/pt34-index-values).
4. The ability to pass a regular expression to the query iterator, so that range queries are filtered on matches to that regular expression (for example, allowing non-trailing wildcards) before the keys and terms are returned.

Testing is slight at the moment, both functionally and non-functionally. This is still very much an experiment. We're hoping to do some full-scale volume testing on the branch in the next couple of weeks.

The branch has been developed to solve some problems we have with edge cases in our implementation for the NHS in England, where we have to support tracing across an 80M record demographic database. I'd be interested to hear whether people think it has value in other environments.

Regards
Martin
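
For anyone who wants a concrete picture of features 2 to 4, here is a minimal Python sketch. It is not the branch's Erlang code and does not use any Riak API: the sorted in-memory list standing in for one partition's index, the "surname|dob" term layout, and the names range_query, exact_query and BloomFilter are all assumptions made for illustration only.

```python
import bisect
import hashlib
import re


def range_query(index, start, end, term_regex=None):
    """Return (term, key) pairs with start <= term <= end.

    `index` is a list of (term, key) tuples sorted by term; it is a
    hypothetical stand-in for one partition's 2i index. If `term_regex`
    is given, only terms matching it are returned.
    """
    pattern = re.compile(term_regex) if term_regex else None
    # (start,) sorts before any (start, key), so this finds the first term >= start.
    lo = bisect.bisect_left(index, (start,))
    results = []
    for term, key in index[lo:]:
        if term > end:
            break
        if pattern is None or pattern.search(term):
            results.append((term, key))
    return results


class BloomFilter:
    """Tiny static Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, term):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def exact_query(index, bloom, term):
    """Exact-term lookup that skips the index scan when the filter says 'absent'."""
    if not bloom.might_contain(term):
        return []
    return range_query(index, term, term)


# Hypothetical index whose terms are overloaded as "surname|dob".
index = sorted([
    ("smith|19700101", "k1"),
    ("smyth|19700315", "k2"),
    ("smith|19851120", "k3"),
    ("jones|19700101", "k4"),
])
bloom = BloomFilter()
for term, _key in index:
    bloom.add(term)

# Range scan over "sm".."sn" with a non-trailing wildcard on the surname.
matches = range_query(index, "sm", "sn", r"^sm.th\|1970")
```

Because the terms come back alongside the keys, the application can filter on the overloaded part of the term (the date of birth here) without fetching the objects, and the filter lets a partition that cannot hold the term answer an exact-term query without touching the index at all.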