Experimental branch - 2i query improvements

Martin Sumner Tue, 16 Apr 2013 14:50:25 -0700

I've been working on an experimental branch to offer some improvements to
the functionality and performance of 2i queries in Riak:
https://github.com/martinsumner/riak_kv


Explanation:
https://github.com/martinsumner/riak_kv/blob/master/docs/index_speedup.md

The branch includes four basic features:
1. The ability to pin particular 2i indexes into memory (without loss of
consistency on restart of a node)
2. The ability to set partition-level static bloom filters for particular
2i indexes to greatly reduce the disk overheads of exact-term queries with
small result sets (e.g. for queries by a secondary identifier such as email
address)
3. The ability to return index terms, not just keys, as the results of a query -
so that those terms can be overloaded with additional information which can
then be filtered by the application without requiring an M/R stage (note
this is already available via Russell Brown's branch -
https://github.com/basho/riak_kv/tree/pt34-index-values)
4. The ability to pass a regular expression to the query iterator - so that
range queries will be filtered based on matches to that regular expression
(for example allowing for non-trailing wildcards) before returning the keys
and terms
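
To make features 2 to 4 a little more concrete, here is a rough toy model in
Python. It is not code from the branch (riak_kv is written in Erlang) and it
does not use any Riak API: the class name ToyPartitionIndex, the
"surname|forename|year" term layout, the SHA-256 based bloom hashing and the
disk_reads counter are all assumptions made up for illustration. It only
sketches the idea of checking a static per-partition bloom filter before an
exact-term lookup, and of filtering a range query by regular expression while
returning terms alongside keys.

```python
import bisect
import hashlib
import re


class ToyPartitionIndex:
    """Toy model of one partition's entries for a single 2i index."""

    def __init__(self, entries, bloom_bits=1024, hashes=3):
        # Sorted (term, key) pairs, standing in for the on-disk index.
        self.entries = sorted(entries)
        self.bloom_bits = bloom_bits
        self.hashes = hashes
        self.disk_reads = 0  # counts simulated index scans
        # Static bloom filter, built once from every term in the partition.
        self.bloom = 0
        for term, _key in self.entries:
            for pos in self._positions(term):
                self.bloom |= 1 << pos

    def _positions(self, term):
        # Derive `hashes` bit positions from salted SHA-256 digests
        # (an illustrative choice, not what the branch does).
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bloom_bits

    def range_query(self, start, end, pattern=None):
        """Return (term, key) pairs with start <= term <= end,
        optionally keeping only terms that match `pattern`."""
        self.disk_reads += 1
        regex = re.compile(pattern) if pattern else None
        first = bisect.bisect_left(self.entries, (start,))
        results = []
        for term, key in self.entries[first:]:
            if term > end:
                break
            if regex is None or regex.search(term):
                results.append((term, key))
        return results

    def exact_query(self, term):
        # A bloom miss proves the term is absent, so the scan is skipped.
        # A bloom hit may be a false positive, so the scan still decides.
        if any(not (self.bloom >> pos) & 1 for pos in self._positions(term)):
            return []
        return self.range_query(term, term)


index = ToyPartitionIndex([
    ("smith|john|1970", "patient1"),
    ("smith|jane|1982", "patient2"),
    ("smythe|john|1970", "patient3"),
])
# Non-trailing wildcard: surnames starting "sm", terms ending in "|1970".
matches = index.range_query("sm", "sn", r"\|1970$")
```

Because the terms come back with the keys, the application can inspect the
extra fields packed into each term without fetching the objects or running an
M/R job. The bloom check can only rule a term out; it never causes a term that
is present to be missed.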

Testing is slight at the moment, both functional and non-functional. This is
still very much an experiment. We're hoping to do some full-scale volume
testing on the branch in the next couple of weeks.

The branch has been developed to solve some problems we have with edge
cases in our implementation for the NHS in England - where we have to
support tracing across an 80M-record demographic database. I'd be
interested to hear whether people think it has value in other environments.

Regards

Martin
