Re: Riak Secondary Index Limits

2014-08-29 Thread Sean Cribbs
Correct: there is a key in LevelDB for each Riak key that has the index term attached. This is somewhat mitigated by Snappy compression (600K records might well compress into a single block), but it is nowhere near the storage efficiency of something like Solr's indexes, and it still has to scan.

Re: Riak Secondary Index Limits

2014-08-29 Thread Bryan
Hi Sean, Sweet! Thanks for the explanation. Much appreciated and very helpful. Just a bit more clarification: on an equality lookup, where the ‘foobar’ key has a value ‘barfoo’ that is very low-cardinality, are those indexed objects individually stored as key/value terms which are then enumerated at read time?

Re: Riak Secondary Index Limits

2014-08-29 Thread Sean Cribbs
I made a minor mistake in my example: the PrimaryKey is part of the index key, whereas the value contains nothing. It's more like this: {i, IndexName, IndexTerm, PrimaryKey} => <<>>. So for the initial seek, we construct a key like so: {i, <<"foobar_bin">>, <<"baz">>, <<>>}.
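Sean's key layout and seek can be sketched in Python (a toy model, not Riak's actual code: the sorted list and tuple keys stand in for LevelDB's ordered key space, and the names are illustrative):

```python
import bisect

# Index entries are ordinary keys; the value is empty (<<>> in Erlang).
# Key layout: (i, IndexName, IndexTerm, PrimaryKey)
index_keys = sorted([
    ("i", "foobar_bin", "bar", "key3"),
    ("i", "foobar_bin", "baz", "key1"),
    ("i", "foobar_bin", "baz", "key2"),
    ("i", "foobar_bin", "quux", "key4"),
])

def equality_lookup(keys, index_name, term):
    # Seek to the first possible key for this term: an empty PrimaryKey
    # sorts before every real primary key, mirroring the <<>> seek key.
    seek_key = ("i", index_name, term, "")
    pos = bisect.bisect_left(keys, seek_key)
    results = []
    # Scan forward while the key still matches the (index, term) prefix.
    while pos < len(keys) and keys[pos][:3] == ("i", index_name, term):
        results.append(keys[pos][3])  # extract the primary key
        pos += 1
    return results

print(equality_lookup(index_keys, "foobar_bin", "baz"))  # ['key1', 'key2']
```

This also shows why the read cost is proportional to the number of matching entries: after the initial seek, every match is visited one key at a time.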

Re: Riak Secondary Index Limits

2014-08-29 Thread Sean Cribbs
Hi Bryan, Index entries are just keys in LevelDB, like normal values are. So performance is relatively constant at write time but O(N) at read time (because you are scanning the index). The high-cardinality term will definitely be expensive to enumerate, but the low-cardinality terms will be much less so.
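The constant write cost can be sketched as follows (a hypothetical model with assumed names, not Riak internals): a write adds one flat index key per term alongside the object, so its cost does not depend on how many other objects already share the same term.

```python
store = {}  # stands in for LevelDB's key/value space

def put(bucket, key, value, index_terms):
    store[("o", bucket, key)] = value  # the object itself
    for index_name, term in index_terms:
        # Index entry: the primary key lives in the key itself;
        # the value is empty (<<>> in Erlang).
        store[("i", index_name, term, key)] = b""

put("users", "key1", b"value1", [("foobar_bin", "baz")])
put("users", "key2", b"value2", [("foobar_bin", "baz")])
# Each put wrote exactly two keys (one object key, one index key),
# regardless of the term's cardinality.
```

Reads, by contrast, must walk every index key matching the queried term, which is where the O(N) scan cost comes from.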

Riak Secondary Index Limits

2014-08-28 Thread Bryan
Hi Everyone, Apologies as this has probably been asked before. Unfortunately I have not been able to parse through the listserv to find a reasonable answer, and the Basho wiki docs seem to be missing this information. I have read up on the secondary index docs. I am interested to better understand how secondary indexes are stored and what their performance limits are.