Hi Everyone,

Apologies as this has probably been asked before. Unfortunately I have not been 
able to parse through the list serve to find a reasonable answer and the Basho 
wiki docs seem to be missing this information. I have read up on the secondary 
index docs.

I am interested to better understand how the secondary indexes perform when 
there is a very low distribution of values that are indexed. For example, lets 
say I have a bucket with 1 million objects that I create a secondary index on. 
Now lets say the index is on a value that has an uneven distribution where one 
of the values is not selective while the others are, such that 60% of the 
values fall into a single indexed value, while the remaining 40% have a good 
distribution.

For example, I have a record (i.e. object) where the indexed field is 
‘foobar_bin'. I have 1 million objects in the bucket that have 100 unique 
‘foobar’ values distributed over the 1 million objects. One of the values 
repeats for 60% of the records (600K) and the rest have an even distribution of 
about 4%.

How will the secondary indexes perform with this and is this an appropriate use 
of the secondary indexes? Finally, what I have read is not completely clear on 
what happens if the indexed value is updated when the value has such a low 
degree of selectivity?

We have less than 512 partitions and are using the erlang client.

Thanks in advance - any insights will be much appreciated!


Cheers,
Bryan

----

Bryan Hughes
Go Factory
http://www.go-factory.net
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Reply via email to