Hi Everyone,
Apologies if this has been asked before. I searched the list archives but could
not find a clear answer, and the Basho wiki docs seem to be missing this
information. I have read the secondary index docs.
I would like to better understand how secondary indexes perform when the
indexed values have low cardinality and a skewed distribution. For example,
let's say I have a bucket with 1 million objects that I create a secondary
index on. Now let's say the indexed field has an uneven distribution in which
one value is not selective while the others are: 60% of the objects share a
single indexed value, while the remaining 40% are spread evenly over the other
values.
Concretely, I have a record (i.e. object) where the indexed field is
'foobar_bin'. The bucket holds 1 million objects with 100 unique 'foobar'
values. One of those values repeats for 60% of the records (600K), and the
other 99 values are evenly distributed at about 0.4% each (roughly 4,000
records per value).
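To make the skew concrete, here is a quick back-of-the-envelope sketch in
Python of how many objects an exact-match index query would return for each
kind of value under the numbers above. It is arithmetic only and does not model
Riak itself; the function and constant names are made up for illustration.

```python
# Back-of-the-envelope sizing of index query results for the skewed case.
# All names here are hypothetical; nothing below calls Riak.

TOTAL_OBJECTS = 1_000_000   # objects in the bucket
UNIQUE_VALUES = 100         # distinct values of the indexed field
HOT_PERCENT = 60            # share of objects holding the one common value


def matches_per_value(total, unique_values, hot_percent):
    """Return (matches for the common value, average matches for each other value)."""
    hot = total * hot_percent // 100
    remaining = total - hot
    per_other = remaining / (unique_values - 1)
    return hot, per_other


hot, per_other = matches_per_value(TOTAL_OBJECTS, UNIQUE_VALUES, HOT_PERCENT)
other_share = 100 * per_other / TOTAL_OBJECTS

print(f"common value: {hot} objects ({HOT_PERCENT}% of the bucket)")
print(f"each other value: about {per_other:.0f} objects ({other_share:.2f}% of the bucket)")
```

So a lookup on the common value would have to return 600K keys, while a lookup
on any of the other 99 values would return roughly 4,000.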
How will secondary indexes perform in this case, and is this an appropriate use
of them? Finally, what I have read does not make clear what happens when the
indexed value of an object is updated and the index has such a low degree of
selectivity.
We have fewer than 512 partitions and are using the Erlang client.
Thanks in advance - any insights will be much appreciated!
Cheers,
Bryan
----
Bryan Hughes
Go Factory
http://www.go-factory.net
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com