Hi Nate,

No, Secondary Indexes doesn't attempt to cluster integer index keys. Because
Secondary Indexes uses document-partitioning, only the *document* key (not
the *index* key) determines where the data lives.
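A minimal Python sketch of what document-partitioning means for placement and queries. It assumes a simplified placement function (SHA-1 of "bucket/key" split into equal slices of the 2^160 hash space, not Riak's exact encoding), and the names `partition_for`, `store`, `range_query`, `covering_set_size`, the ring size, and the sample data are all hypothetical:

```python
import hashlib
import math

RING_SIZE = 64  # hypothetical number of partitions
N_VAL = 3       # hypothetical replication factor (N)

def partition_for(bucket: str, key: str) -> int:
    """Simplified placement: SHA-1 of bucket/key mapped onto RING_SIZE
    equal-width slices of the 2^160 space. Only the document key is used."""
    h = int.from_bytes(hashlib.sha1(f"{bucket}/{key}".encode()).digest(), "big")
    return h // (2**160 // RING_SIZE)

def store(partitions: dict, bucket: str, key: str, index_value: int) -> int:
    """Write the index entry into the SAME partition as the object.
    index_value plays no part in choosing the partition."""
    p = partition_for(bucket, key)
    partitions.setdefault(p, []).append((index_value, key))
    return p

def range_query(partitions: dict, lo: int, hi: int) -> list:
    """Matching entries could sit in any partition, so every partition is
    scanned (real Riak asks one replica per key range: a covering set)."""
    return sorted(key for entries in partitions.values()
                  for value, key in entries if lo <= value <= hi)

def covering_set_size(ring_size: int, n_val: int) -> int:
    """Approximate partitions contacted per query: about ring_size / n_val."""
    return math.ceil(ring_size / n_val)

# Hypothetical sample data: users indexed by an integer age.
parts = {}
for name, age in [("alice", 34), ("bob", 27), ("carol", 41)]:
    store(parts, "users", name, age)

print(range_query(parts, 30, 45))           # ['alice', 'carol']
print(covering_set_size(RING_SIZE, N_VAL))  # 22
```

Because the index value never feeds into `partition_for`, a range query has no way to narrow down which partitions to ask; it has to consult a covering set of roughly ring size / N partitions (22 for a 64-partition ring with N = 3).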

As Sean mentioned, the alternative is term-partitioning, where the index key
determines how the index data is partitioned. Riak Search uses
term-partitioning, but it still hashes the index key and makes no attempt to
cluster nearby integer keys on the same partition. The reason is that once
you begin clustering keys, the system needs to understand something about
the underlying data being represented. Consider, for example, an index on
birth month and an index on shoe size. Birth month falls in the range 1 to
12 with an even distribution, while US shoe size falls in the range 0 to 15
with a roughly normal distribution centered around 9 or 10. To spread such
an index evenly across all partitions, the system would need to know the
upper and lower bounds and the distribution of values in advance.
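To make the bounds-and-distribution point concrete, here is a small Python sketch contrasting hashed terms with a hypothetical equal-width range scheme. The partition count, the range scheme, the hash function, and the sample values are all assumptions for illustration, not Riak Search's actual code or real data:

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical, kept small for illustration

def hash_partition(term: int) -> int:
    """Term-partitioning by hashing the index term: adjacent integers are
    scattered, and no knowledge of the data's bounds is needed."""
    digest = hashlib.sha1(str(term).encode()).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS

def range_partition(term: int, lo: int, hi: int) -> int:
    """Hypothetical clustering alternative: split [lo, hi] into equal-width
    ranges so nearby terms share a partition. lo and hi must be known upfront."""
    width = (hi - lo + 1) / NUM_PARTITIONS
    return min(int((term - lo) / width), NUM_PARTITIONS - 1)

def load(values, partitioner) -> list:
    """Number of index entries landing on each partition."""
    counts = Counter(partitioner(v) for v in values)
    return [counts[p] for p in range(NUM_PARTITIONS)]

# Hypothetical sample data, not real measurements.
birth_months = list(range(1, 13))                         # one of each month: even
shoe_sizes = [7, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 12]  # bunched around 9-10

print(load(birth_months, lambda m: range_partition(m, 1, 12)))  # [3, 3, 3, 3]
print(load(shoe_sizes, lambda s: range_partition(s, 0, 15)))    # [0, 1, 10, 1]
# Identical terms still land together; distinct terms are scattered by the hash.
print(load(shoe_sizes, hash_partition))
```

With equal-width ranges the evenly distributed months balance perfectly, but the same scheme piles most shoe sizes onto one partition. Fixing that would mean choosing range boundaries from the shoe-size distribution, which is exactly the advance knowledge a general-purpose index can't assume.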

Best,
Rusty

On Tue, Oct 11, 2011 at 5:29 PM, Nate Lawson <n...@root.org> wrote:

> A related question -- does the secondary index implementation make some
> attempt to cluster "nearby" integer keys for range queries? In other words,
> if I have an integer secondary index on a set of keys, is this taken into
> account in the partition function?
>
> Since you have to query the full covering set anyway to get all the
> results, I'm not sure how much of an advantage this would be, given the
> complexity tradeoff. I'm just wondering whether reducing the covering set
> size based on the secondary index would be useful for some.
>
> -Nate
>
> On Oct 10, 2011, at 7:09 AM, Sean Cribbs wrote:
>
> > 1. The design of secondary indexes is based on "document partitioning",
> > which makes it very simple to keep indexes consistent with the objects they
> > reference, but does mean that queries will have to communicate with a
> > "covering set" of partitions, more or less (Ring Size / N) of them. This
> > design is the dual of Riak Search's design, or "term partitioning", which
> > only requires one partition to respond for a given index/field/term, but can
> > be prone to hotspots.
> >
> > I do not know of any comparisons to Cassandra's secondary index feature.
> >
> > 2. See #1. The index data is stored in the same place as the objects
> > themselves, which means queries need to contact many partitions.
> >
> > On Mon, Oct 10, 2011 at 4:56 AM, OZAWA Tsuyoshi <ozawa.tsuyo...@gmail.com> wrote:
> > Hi,
> >
> > I have 2 questions about Riak's new secondary index feature.
> >
> > 1. Is there a scalability benchmark for the secondary index feature? I
> > read a blog about the implementation of Riak secondary indexes, and if
> > my understanding is correct, a query seems to scan almost all nodes in
> > the worst case. Is Riak's secondary index feature more scalable than
> > Cassandra's?
> >
> > 2. Where is the metadata recording which nodes contain the indexed data?
> > Is it distributed across Riak storage servers by consistent hashing?
> >
> > Thank you.
> >
> > Best Regards,
> > OZAWA Tsuyoshi
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
> >
> >
> > --
> > Sean Cribbs <s...@basho.com>
> > Developer Advocate
> > Basho Technologies, Inc.
> > http://www.basho.com/
>



-- 
Rusty Klophaus

*Basho Technologies, Inc.*
11921 Freedom Drive, Suite 550
Reston, VA 20190
www.basho.com