Hi Soren,

Excellent question. As you noted, Secondary Indexes and Search cover some of the same ground, but there are key differences.

As a rule of thumb: Developers should try Secondary Indexes first, and only use Riak Search if they need more flexibility around querying. Secondary Indexes has fewer moving parts, and in technology, that's an advantage.

Here's another way to look at it: "Secondary Indexes vs. Search" is kind of like "MySQL vs. Solr". While both solutions allow you to store and retrieve data, Riak Search works best for applications that need full-text search on documents that don't change very much, and are in JSON, XML, or plain-text format.

To elaborate, there are two core differences between Secondary Indexes and Search.

1. In Secondary Indexes, the *application* tokenizes the document

80% of the problems that developers face when first using Riak Search relate to either the schema, document formats, or tokenization. Setting this up correctly can be confusing and error-prone. Secondary Indexes skips these issues by pushing responsibility for tokenizing the document to the application. This simplifies indexing; Riak indexes your object exactly how you instruct it to, no more, no less.

2. Secondary Indexes use document partitioning (aka: local indexes)

Document partitioning means that the index for a document is stored locally on the same server (or in our case, vnode) as the document itself. Picture many small indexes that work together to form a big index. This is the approach that Secondary Indexes use.

Riak Search, on the other hand, uses term partitioning (aka: a global index). The document is tokenized, and then the postings (the entries in the index) are written to the different vnodes in the cluster. Picture one big index.

There are many tradeoffs between the two approaches, but the most significant is that term partitioning (the Riak Search approach) has more overhead at write time. In a typical "search" use case, documents don't change very often, so a system sees more query traffic than write traffic. Losing some write performance to gain query performance makes sense.
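
To make the partitioning difference concrete, here is a toy Python sketch (not Riak code). It assumes a hypothetical 8-vnode cluster, uses crc32 as a stand-in for Riak's consistent hashing, and ignores replication and coverage plans entirely. The class and function names are invented for illustration. Each write returns the set of vnodes it touched, and each query returns the matching keys plus the set of vnodes it had to consult.

```python
from collections import defaultdict
from zlib import crc32

NUM_VNODES = 8  # hypothetical cluster size, for illustration only


def vnode_for(value: str) -> int:
    """Map a string to a vnode with a stable hash (stand-in for Riak's ring)."""
    return crc32(value.encode("utf-8")) % NUM_VNODES


class DocumentPartitionedIndex:
    """Local indexes: postings live on the vnode that stores the document."""

    def __init__(self):
        self.vnodes = [defaultdict(set) for _ in range(NUM_VNODES)]

    def write(self, key, terms):
        vnode = vnode_for(key)  # chosen by the document key
        for term in terms:
            self.vnodes[vnode][term].add(key)
        return {vnode}  # a write always touches exactly one vnode

    def query(self, term):
        # Any vnode may hold a match, so this toy version asks all of them.
        keys = set()
        for index in self.vnodes:
            keys |= index.get(term, set())
        return keys, set(range(NUM_VNODES))


class TermPartitionedIndex:
    """Global index: each posting lives on the vnode that owns its term."""

    def __init__(self):
        self.vnodes = [defaultdict(set) for _ in range(NUM_VNODES)]

    def write(self, key, terms):
        touched = set()
        for term in terms:
            vnode = vnode_for(term)  # chosen by the term, not the document
            self.vnodes[vnode][term].add(key)
            touched.add(vnode)
        return touched  # may be as many vnodes as there are distinct terms

    def query(self, term):
        vnode = vnode_for(term)  # a single-term query goes to one vnode
        return set(self.vnodes[vnode].get(term, set())), {vnode}


if __name__ == "__main__":
    terms = ["riak", "search", "secondary", "indexes"]
    local, global_index = DocumentPartitionedIndex(), TermPartitionedIndex()
    print("local write touched: ", local.write("doc1", terms))
    print("global write touched:", global_index.write("doc1", terms))
    print("local query: ", local.query("riak"))
    print("global query:", global_index.query("riak"))
```

In this sketch, the document-partitioned write touches one vnode while its query fans out to all of them. The term-partitioned write can touch up to one vnode per term, while its single-term query goes to exactly one vnode.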

Also, in a term-partitioned system, write overhead can be mitigated somewhat by writing documents in batches, which Riak Search supports through the Solr interface.

In comparison, document partitioning (the Secondary Index approach) is optimized for the typical KV use case, where the system sees more KV reads and writes than index queries. A document-partitioned system makes KV reads and writes fast, but with more overhead at query time. That said, queries still run within a typical web response time; we’ve done a lot of work to make sure that this is fast.

One thing to note is that the Secondary Indexes feature is new, and we’ve deliberately aimed to keep things simple in this first release to get something out there while still leaving room for more advanced features down the road. Secondary Indexes won’t ever support the *full* query interface of Riak Search, but I’m looking forward to seeing it get fleshed out in future releases.

So to summarize, while there is some overlap, there are distinct ideal use cases for both products. It's important to understand the tradeoffs, but in general most applications that currently use Riak KV will be better served with Secondary Indexes than Riak Search.

Hope that helps!

Best,
Rusty

On Tue, Oct 25, 2011 at 4:53 AM, Soren Hansen <so...@linux2go.dk> wrote:
> From a user's perspective, 2I and Search seem incredibly similar.
>
> Both offer a way to efficiently query Riak for objects based on things
> other than their keys. The fact that 2I uses explicitly set indices,
> while Riak Search indexes the contents of Riak objects[1] seems like a
> minor detail.
>
> The interface for Riak search is much richer, and notably supports
> querying on multiple terms in one go.
>
> My question is: What would be my motivation for using 2I? As far as I
> can tell, anything I can do with 2I, I can also do with Search, so the
> differences must lie elsewhere (performance? availability?
> consistency?), and I'm at a bit of a loss here.
>
> [1]: I realise Riak Search can index things that aren't in Riak KV,
> but that's beside the point for this particular discussion.
>
> --
> Soren Hansen | http://linux2go.dk/
> Ubuntu Developer | http://www.ubuntu.com/
> OpenStack Developer | http://www.openstack.org/
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

--
Rusty Klophaus (@rustyio)
*Basho Technologies, Inc.*
www.basho.com