Re: Difference between 2I and Search

Rusty Klophaus Tue, 25 Oct 2011 14:09:58 -0700

Hi Soren,

Excellent question. As you noted, Secondary Indexes and Search cover some of
the same ground. But, there are key differences.

As a rule of thumb: Developers should try Secondary Indexes first, and only
use Riak Search if they need more flexibility around querying. Secondary
Indexes has fewer moving parts, and in technology, that's an advantage.

Here's another way to look at it: "Secondary Indexes vs. Search" is kind of
like "MySQL vs. Solr". While both solutions allow you to store and retrieve
data, Riak Search works best for applications that need full-text search on
documents that don't change very much, and are in JSON, XML, or plain-text
format.

To elaborate, there are two core differences between Secondary Indexes and
Search.

1. In Secondary Indexes, the *application* tokenizes the document

80% of the problems that developers face when first using Riak Search relate
to either the schema, document formats, or tokenization. Setting this up
correctly can be confusing and error prone.

Secondary Indexes skips these issues by pushing responsibility for
tokenizing the document to the application. This simplifies indexing; Riak
indexes your object exactly how you instruct it to, no more, no less.
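
Here is a minimal Python sketch of what "the application tokenizes the document" can look like. The index field names `words_bin` and `author_bin` and the helper functions are hypothetical. The only Riak convention assumed is that 2I field names carry a `_bin` (string) or `_int` (integer) suffix. The code only computes the entries an application would attach when storing the object; it does not talk to Riak.

```python
import re

# Deliberately naive tokenizer: lowercase, then split on any run of
# characters that are not ASCII letters or digits. A real application
# picks its own rules (stemming, stop words, etc.).
def tokenize(text: str) -> set[str]:
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def index_entries(doc: dict) -> dict[str, set[str]]:
    """Return the 2I entries the application would attach to `doc`.

    The field names are hypothetical; the `_bin` suffix marks a
    binary (string) index. Riak indexes exactly these values and
    nothing else from the document body.
    """
    return {
        "words_bin": tokenize(doc["body"]),
        "author_bin": {doc["author"].lower()},
    }


if __name__ == "__main__":
    doc = {"author": "Soren", "body": "2I and Search, compared."}
    print(sorted(index_entries(doc)["words_bin"]))
```

Nothing about schemas or analyzers is involved: if a word is not in the entries the application builds, it is not indexed.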

2. Secondary Indexes use document partitioning (aka: local indexes)

Document partitioning means that the index for a document is stored locally
on the same server (or in our case, vnode) as the document itself. Picture
many small indexes that work together to form a big index. This is the
approach that Secondary Indexes use. Riak Search, on the other hand, uses
term partitioning (aka: a global index). The document is tokenized, and then
the postings (the entries in the index) are written to the different vnodes
in the cluster. Picture one big index.
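
A toy model of the two placement schemes may help. It is a sketch under simplifying assumptions: 8 vnodes, no replication, and a plain SHA-1-modulo hash standing in for Riak's consistent-hashing ring. A "posting" here is just a (term, document key) pair, and all function names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_VNODES = 8  # toy ring size, chosen for illustration


def vnode_for(key: str) -> int:
    # Deterministic hash -> vnode. Riak places keys on a consistent-hashing
    # ring; modulo is a simplification for this sketch.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_VNODES


def place_postings(docs: dict[str, set[str]], by_term: bool) -> dict[int, list[tuple[str, str]]]:
    """Map each vnode to the (term, doc_key) postings it stores.

    by_term=False: document partitioning (posting lives with its document).
    by_term=True:  term partitioning (posting lives with its term).
    """
    placement: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for doc_key, terms in docs.items():
        for term in sorted(terms):
            owner = vnode_for(term if by_term else doc_key)
            placement[owner].append((term, doc_key))
    return dict(placement)


def vnodes_written(placement: dict[int, list[tuple[str, str]]], doc_key: str) -> set[int]:
    """Vnodes that had to be written when `doc_key` was indexed."""
    return {v for v, postings in placement.items() if any(k == doc_key for _, k in postings)}


if __name__ == "__main__":
    docs = {
        "doc1": {"riak", "search", "index", "vnode"},
        "doc2": {"riak", "kv"},
    }
    local = place_postings(docs, by_term=False)
    global_ = place_postings(docs, by_term=True)
    print("document-partitioned, doc1 touches", len(vnodes_written(local, "doc1")), "vnode(s)")
    print("term-partitioned,     doc1 touches", len(vnodes_written(global_, "doc1")), "vnode(s)")
```

With document partitioning, indexing a document is a single write alongside the object. With term partitioning, it fans out to every vnode that owns one of the document's terms.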

There are many tradeoffs between the two approaches, but the most
significant is that term-partitioning (the Riak Search approach) has more
overhead at write time. In a typical "search" use case, documents don't
change very often, so a system sees more query traffic than write traffic.
Losing some write performance to gain query performance makes sense. Also,
in a term-partitioned system, write overhead can be mitigated somewhat by
writing documents in batches, which Riak Search supports through the Solr
interface.
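
The batching point can be illustrated with the same kind of toy model. This is again a hypothetical sketch: 8 vnodes, SHA-1-modulo placement, and the assumption that one message carries all postings bound for one vnode. Without batching, each document costs one message per distinct vnode its terms map to. With batching, postings from the whole batch are grouped, so each vnode receives at most one message.

```python
import hashlib
from collections import defaultdict

NUM_VNODES = 8  # toy ring size, chosen for illustration


def vnode_for(term: str) -> int:
    # Term partitioning: a posting lives on the vnode that owns its term.
    return int(hashlib.sha1(term.encode()).hexdigest(), 16) % NUM_VNODES


def messages_unbatched(docs: dict[str, set[str]]) -> int:
    # Each document is indexed on its own: one message per target vnode.
    return sum(len({vnode_for(t) for t in terms}) for terms in docs.values())


def group_batch(docs: dict[str, set[str]]) -> dict[int, list[tuple[str, str]]]:
    # Group every (term, doc_key) posting in the batch by target vnode.
    grouped: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for doc_key, terms in docs.items():
        for term in sorted(terms):
            grouped[vnode_for(term)].append((term, doc_key))
    return dict(grouped)


def messages_batched(docs: dict[str, set[str]]) -> int:
    # One message per vnode that receives any posting from the batch.
    return len(group_batch(docs))


if __name__ == "__main__":
    batch = {f"doc{i}": {"riak", "search", "solr", f"term{i}"} for i in range(20)}
    print("unbatched messages:", messages_unbatched(batch))
    print("batched messages:  ", messages_batched(batch))
```

The same postings are written either way. Batching only reduces how many round trips it takes, which is why it softens the write-time cost rather than removing it.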

In comparison, document partitioning (the Secondary Index approach) is
optimized for the typical KV use case, where the system sees more KV reads
and writes than index queries. A document-partitioned system makes KV reads
and writes fast, but with more overhead at query time. That said, queries
still run within a typical web response time; we’ve done a lot of work to
make sure that this is fast.
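
The query-side cost of document partitioning can be sketched the same way (hypothetical names, 8 vnodes, no replication). Because any vnode may hold matching documents, a lookup has to ask all of them and merge the answers. A term-partitioned index can answer a single-term lookup from the one vnode that owns the term. Riak runs 2I queries as coverage queries over a set of vnodes that together see every key once; looping over all vnodes below is a simplification.

```python
import hashlib
from collections import defaultdict

NUM_VNODES = 8  # toy ring size, chosen for illustration


def vnode_for(key: str) -> int:
    # Deterministic hash -> vnode (simplified stand-in for the Riak ring).
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_VNODES


def build_index(docs: dict[str, set[str]], by_term: bool):
    """vnode -> term -> doc keys; placed by doc key (local) or by term (global)."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_key, terms in docs.items():
        for term in terms:
            index[vnode_for(term if by_term else doc_key)][term].add(doc_key)
    return index


def query_local(index, term: str) -> tuple[list[str], int]:
    """Document partitioning: scatter to every vnode, gather and merge."""
    matches: set[str] = set()
    for vnode in range(NUM_VNODES):
        matches |= index.get(vnode, {}).get(term, set())
    return sorted(matches), NUM_VNODES


def query_global(index, term: str) -> tuple[list[str], int]:
    """Term partitioning: only the vnode that owns `term` is consulted."""
    return sorted(index.get(vnode_for(term), {}).get(term, set())), 1


if __name__ == "__main__":
    docs = {"a": {"riak", "kv"}, "b": {"riak"}, "c": {"search"}}
    print(query_local(build_index(docs, by_term=False), "riak"))  # (['a', 'b'], 8)
    print(query_global(build_index(docs, by_term=True), "riak"))  # (['a', 'b'], 1)
```

Both return the same keys. The difference is the fan-out on the document-partitioned side, which is the query-time overhead traded for cheap KV reads and writes.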

One thing to note is that the Secondary Indexes feature is new, and we’ve
deliberately aimed to keep things simple in this first release to get
something out there while still leaving room for more advanced features down
the road. Secondary Indexes won’t ever support the *full* query interface of
Riak Search, but I’m looking forward to seeing it get fleshed out in future
releases.

So to summarize, while there is some overlap, there are distinct ideal use
cases for both products. It's important to understand the tradeoffs, but in
general most applications that currently use Riak KV will be better served
with Secondary Indexes than Riak Search.

Hope that helps!

Best,
Rusty

On Tue, Oct 25, 2011 at 4:53 AM, Soren Hansen <so...@linux2go.dk> wrote:

> From a user's perspective, 2I and Search seem incredibly similar.
>
> Both offer a way to efficiently query Riak for objects based on things
> other than their keys. The fact that 2I uses explicitly set indices,
> while Riak Search indexes the contents of Riak objects[1] seems like a
> minor detail.
>
> The interface for Riak search is much richer, and notably supports
> querying on multiple terms in one go.
>
> My question is: What would be my motivation for using 2I? As far as I
> can tell, anything I can do with 2I, I can also do with Search, so the
> differences must lie elsewhere (performance? availability?
> consistency?), and I'm at a bit of a loss here.
>
> [1]: I realise Riak Search can index things that aren't in Riak KV,
> but that's beside the point for this particular discussion.
>
> --
> Soren Hansen        | http://linux2go.dk/
> Ubuntu Developer    | http://www.ubuntu.com/
> OpenStack Developer | http://www.openstack.org/
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

-- 
Rusty Klophaus (@rustyio)

*Basho Technologies, Inc.*
www.basho.com