Re: Difference between 2I and Search

Kresten Krab Thorup Wed, 26 Oct 2011 08:42:29 -0700

Perhaps as follow-up FAQ-style questions, you could also answer these questions:


1. What is the typical execution profile of 2i vs Solr queries?  Do all queries 
go to all nodes in the cluster, or does it depend on the query?  I imagine that 
range queries need to go to all vnodes, whereas simple "prop=value" queries may 
be routed more directly to a node holding the given index.

2. What are the reliability (N, R, W), replication, and failover properties of 
these indexes?  How do the two different index types reconcile after a 
netsplit or crash?

3. What are the edge conditions if nodes crash in the middle of a put? I.e., 
how dependable are the indexes?  Is the index updated before or after the "riak 
put", or as part of the "same transaction" somehow?

4. What is the storage cost for indexes?  Does some of it go into the leveldb 
storage, and some into merge indexes?

Kresten


Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
Trifork A/S  |  Margrethepladsen 4  | DK-8000 Aarhus C
Phone: +45 8732 8787  |  www.trifork.com


On Oct 25, 2011, at 2:09 PM, Rusty Klophaus wrote:

Hi Soren,

Excellent question. As you noted, Secondary Indexes and Search cover some of 
the same ground. But, there are key differences.

As a rule of thumb: Developers should try Secondary Indexes first, and only use 
Riak Search if they need more flexibility around querying. Secondary Indexes 
has fewer moving parts, and in technology, that's an advantage.

Here's another way to look at it: "Secondary Indexes vs. Search" is kind of 
like "MySQL vs. Solr". While both solutions allow you to store and retrieve 
data, Riak Search works best for applications that need full-text search on 
documents that don't change very much, and are in JSON, XML, or plain-text 
format.

To elaborate, there are two core differences between Secondary Indexes and 
Search.

1. In Secondary Indexes, the *application* tokenizes the document

80% of the problems that developers face when first using Riak Search relate to 
either the schema, document formats, or tokenization. Setting this up correctly 
can be confusing and error prone.

Secondary Indexes skips these issues by pushing responsibility for tokenizing 
the document to the application. This simplifies indexing; Riak indexes your 
object exactly how you instruct it to, no more, no less.
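
To make "the application tokenizes the document" concrete, here is a minimal 
sketch in Python. The document fields, the helper names (index_entries, 
as_http_headers) and the choice of which fields to index are all hypothetical. 
The "_bin" / "_int" suffixes and the "x-riak-index-" header prefix follow the 
Secondary Indexes HTTP convention as I understand it; treat them as an 
assumption and check the Riak documentation before relying on them.

```python
import re

def index_entries(doc):
    """Derive (field, value) index pairs from a document.

    The application, not the datastore, decides what gets indexed and how
    text is split into terms. Field names here are illustrative only.
    """
    entries = set()
    # Exact-match string index; normalised by the application.
    entries.add(("email_bin", doc["email"].lower()))
    # Integer index, usable for range queries.
    entries.add(("age_int", int(doc["age"])))
    # Application-side tokenization of free text: lowercase alphanumeric runs.
    for word in re.findall(r"[a-z0-9]+", doc["bio"].lower()):
        entries.add(("bio_word_bin", word))
    return sorted(entries, key=lambda e: (e[0], str(e[1])))

def as_http_headers(entries):
    """Group index pairs into one header per field (assumed header format)."""
    grouped = {}
    for field, value in entries:
        grouped.setdefault("x-riak-index-" + field, []).append(str(value))
    return {name: ", ".join(values) for name, values in grouped.items()}

if __name__ == "__main__":
    doc = {"email": "Ann@Example.com", "age": "31", "bio": "Likes Riak, likes Erlang"}
    for name, value in as_http_headers(index_entries(doc)).items():
        print(name + ": " + value)
```

Nothing is inferred from a schema here: if the application does not emit an 
entry, the object is simply not findable under that term.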

2. Secondary Indexes use document partitioning (aka: local indexes)

Document partitioning means that the index for a document is stored locally on 
the same server (or in our case, vnode) as the document itself. Picture many 
small indexes that work together to form a big index. This is the approach that 
Secondary Indexes use. Riak Search, on the other hand, uses term partitioning 
(aka: a global index). The document is tokenized, and then the postings (the 
entries in the index) are written to the different vnodes in the cluster. 
Picture one big index.
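
The difference between the two layouts can be sketched with a toy model in 
Python. This is an illustration only, not how Riak is implemented: the ring 
size, the crc32-based placement, and all function names are assumptions, and 
replication (N) and coverage planning are ignored entirely.

```python
import zlib

NUM_VNODES = 8  # hypothetical ring size

def vnode_for(key):
    """Stable placement; a stand-in for consistent hashing."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VNODES

def write_document_partitioned(vnodes, doc_key, terms):
    """Local index: every posting lives on the vnode that owns the document."""
    v = vnode_for(doc_key)
    for term in terms:
        vnodes[v].setdefault(term, set()).add(doc_key)
    return {v}  # vnodes touched by this write

def write_term_partitioned(vnodes, doc_key, terms):
    """Global index: each posting lives on the vnode that owns the term."""
    touched = set()
    for term in terms:
        v = vnode_for(term)
        vnodes[v].setdefault(term, set()).add(doc_key)
        touched.add(v)
    return touched

def query_document_partitioned(vnodes, term):
    """Any vnode may hold matches, so all of them are asked."""
    hits = set()
    for index in vnodes:
        hits |= index.get(term, set())
    return hits, len(vnodes)

def query_term_partitioned(vnodes, term):
    """All postings for a term sit on one vnode, so only that one is asked."""
    v = vnode_for(term)
    return set(vnodes[v].get(term, set())), 1

if __name__ == "__main__":
    local = [dict() for _ in range(NUM_VNODES)]
    glob = [dict() for _ in range(NUM_VNODES)]
    write_document_partitioned(local, "doc1", ["red", "blue"])
    write_term_partitioned(glob, "doc1", ["red", "blue"])
    print(query_document_partitioned(local, "red"))
    print(query_term_partitioned(glob, "red"))
```

In this model a document-partitioned write always touches one vnode while a 
query asks all of them; a term-partitioned write may touch up to one vnode per 
term while a single-term query asks only one.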

There are many tradeoffs between the two approaches, but the most significant 
is that term-partitioning (the Riak Search approach) has more overhead at write 
time. In a typical "search" use case, documents don't change very often, so a 
system sees more query traffic than write traffic. Losing some write 
performance to gain query performance makes sense. Also, in a term partitioned 
system, write overhead can be mitigated somewhat by writing documents in 
batches, which Riak Search supports through the Solr interface.

In comparison, document partitioning (the Secondary Index approach) is 
optimized for the typical KV use case, where the system sees more KV reads and 
writes than index queries. A document-partitioned system makes KV reads and 
writes fast, but with more overhead at query time. That said, queries still run 
within a typical web response time; we’ve done a lot of work to make sure that 
this is fast.

One thing to note is that the Secondary Indexes feature is new, and we’ve 
deliberately aimed to keep things simple in this first release to get something 
out there while still leaving room for more advanced features down the road. 
Secondary Indexes won’t ever support the *full* query interface of Riak Search, 
but I’m looking forward to seeing it get fleshed out in future releases.

So to summarize, while there is some overlap, there are distinct ideal use 
cases for both products. It's important to understand the tradeoffs, but in 
general most applications that currently use Riak KV will be better served with 
Secondary Indexes than Riak Search.

Hope that helps!

Best,
Rusty

On Tue, Oct 25, 2011 at 4:53 AM, Soren Hansen <so...@linux2go.dk> wrote:
From a user's perspective, 2I and Search seem incredibly similar.

Both offer a way to efficiently query Riak for objects based on things
other than their keys. The fact that 2I uses explicitly set indices,
while Riak Search indexes the contents of Riak objects[1] seems like a
minor detail.

The interface for Riak search is much richer, and notably supports
querying on multiple terms in one go.

My question is: What would be my motivation for using 2I? As far as I
can tell, anything I can do with 2I, I can also do with Search, so the
differences must lie elsewhere (performance? availability?
consistency?), and I'm at a bit of a loss here.

[1]: I realise Riak Search can index things that aren't in Riak KV,
but that's beside the point for this particular discussion.

--
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



--
Rusty Klophaus (@rustyio)

Basho Technologies, Inc.
www.basho.com

