Re: Secondary Indexes - Feedback?

Greg Pascale Thu, 01 Dec 2011 10:50:30 -0800

> That's a concern, but you gain parallelism, compared to Search's single term 
> index.

> While they are more expensive in the sense that they require more nodes to 
> participate, they split the load between the nodes, thus overall, the work 
> should be about the same, and unless the nodes are busy with some other work, 
> it should complete sooner, as each node has less work to do.

I think your logic is flawed. Each node has fewer keys to return, but that 
doesn't mean it has that much less work. Whether you're returning 1 key or 100, 
you still have to go to disk to read from the index, and I have to imagine 
that's much more expensive than reading the keys (if there isn't a huge number 
of them). In other words, I believe the latency dominates the cost here. It's 
the same reason downloading 100 1MB files is slower than downloading one 100MB 
file. For a simple query - the only kind 2I supports - I'd rather read the whole index 
with only one disk read.
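
A back-of-the-envelope sketch of that argument, in Python. The cost model and both constants are assumptions made up for illustration, not measurements from Riak: each node that participates pays one fixed seek, plus a small cost per key it returns.

```python
# Hypothetical cost model: every index read pays a fixed seek latency,
# plus a small per-key cost. Both numbers are illustrative assumptions.
SEEK_MS = 10.0      # assumed cost of one disk seek
PER_KEY_MS = 0.01   # assumed cost of reading one key once positioned


def node_work_ms(keys_on_node: float) -> float:
    """Time one node spends answering its share of the query."""
    return SEEK_MS + PER_KEY_MS * keys_on_node


def total_work_ms(total_keys: int, nodes: int) -> float:
    """Work summed over all participating nodes (keys split evenly)."""
    return nodes * node_work_ms(total_keys / nodes)


def latency_ms(total_keys: int, nodes: int) -> float:
    """Wall-clock time if all nodes answer in parallel."""
    return node_work_ms(total_keys / nodes)
```

Under these assumed numbers, 100 keys read from one node cost about 11 ms of total work. The same 100 keys spread over 10 nodes cost about 101 ms of total work, while the wall-clock latency only drops from about 11 ms to about 10.1 ms.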


And what if I don't have a lot of keys? In many cases, my 2I lookup may only 
ever return one result. For example, imagine a Person record with secondary 
indices over email address and username. Presumably, each email address and 
username is unique, so any 2I queries I do on those fields should return one 
result. I really hate the idea that I have to talk to 1/3 of the machines in my 
ring (and they probably all have to go to disk too) just so that one of them 
can ultimately return my one result.
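
To put a rough number on "1/3 of the machines", here is a simplified sketch. It assumes each key is replicated on n_val partitions, so a coverage query needs about one partition in every n_val; the function names and the ring size are hypothetical, and the real coverage planner may choose differently.

```python
import math


def coverage_partitions(ring_size: int, n_val: int) -> int:
    """Minimum partitions a coverage query must touch (simplified model)."""
    return math.ceil(ring_size / n_val)


def fraction_of_ring(ring_size: int, n_val: int) -> float:
    """Share of the ring that participates in a single coverage query."""
    return coverage_partitions(ring_size, n_val) / ring_size
```

With an assumed ring of 64 partitions and n_val of 3, `coverage_partitions(64, 3)` is 22, so roughly a third of the ring takes part even when the query matches a single key.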

> Not sure why this would be a concern.  Search's presort option must have the 
> full result set before it can fully sort it, no?  There is no reason why 
> sorting the results of a 2i query should be any slower.  In addition, 2i is 
> stored in leveldb, and leveldb, like merge_index if I recall correctly, 
> stores keys and values sorted. Thus, the result set is already partially 
> ordered.

No - presort is applied to keys before any index documents are retrieved, so 
it's quite fast. Yes, the results are ordered in the index, but that doesn't 
matter to the user. The API states result ordering is undefined.
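
A minimal sketch of why sorting on keys first is cheap, assuming the ordering information lives in the primary key. `top_n_by_key` and `fetch` are hypothetical names for illustration, not part of the Riak Search API.

```python
from typing import Callable, Iterable, List


def top_n_by_key(matching_keys: Iterable[str],
                 fetch: Callable[[str], object],
                 n: int = 10) -> List[object]:
    """Sort the matching keys first, then fetch only the top n documents."""
    # Sorting touches keys only; no document is read at this point.
    selected = sorted(matching_keys, reverse=True)[:n]
    # Only the n selected documents are actually retrieved.
    return [fetch(key) for key in selected]
```

However many keys match, at most n documents are fetched, which is the property I care about when I only want the top 10 or so.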


-- 
Greg
Clipboard

On Wednesday, November 30, 2011 at 3:05 PM, Elias Levy wrote:

> On Wed, Nov 30, 2011 at 1:32 PM, <riak-users-requ...@lists.basho.com 
> (mailto:riak-users-requ...@lists.basho.com)> wrote:
> > Here at Clipboard, we make very heavy use of Riak Search and a couple of 
> > manual indices here and there. I've wanted to use 2i a few times but have 
> > decided against it for a few reasons:
> > 
> >  1) Apprehension about the coverage set query, as Matt articulated.
> 
> That's a concern, but you gain parallelism, compared to Search's single term 
> index.
>  
> >  2) Lack of ordering of returned results. Generally I just want the top 10 
> > or so, and the ordering information is in the primary key. I can accomplish 
> > this with search via the presort parameter.
> 
> Not sure why this would be a concern.  Search's presort option must have the 
> full result set before it can fully sort it, no?  There is no reason why 
> sorting the results of a 2i query should be any slower.  In addition, 2i is 
> stored in leveldb, and leveldb, like merge_index if I recall correctly, 
> stores keys and values sorted. Thus, the result set is already partially 
> ordered. 
> 
> > To me, the implementations of search and 2I are backwards. Search has 
> > scalability issues because term-based partitioning optimizes for 
> > single-term queries, but creates huge hotspots making many AND queries 
> > prohibitively expensive. 2I's document-based partitioning makes single-term 
> > queries more expensive (coverage set) but should allow AND queries to 
> > scale. But 2i only supports single-term queries!
> 
> While they are more expensive in the sense that they require more nodes to 
> participate, they split the load between the nodes, thus overall, the work 
> should be about the same, and unless the nodes are busy with some other work, 
> it should complete sooner, as each node has less work to do. 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

