On Thu, Dec 1, 2011 at 10:50 AM, Greg Pascale <g...@clipboard.com> wrote:

> I think your logic is flawed. Each node has fewer *keys to return*, but
> that doesn't mean it has that much less work. Whether you're returning 1
> key or 100, you still have to go to disk to read from the index, and I have
> to imagine that's much more expensive than reading the keys (if there isn't
> a huge number of them). In other words, I believe the latency dominates the
> cost here. It's the same idea as how downloading 100 1MB files is slower
> than 1 100MB file. For a simple query - the only kind 2I supports - I'd
> rather read the whole index with only one disk read.
>

True, but in our use case we are retrieving lots of keys per 2i index and
we are running in EC2 with slow disks.  Thus, parallelism is actually a
benefit.  Think of it as reading from a single disk spindle, versus reading
from multiple ones.  Yes, your minimum latency will be bounded by the
slowest disk, but your worst-case latency will be smaller.
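
To make the trade-off concrete, here is a toy latency model. It is only a sketch: the function names and every number are made up for illustration, not measured on EC2 or taken from Riak. It assumes each read costs one fixed seek plus a per-key streaming cost, and that a parallel query finishes when its slowest node finishes.

```python
# Toy latency model; all names and numbers are illustrative assumptions.

def single_read_us(num_keys, seek_us, per_key_us):
    """One node pays one seek, then streams every key."""
    return seek_us + num_keys * per_key_us


def parallel_read_us(num_keys, seeks_us, per_key_us):
    """Each node pays its own seek and streams an equal share of the keys.

    The query completes only when the slowest node completes.
    """
    nodes = len(seeks_us)
    share = -(-num_keys // nodes)  # ceiling division
    return max(seek + share * per_key_us for seek in seeks_us)


# Four nodes, one of them with a slow disk (hypothetical seek times).
SEEKS_US = [10_000, 12_000, 40_000, 15_000]
```

With these made-up numbers, a 4-key query is faster on a single node (the one slow seek dominates the parallel case), while a 100,000-key query is faster when split across the four nodes. That is Greg's point and mine at the same time: which one wins depends on how many keys you pull per query.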

> > Not sure why this would be a concern.  Search's presort option must have
> > the full result set before it can fully sort it, no?  There is no reason
> > why sorting the results of a 2i query should be any slower.  In addition,
> > 2i is stored in leveldb, and leveldb, like merge_index if I recall
> > correctly, stores keys and values sorted. Thus, the result set is already
> > partially ordered.
> >
> No - presort is applied to keys before any index documents are retrieved,
> so it's quite fast.
>

You can presort the keys retrieved by 2i in the first phase of a MR job, so
it should be similarly fast.
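
Roughly what I mean, as a sketch in Python rather than real MR phase code. `presort_and_fetch` and the `fetch` callback are hypothetical names, not part of any Riak client; the point is only that the sort touches keys, and objects are read afterwards for the keys you actually keep.

```python
def presort_and_fetch(keys, fetch, limit):
    """Sort the keys first, then fetch objects only for the first `limit` keys.

    `keys` stands in for the key list a 2i query returns, and `fetch` for
    whatever reads an object by key; both are assumptions for illustration.
    """
    ordered = sorted(keys)
    return [fetch(key) for key in ordered[:limit]]
```

Sorting happens before any object is read, so the cost of the sort depends on the number of keys, not on the size of the objects.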


> Yes, the results are ordered in the index, but that doesn't matter to the
> user. The API states result ordering is undefined.
>

That just means the order is not guaranteed. But in practice they are
partially ordered.
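
For instance, if each partition hands back its keys already sorted (an assumption based on how leveldb stores them, not something the API promises), a full ordering is just a cheap merge. A minimal sketch, with `merge_partition_results` being a hypothetical helper name:

```python
import heapq


def merge_partition_results(per_partition_keys):
    """Merge key lists that are each already sorted into one sorted list.

    heapq.merge relies on every input being sorted; if a partition returned
    unsorted keys, the output would not be fully sorted either.
    """
    return list(heapq.merge(*per_partition_keys))
```

Since the ordering is undefined by the API, a caller that needs a guarantee would still have to sort explicitly.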

Elias
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com