On Wed, Nov 30, 2011 at 6:01 AM, <riak-users-requ...@lists.basho.com> wrote:

> From: Jeroen van Dijk <jeroentjevand...@gmail.com>
>
> The use case I'm talking about is when you are looking for a term that is
> very common and thus will yield many results. My understanding of the
> implementation of Riak [citation needed] is that the search is divided into
> a few phases. The first one is collecting results for each term. After that
> comes merging, sorting and limiting the result set. So for this particular
> case collecting all results would be infeasible and would kill performance.
> This holds even when a limit is set, because limiting happens in a phase
> after the results have been collected and merged.
>
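To make that cost concrete, here is a small Python sketch of a phased AND
query over a hypothetical in-memory inverted index. The names and data are
made up for illustration; this is not Riak's code. Because the limit is
applied only after collecting, merging and sorting, a query for one very
common term reads every posting even when only ten results are wanted.

```python
from typing import Dict, List, Tuple

# Hypothetical inverted index: term -> list of (doc_id, score) postings.
Index = Dict[str, List[Tuple[str, float]]]


def search_collect_then_limit(index: Index, terms: List[str], limit: int) -> Tuple[List[str], int]:
    """AND query run in separate phases: collect, merge, sort, then limit.

    Returns the top `limit` doc ids and the number of postings read.
    """
    postings_read = 0
    per_term: List[Dict[str, float]] = []
    # Phase 1: collect every posting for every term; `limit` plays no part here.
    for term in terms:
        hits: Dict[str, float] = {}
        for doc_id, score in index.get(term, []):
            hits[doc_id] = score
            postings_read += 1
        per_term.append(hits)
    if not per_term:
        return [], postings_read
    # Phase 2: merge the per-term results (intersection for AND) and sum scores.
    common = set(per_term[0]).intersection(*per_term[1:])
    merged = {doc: sum(hits[doc] for hits in per_term) for doc in common}
    # Phase 3: sort by descending score (doc id breaks ties), and only now limit.
    ranked = sorted(merged, key=lambda doc: (-merged[doc], doc))
    return ranked[:limit], postings_read


# A very common term: asking for 10 results still reads all 100,000 postings.
index = {"common": [(f"doc{i}", 1.0) for i in range(100_000)]}
top, read = search_collect_then_limit(index, ["common"], limit=10)
print(len(top), read)  # 10 100000
```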

That's correct.  We have similar issues.  We've resorted to creating the
equivalent of multicolumn indexes by joining certain fields together and
indexing those.  That is only possible because most of the data we want to
index is structured or semi-structured.  You'd have to determine whether
such an approach is feasible for your purposes.
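A minimal sketch of the field-joining idea, assuming string-valued structured
fields and a separator character that never occurs in them. The field names,
the `|` separator and the dict standing in for the index are all
hypothetical. A two-field query becomes a single exact-term lookup instead of
an intersection of two potentially large result sets.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

SEPARATOR = "|"  # assumption: a character that never occurs in the field values


def composite_term(record: Dict[str, str], fields: Sequence[str]) -> str:
    """Join several structured fields into a single indexable term."""
    values = [record[f] for f in fields]
    for f, v in zip(fields, values):
        if SEPARATOR in v:
            raise ValueError(f"field {f!r} contains the separator {SEPARATOR!r}")
    return SEPARATOR.join(values)


# Hypothetical records and an in-memory stand-in for the index.
records = {
    "o1": {"customer": "acme", "status": "open"},
    "o2": {"customer": "acme", "status": "closed"},
    "o3": {"customer": "globex", "status": "open"},
}
index: Dict[str, List[str]] = defaultdict(list)
for key, rec in records.items():
    index[composite_term(rec, ("customer", "status"))].append(key)

# "customer = acme AND status = open" becomes one exact-term lookup.
print(index["acme|open"])  # ['o1']
```

As with a multicolumn index in a relational database, the joined term only
helps queries on that particular combination of fields, and each extra
combination is one more term to index per object.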

We also found 2i to be faster than Search, at the cost of having our app
tokenize some of the fields we want to index.  Still, we've stuck with
Search because we need composable queries, which 2i does not yet provide.
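With 2i, the tokenization that Search would otherwise do has to happen in the
application. A rough Python sketch of that step, assuming a simple
lowercase/alphanumeric tokenizer and a made-up stop-word list: it produces one
(index name, value) pair per distinct token, using the `_bin` suffix Riak uses
for binary (string) secondary indexes. How the pairs get attached to the
object depends on the client library and is not shown.

```python
import re
from typing import List, Set, Tuple

# Hypothetical stop-word list; a real app would tune this to its data.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-alphanumerics, drop stop words and repeats (order kept)."""
    seen: Set[str] = set()
    tokens: List[str] = []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        if tok not in STOP_WORDS and tok not in seen:
            seen.add(tok)
            tokens.append(tok)
    return tokens


def index_entries(field: str, text: str) -> List[Tuple[str, str]]:
    """One (index_name, value) pair per token, named with a `_bin` (string) suffix."""
    return [(f"{field}_bin", tok) for tok in tokenize(text)]


print(index_entries("title", "The Art of Riak, and the art of Search"))
# [('title_bin', 'art'), ('title_bin', 'riak'), ('title_bin', 'search')]
```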

> I've read here [1] that one can use search_fold to interrupt the collecting
> phase when enough results are fetched. I would like to know if this is a
> best/official practice and whether it really solves the issue.
>

The search_fold function will only be useful if you plan on developing in
Erlang and, if my understanding is correct, if you don't care about the order
of the results (i.e. no scoring or field sorting).  Actually, the results may
be partially ordered, as the merge_index backend may store the postings in
reverse chronological order (newest first).
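The early-exit idea can be illustrated with a Python analogue of a fold that
is aborted once enough results have been accumulated. This does not reproduce
search_fold's actual Erlang signature; the posting shape, the newest-first
stream and the exception used to stop the fold are all assumptions for
illustration. It also shows the ordering caveat: results come back in storage
order, not by score.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Posting = Tuple[str, int]  # hypothetical posting shape: (doc_id, timestamp)


class EnoughResults(Exception):
    """Raised from the fold function to abort the fold, carrying the results so far."""

    def __init__(self, acc: List[str]) -> None:
        super().__init__("enough results")
        self.acc = acc


def fold(postings: Iterable[Posting],
         fun: Callable[[Posting, List[str]], List[str]],
         acc: List[str]) -> List[str]:
    """Plain left fold over a lazily produced stream of postings."""
    for posting in postings:
        acc = fun(posting, acc)
    return acc


def take_first(postings: Iterable[Posting], limit: int) -> List[str]:
    """Return the first `limit` doc ids in storage order, aborting the fold early."""
    if limit <= 0:
        return []

    def fold_fun(posting: Posting, acc: List[str]) -> List[str]:
        acc = acc + [posting[0]]
        if len(acc) >= limit:
            raise EnoughResults(acc)  # stop pulling further postings
        return acc

    try:
        return fold(postings, fold_fun, [])
    except EnoughResults as done:
        return done.acc


class PostingStream:
    """Hypothetical postings stored newest first; counts how many were read."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.read = 0

    def __iter__(self) -> Iterator[Posting]:
        for ts in range(self.n, 0, -1):
            self.read += 1
            yield (f"doc{ts}", ts)


stream = PostingStream(100_000)
first = take_first(stream, 10)
print(first[0], first[-1], stream.read)  # doc100000 doc99991 10
```

Only ten postings are read instead of 100,000, but the ten results are simply
the newest ones; nothing here ranks them by relevance.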
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
