On Wed, Nov 30, 2011 at 6:01 AM, <riak-users-requ...@lists.basho.com> wrote:
> From: Jeroen van Dijk <jeroentjevand...@gmail.com>
>
> The use case I'm talking about is when you are looking for a term that is
> very common and thus will yield many results. My understanding of the
> implementation of Riak [citation needed] is that the search is divided into
> a few phases. The first one is collecting results for each term. After that
> comes merging, sorting and limiting the result set. So for this particular
> case collecting all results would be infeasible and would kill performance.
> Even when a limit is set because limiting comes in a phase after collecting
> and the merging of results.
>

That's correct. We have similar issues. We've resorted to creating the
equivalent of multicolumn indexes by joining certain fields together and
indexing those. That is only possible because most of the data we want to
index is structured or semi-structured. You'd have to determine whether
such an approach is feasible for your purposes.

We also found 2i to be faster than Search, at the expense of requiring our
app to perform tokenization for some of the fields we want to index, but
we've stuck with Search as we need composable queries, which 2i does not
yet provide.

> I've read here [1] that one can use search_fold to interrupt the collecting
> phase when enough results are fetched. I would like to know if this a
> best/official practice and if it really solves the issue?
>

Search_fold will only be useful if you plan on developing in Erlang and, if
my understanding is correct, if you don't care about the order of the
results (i.e. no scoring or field sorting). Actually, the results may be
partially ordered, as the merge_index backend may store the postings sorted
by the inverse of time.
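
To make the "multicolumn index" idea above concrete, here is a rough sketch in Python. It is not Riak code and does not use any Riak API: `ToyIndex`, `composite_value`, and the `status`, `country` and `status_country` field names are all made up for illustration, and the in-memory dictionary only stands in for a term index. It compares how many postings have to be collected when a very common term is queried together with a rare one, versus querying a single pre-joined field.

```python
from collections import defaultdict


def composite_value(doc, fields, sep="_"):
    """Join the chosen field values into one token, e.g. 'open_nl'."""
    return sep.join(str(doc[f]).lower() for f in fields)


class ToyIndex:
    """In-memory stand-in for a term index: (field, term) -> list of doc ids."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, doc_id, field, term):
        self.postings[(field, term)].append(doc_id)

    def collect(self, field, term):
        # Mimics the "collecting" phase: every posting for the term is returned.
        return self.postings.get((field, term), [])


# Hypothetical data: every document is "open", but only 1 in 100 is from "nl".
docs = {
    i: {"status": "open", "country": "nl" if i % 100 == 0 else "us"}
    for i in range(1000)
}

index = ToyIndex()
for doc_id, doc in docs.items():
    index.add(doc_id, "status", doc["status"])
    index.add(doc_id, "country", doc["country"])
    # The composite field is computed by the application at write time.
    index.add(doc_id, "status_country", composite_value(doc, ["status", "country"]))

# Two-term query: both posting lists are collected in full, then merged.
status_hits = index.collect("status", "open")
country_hits = index.collect("country", "nl")
merged = sorted(set(status_hits) & set(country_hits))
collected_two_term = len(status_hits) + len(country_hits)

# Composite query: only the postings for the joined token are collected.
composite_hits = index.collect("status_country", "open_nl")
collected_composite = len(composite_hits)

print("two-term query collected:", collected_two_term)
print("composite query collected:", collected_composite)
```

Both queries return the same documents, but the composite field avoids collecting the full posting list for the common term. The trade-off is the one mentioned above: the fields to join must be known up front, so this only works for structured or semi-structured data.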