Hi Jeroen,

Your understanding is correct. The search query is parsed into a tree in which each leaf corresponds to a term. Each leaf sends back all of the keys matching its term, and the results are intersected (or unioned) where the branches come together. So yes, if you run a search on a term with a large number of results, the system reads the entire list of keys (not objects) for that term.
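
To make the cost concrete, here is a toy Python model of that kind of tree evaluation. It is only a sketch and not Riak Search code: the index contents and the names INDEX, READS, leaf, and evaluate are made up for illustration, and the real system streams results between nodes rather than building sets in memory.

```python
# Toy model of query-tree evaluation; not Riak Search's actual implementation.
# INDEX, READS, leaf, and evaluate are hypothetical names used for illustration.

INDEX = {
    ("zip", "90210"): {"k1", "k2", "k3"},
    ("gender", "male"): {"k2", "k3", "k4", "k5", "k6"},
}

READS = []  # records how many keys each leaf had to read


def leaf(field, term):
    """Return every key indexed under field:term (keys only, not objects)."""
    keys = INDEX.get((field, term), set())
    READS.append((field, term, len(keys)))
    return keys


def evaluate(node):
    """Evaluate a query tree given as nested tuples."""
    op = node[0]
    if op == "term":
        _, field, term = node
        return leaf(field, term)
    # Every child is fully evaluated before the branch combines them.
    children = [evaluate(child) for child in node[1:]]
    if op == "and":
        return set.intersection(*children)
    if op == "or":
        return set.union(*children)
    raise ValueError(f"unknown operator: {op}")


query = ("and", ("term", "zip", "90210"), ("term", "gender", "male"))
result = evaluate(query)
print(sorted(result))  # ['k2', 'k3']
print(READS)           # [('zip', '90210', 3), ('gender', 'male', 5)]
```

Only two keys survive the intersection, but the leaf for the common term still had to read its whole key list. That is the cost described above.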

You may want to take another look at inline fields. They allow you to limit the results at the leaf level, and can greatly improve performance for common terms.

The example I generally use to illustrate inline fields is to imagine searching for all males living in a specific zip code or postal code. In a normal search, a query on zip code would return ~100k results, and a query on "male" would return roughly half of the world's population. However, you can mark gender as an inline field, and then structure your query as two parts: a primary query on the zip code, and a filter on the gender. The filter is applied directly after the data is fetched from disk, before it is streamed through the rest of the system, so it is a very fast way to limit your results.

That said, there are currently known issues around sorting and pagination in Riak Search. The upshot is that if you apply sorting and pagination at the same time, you can get incorrect or unpredictable results; this might be something to consider while planning your application. (https://issues.basho.com/show_bug.cgi?id=867)

I would recommend against using search_fold. It is not intended to be part of the public API, so it could break in the future.

Hope that helps,

Best,
Rusty

On Wed, Nov 30, 2011 at 5:01 AM, Jeroen van Dijk <jeroentjevand...@gmail.com> wrote:

> Hi all,
>
> I'm currently evaluating the search functionality of Riak. This involves
> porting an application from Postgres/Sphinx to possibly only Riak. The
> application I'm porting doesn't need advanced search, but it does need a
> level of search that I have come to believe isn't provided in a feasible
> way by Riak Search out of the box. I've also seen some sources that make me
> worry about the performance of search [1, 2]. I hope to be proved wrong
> here or get some advice on how to work around this so I can just use Riak
> Search without an external search facility.
> As a disclaimer, I haven't done any benchmarks yet and this is just
> based on what I have read so far.
>
> The use case I'm talking about is when you are looking for a term that is
> very common and thus will yield many results. My understanding of the
> implementation of Riak is that the search is divided into a few phases.
> The first one is collecting results for each term. After that comes
> merging, sorting and limiting the result set. So for this particular
> case, collecting all results would be infeasible and would kill
> performance, even when a limit is set, because limiting comes in a phase
> after the collecting and merging of results.
>
> The first question is, can the above be confirmed? I've read about Riak
> Search performance optimization here [3], but that seems to be for a
> different problem.
>
> I've read here [1] that one can use search_fold to interrupt the
> collecting phase when enough results are fetched. I would like to know if
> this is a best/official practice and if it really solves the issue.
>
> I guess what I'm missing is a wiki page of "when and when not to use Riak
> Search" or "how and how not to use Riak Search". If this already exists I
> completely missed it.
>
> Cheers,
> Jeroen
>
> [1] http://blog.inagist.com/searching-with-riaksearch
> [2]
> http://www.productionscale.com/home/2011/11/20/building-an-application-upon-riak-part-1.html#axzz1enL4I6KTl
> [3]
> http://basho.com/blog/technical/2011/07/18/Boosting-Riak-Search-Query-Performance-With-Inline-Fields/
>
> http://wiki.basho.com/Riak-Search.html
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Rusty Klophaus (@rustyio)
*Basho Technologies, Inc.*
www.basho.com