Hi Rusty,

On Wed, Nov 30, 2011 at 5:49 PM, Rusty Klophaus <ru...@basho.com> wrote:
> Hi Jeroen,
>
> Your understanding is correct, the search query is parsed into a tree,
> where each leaf of the tree corresponds to a term. Each leaf sends back all
> matching terms, and results are intersected (or unioned) where the branches
> come together. So yes, if you were to run a search on a term with a large
> number of results, the system reads the entire list of keys (not objects)
> for that result.
>
> You may want to take another look at inline fields. They allow you to
> limit the results at the leaf level, and can greatly improve performance
> for common terms.
>
> The example I generally use to illustrate inline fields is to imagine
> searching for all males living in a specific zip code or postal code. In a
> normal search, a query on zip code would return ~100k results, and a query
> on "male" would return roughly half of the world's population. However, you
> can mark gender as an inline field, and then structure your query as two
> parts: a primary query on the zip code, and a filter on the gender. The
> filter is applied directly after the data is fetched from disk, before it
> is streamed through the rest of the system, so it is a very fast way to
> limit your results.

I will have to test this out. At the moment I don't see a filter that would
apply to my case. I could perhaps simplify the problem by ignoring these
common terms in the fields where they are common, treating them as stopwords
there. To make the example more concrete: I would allow searching for the
term in titles, but not in the description, where it is common. I'm guessing
this will be a custom solution in which the query has to be rewritten before
it is sent to Riak Search.

> That said, there are currently known issues around sorting and pagination
> in Riak Search, the upshot is that if you apply sorting and pagination at
> the same time, it can give incorrect or unpredictable results; this might
> be something to consider while planning your application.
> (https://issues.basho.com/show_bug.cgi?id=867)

Thanks for pointing this out. I'll keep an eye on this issue.

> I would recommend against using search_fold because it could break in the
> future, it is not intended to be a part of the public API.

Thanks for this advice :)

> Hope that helps,

Definitely, thank you.

Cheers,
Jeroen

> Best,
> Rusty
>
> On Wed, Nov 30, 2011 at 5:01 AM, Jeroen van Dijk <jeroentjevand...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm currently evaluating the search functionality of Riak. This involves
>> porting an application from Postgres/Sphinx to possibly only Riak. The
>> application I'm porting doesn't need advanced search, but it does need a
>> level of search I have come to believe this isn't provided in a feasible
>> way by Riak Search out of the box. I've also seen some sources that make me
>> worry about the performance of search [1, 2]. I hope to be proved wrong
>> here or get some advice how to work around this so I can just use Riak
>> Search and without an external search facility. As a disclaimer, I haven't
>> done any benchmarks yet and this is just based on what I have read so far.
>>
>> The use case I'm talking about is when you are looking for a term that is
>> very common and thus will yield many results. My understanding of the
>> implementation of Riak [citation needed] is that the search is divided into
>> a few phases. The first one is collecting results for each term. After that
>> comes merging, sorting and limiting the result set. So for this particular
>> case collecting all results would be infeasible and would kill performance.
>> Even when a limit is set because limiting comes in a phase after collecting
>> and the merging of results.
>>
>> The first question is, can the above be confirmed? I've read about Riak
>> Search performance optimization here [3], but that seems to be for a
>> different problem.
>>
>> I've read here [1] that one can use search_fold to interrupt the
>> collecting phase when enough results are fetched. I would like to know if
>> this a best/official practice and if it really solves the issue?
>>
>> I guess what I'm missing is a wiki page of "when and when not to use Riak
>> Search" or "how and how not to use Riak search". If this already exists I
>> completely missed it.
>>
>> Cheers,
>> Jeroen
>>
>> [1] http://blog.inagist.com/searching-with-riaksearch
>> [2] http://www.productionscale.com/home/2011/11/20/building-an-application-upon-riak-part-1.html#axzz1enL4I6KTl
>> [3] http://basho.com/blog/technical/2011/07/18/Boosting-Riak-Search-Query-Performance-With-Inline-Fields/
>>
>> http://wiki.basho.com/Riak-Search.html
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Rusty Klophaus (@rustyio)
> *Basho Technologies, Inc.*
> www.basho.com
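
To make the quoted explanation concrete for myself, here is a minimal Python sketch of the model Rusty describes: a query tree whose leaves each return a set of keys, with sets intersected or unioned where branches meet, and an optional filter applied at the leaf. This is a toy in-memory model under my own assumptions, not Riak Search code or its API. The names Term, And, Or, evaluate, inline_filter and the sample index are all hypothetical.

```python
# Toy in-memory model of the behaviour described in this thread.
# Nothing here is Riak Search code or API; all names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

InlineFields = Dict[str, str]
Posting = Tuple[str, InlineFields]            # (object key, inline fields stored with it)
Index = Dict[Tuple[str, str], List[Posting]]  # (field, term) -> postings


@dataclass
class Term:
    field: str
    value: str
    # Optional predicate on the inline fields, applied at the leaf.
    inline_filter: Optional[Callable[[InlineFields], bool]] = None


@dataclass
class And:
    left: "Node"
    right: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


Node = Union[Term, And, Or]


def evaluate(node: Node, index: Index, stats: Dict[str, int]) -> Set[str]:
    """Return the matching keys; stats["streamed"] counts keys leaving the leaves."""
    if isinstance(node, Term):
        postings = index.get((node.field, node.value), [])
        keys = {
            key
            for key, inline in postings
            if node.inline_filter is None or node.inline_filter(inline)
        }
        stats["streamed"] = stats.get("streamed", 0) + len(keys)
        return keys
    if isinstance(node, (And, Or)):
        left = evaluate(node.left, index, stats)
        right = evaluate(node.right, index, stats)
        return left & right if isinstance(node, And) else left | right
    raise TypeError(f"unknown node type: {type(node).__name__}")


def is_male(inline: InlineFields) -> bool:
    return inline.get("gender") == "male"


# Made-up sample data: zip postings carry gender as an inline field.
index: Index = {
    ("zip", "12345"): [
        ("p1", {"gender": "male"}),
        ("p2", {"gender": "female"}),
        ("p3", {"gender": "male"}),
    ],
    ("gender", "male"): [("p1", {}), ("p3", {}), ("p4", {}), ("p5", {})],
}

# Plain query: both leaves return everything, then the sets are intersected.
plain_stats: Dict[str, int] = {}
plain = evaluate(And(Term("zip", "12345"), Term("gender", "male")), index, plain_stats)

# Filtered query: one leaf on zip, with gender checked at the leaf.
filtered_stats: Dict[str, int] = {}
filtered = evaluate(Term("zip", "12345", inline_filter=is_male), index, filtered_stats)

print(sorted(plain), plain_stats["streamed"])
print(sorted(filtered), filtered_stats["streamed"])
```

Both queries return the same two keys, but in this toy data the plain version passes 7 keys up from its leaves while the filtered version passes 2. With a term as common as "male" that gap is the whole point of the inline field.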