Hi Rusty,

On Wed, Nov 30, 2011 at 5:49 PM, Rusty Klophaus <ru...@basho.com> wrote:

> Hi Jeroen,
>
> Your understanding is correct: the search query is parsed into a tree,
> where each leaf of the tree corresponds to a term. Each leaf sends back all
> matching terms, and results are intersected (or unioned) where the branches
> come together. So yes, if you were to run a search on a term with a large
> number of results, the system reads the entire list of keys (not objects)
> for that result.
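
The evaluation model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Riak Search's actual implementation: the `INDEX` dictionary, the `Term`/`And`/`Or` classes, and the `evaluate` and `keys_read` helpers are all hypothetical names made up for this example.

```python
from dataclasses import dataclass

# Hypothetical in-memory inverted index: term -> set of document keys.
INDEX = {
    "zip:12345": {"k1", "k2", "k3"},
    "gender:male": {"k2", "k3", "k4", "k5"},
    "title:riak": {"k3", "k5"},
}

@dataclass
class Term:
    term: str        # leaf of the query tree

@dataclass
class And:
    left: object     # branch: results of both children are intersected
    right: object

@dataclass
class Or:
    left: object     # branch: results of both children are unioned
    right: object

def evaluate(node):
    """Return the set of keys matching the query tree rooted at node."""
    if isinstance(node, Term):
        # A leaf returns every key stored for its term.
        return set(INDEX.get(node.term, set()))
    left, right = evaluate(node.left), evaluate(node.right)
    return left & right if isinstance(node, And) else left | right

def keys_read(node):
    """Count how many keys the leaves hand back before any merging."""
    if isinstance(node, Term):
        return len(INDEX.get(node.term, set()))
    return keys_read(node.left) + keys_read(node.right)

query = And(Term("zip:12345"), Term("gender:male"))
result = evaluate(query)
cost = keys_read(query)
```

Here `result` holds only two keys, yet `cost` shows that seven keys were read at the leaves. With a very common term, that leaf cost is what dominates.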
>
> You may want to take another look at inline fields. They allow you to
> limit the results at the leaf level, and can greatly improve performance
> for common terms.
>
> The example I generally use to illustrate inline fields is to imagine
> searching for all males living in a specific zip code or postal code. In a
> normal search, a query on zip code would return ~100k results, and a query
> on "male" would return roughly half of the world's population. However, you
> can mark gender as an inline field, and then structure your query as two
> parts: a primary query on the zip code, and a filter on the gender. The
> filter is applied directly after the data is fetched from disk, before it
> is streamed through the rest of the system, so it is a very fast way to
> limit your results.
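
A minimal sketch of the inline-field idea, under the assumption that each posting for the primary term carries its inline field values alongside the key. The `POSTINGS` layout and the `leaf_results` function are hypothetical and only model the description above; they are not Riak Search's storage format or API.

```python
# Hypothetical postings list: term -> [(key, inline_fields), ...].
# The inline field (gender) is stored next to each zip-code posting.
POSTINGS = {
    "zip:12345": [
        ("k1", {"gender": "female"}),
        ("k2", {"gender": "male"}),
        ("k3", {"gender": "male"}),
    ],
}

def leaf_results(term, inline_filter=None):
    """Yield keys for term, dropping postings that fail the inline filter.

    The filter runs while the postings are read, so rejected keys are
    never passed on to the merge/intersect stage.
    """
    for key, inline in POSTINGS.get(term, []):
        if inline_filter is None or inline_filter(inline):
            yield key

# Primary query on the zip code, filter on the inline gender field.
males = list(leaf_results("zip:12345", lambda f: f.get("gender") == "male"))
```

The point of the design is that the huge "male" result set is never read at all: only the zip-code postings are scanned, and the gender check is a cheap per-posting test.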
>
>
I'll have to test this out. I currently don't see a filter that would apply
to this case. I could maybe simplify the problem by treating these common
terms as stopwords in the fields where they are common. To make the example
more concrete, I would allow searching for the term in titles but not in
descriptions, where it is common. I'm guessing this will be a custom
solution where one needs to manipulate the query before sending it to
Riak Search.
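
A rough sketch of what such query manipulation could look like on the application side. Everything here is an assumption for illustration: the `COMMON_TERMS` table, the `build_query` function, the field names, and the choice to emit a Lucene-style `field:term` string. Terms are not escaped, so this is not safe for arbitrary user input as written.

```python
# Hypothetical per-field stopword lists: terms too common to search on
# in that field.
COMMON_TERMS = {
    "title": set(),
    "description": {"bike", "new"},
}

def build_query(user_terms, fields=("title", "description")):
    """Expand each term over the fields where it is not a stopword."""
    clauses = []
    for term in user_terms:
        per_field = [
            f"{field}:{term}"
            for field in fields
            if term not in COMMON_TERMS.get(field, set())
        ]
        if per_field:  # skip terms that are stopwords in every field
            clauses.append("(" + " OR ".join(per_field) + ")")
    return " AND ".join(clauses)

rewritten = build_query(["bike", "red"])
```

With the table above, `rewritten` searches for "bike" in titles only, while "red" is still searched in both fields.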


> That said, there are currently known issues around sorting and pagination
> in Riak Search. The upshot is that if you apply sorting and pagination at
> the same time, you can get incorrect or unpredictable results; this might
> be something to consider while planning your application. (
> https://issues.basho.com/show_bug.cgi?id=867)
>

Thanks for pointing this out. I'll keep an eye on this issue.



> I would recommend against using search_fold because it could break in the
> future; it is not intended to be a part of the public API.
>

Thanks for this advice :)



> Hope that helps,
>

Definitely, thank you.

Cheers,
Jeroen



> Best,
> Rusty
>
>
>
> On Wed, Nov 30, 2011 at 5:01 AM, Jeroen van Dijk <
> jeroentjevand...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm currently evaluating the search functionality of Riak. This involves
>> porting an application from Postgres/Sphinx to possibly only Riak. The
>> application I'm porting doesn't need advanced search, but it does need a
>> level of search that I have come to believe isn't provided in a feasible
>> way by Riak Search out of the box. I've also seen some sources that make me
>> worry about the performance of search [1, 2]. I hope to be proved wrong
>> here or get some advice on how to work around this so I can just use Riak
>> Search without an external search facility. As a disclaimer, I haven't
>> done any benchmarks yet and this is just based on what I have read so far.
>>
>> The use case I'm talking about is when you are looking for a term that is
>> very common and thus will yield many results. My understanding of the
>> implementation of Riak [citation needed] is that the search is divided into
>> a few phases. The first one is collecting results for each term. After that
>> comes merging, sorting and limiting the result set. So for this particular
>> case collecting all results would be infeasible and would kill performance,
>> even when a limit is set, because limiting comes in a phase after collecting
>> and merging the results.
>>
>> The first question is, can the above be confirmed? I've read about Riak
>> Search performance optimization here [3], but that seems to be for a
>> different problem.
>>
>> I've read here [1] that one can use search_fold to interrupt the
>> collecting phase when enough results are fetched. I would like to know if
>> this is a best/official practice and if it really solves the issue.
>>
>> I guess what I'm missing is a wiki page of "when and when not to use Riak
>> Search" or "how and how not to use Riak search". If this already exists I
>> completely missed it.
>>
>> Cheers,
>> Jeroen
>>
>> [1] http://blog.inagist.com/searching-with-riaksearch
>> [2]
>> http://www.productionscale.com/home/2011/11/20/building-an-application-upon-riak-part-1.html#axzz1enL4I6KTl
>> [3]
>> http://basho.com/blog/technical/2011/07/18/Boosting-Riak-Search-Query-Performance-With-Inline-Fields/
>>
>>
>> http://wiki.basho.com/Riak-Search.html
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
>
> --
> Rusty Klophaus (@rustyio)
> *Basho Technologies, Inc.*
> www.basho.com
>
>
>