Relevancy judgement lists ARE very context sensitive. For example, in a medical search application you'll have very different relevancy requirements between a point-of-care application and an application being used to perform general "sit at your desk" research, *even if the content being served is identical*.
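To make that concrete, here's a toy sketch of the same documents being scored two different ways (the field names and boost values are invented for illustration, not taken from any real system):

    import math
    from datetime import datetime, timezone

    def point_of_care_score(doc, text_score):
        # Point-of-care: heavily reward recency and specific treatment guidance.
        age_days = (datetime.now(timezone.utc) - doc["published"]).days
        recency_boost = math.exp(-age_days / 365.0)   # decays over roughly a year
        specificity_boost = 2.0 if doc["type"] == "treatment_guideline" else 1.0
        return text_score * specificity_boost * (1.0 + recency_boost)

    def desk_research_score(doc, text_score):
        # Sit-at-your-desk research: favor broad, well-cited background
        # material; recency barely matters.
        citation_boost = math.log1p(doc["citation_count"])
        return text_score * (1.0 + 0.1 * citation_boost)

    # Same document, same text-match score, very different final rankings:
    doc = {"published": datetime(2014, 5, 1, tzinfo=timezone.utc),
           "type": "treatment_guideline",
           "citation_count": 4}
    print(point_of_care_score(doc, text_score=1.2))
    print(desk_research_score(doc, text_score=1.2))

Same corpus, same query, different relevancy rules -- which is exactly why the judgement lists for the two applications end up different.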
Point-of-care is about getting to a solution fast. It's targeted. Recency may be more of a factor. Specific solutions to medical problems may be more important. Sit-at-your-desk research may be more about futzing around with general knowledge and more about the "discovery" aspect of search. Even IF the data sets for the two applications were 100% identical, you would almost certainly provide different relevancy rules based on the different use cases.

We do a lot of testing with judgement lists (mostly through our product Quepid <http://quepid.com>, but there are other home-grown scripted tools people use too). Judgement lists are great for collaborating closely with your client on what you expect search to do -- i.e. capturing informal use cases. They let the client make assertions about what the correct order of search results should be, which allows you to optimize for a reasonable set of use cases (there's a rough sketch of what this looks like in code at the bottom of this message). We've had it work well as long as the cases are representative in nature. For example, in a name search application you don't need both a "D. Turnbull" and a "Y. Seeley" to test the "first initial/last name" case. You often just need one exemplar to test and work against to prove you've solved (and continue to solve) that problem.

Judgement lists based on "experts" tend to break down occasionally when the person you're collaborating with doesn't actually reflect the behavior of real users. So we'll also work on relevancy in the context of judgement lists generated programmatically from user behavior (i.e. query logs), not just what the expert says. That's more integration work and requires more data, but it's potentially more beneficial for relevancy tuning.

We blog a fair amount about relevancy pre-production and regression testing. You can read more here <http://www.opensourceconnections.com/2013/10/21/search-quality-is-about-effective-collaboration/>, here <http://www.opensourceconnections.com/blog/2014/06/10/what-is-search-relevancy/>, and here <http://www.opensourceconnections.com/2013/10/14/what-is-test-driven-search-relevancy/>.

Hope it's helpful to you. Good luck,

-Doug
Search Relevancy Consultant
OpenSource Connections

On Thu, Jun 12, 2014 at 1:47 PM, Ivan Brusic <i...@brusic.com> wrote:
> Perhaps more of an NLP question, but are there any tests regarding
> relevance for Lucene? Given an example corpus of documents, what are the
> golden sets for specific queries? The Wikipedia dump is used as a
> benchmarking tool for both indexing and querying in Lucene, but there are
> no metrics in terms of precision.
>
> The Open Relevance project was closed yesterday (
> http://lucene.apache.org/openrelevance/), which is what prompted me to ask
> this question. Was the sub-project closed because others have found
> alternate solutions?
>
> Relevancy is of course extremely context-dependent and subjective, but my
> hope is that there is an example catalog somewhere with defined golden
> sets.
>
> Cheers,
>
> Ivan

--
Doug Turnbull
Search & Big Data Architect
OpenSource Connections <http://o19s.com>
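P.S. Here's the rough judgement-list sketch mentioned above. It's a toy example -- not Quepid's actual format or API, and the queries and doc ids are made up -- just to show how graded judgements turn into a regression test you can re-run as you tune:

    JUDGEMENTS = {
        # query -> {doc_id: grade}; 3 = exactly right, 0 = wrong.
        # Grades can come from an expert, or be derived programmatically
        # from click counts in your query logs.
        "d. turnbull": {"author:42": 3, "author:7": 0},
        "heart attack treatment": {"doc:981": 3, "doc:12": 2, "doc:55": 0},
    }

    def precision_at_k(query, returned_doc_ids, k=5):
        # Fraction of the top k results judged relevant (grade >= 2).
        judged = JUDGEMENTS.get(query, {})
        top_k = returned_doc_ids[:k]
        if not top_k:
            return 0.0
        relevant = sum(1 for doc_id in top_k if judged.get(doc_id, 0) >= 2)
        return relevant / float(len(top_k))

    def run_regression(search_fn, threshold=0.6):
        # search_fn(query) -> ordered doc ids; a hypothetical hook into
        # whatever engine you're tuning (Solr, Elasticsearch, raw Lucene, ...).
        failures = []
        for query in JUDGEMENTS:
            score = precision_at_k(query, search_fn(query))
            if score < threshold:
                failures.append((query, score))
        return failures

Run run_regression after every relevancy change; any query that drops below the threshold is a use case you've regressed on.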