Hi Rusty,

Thanks for the answer.

We have indexed the following JSON object:


{
    "@class": "com.starsite.data.Answer",
    "answer_text": "momo is the best nepalese food",
    "keywords": null,
    "metaDescription": null,
    "post_date": null,
    "id": "202ba4ac-0fd3-4709-ba84-463e0caa413c",
    "version": 1,
    "scope": [
        "type|com.starsite.data.Answer"
    ]
}

We issued the following query:

answer_text: "food"

and received the following keydata:

[{"p":[4,0],"score":[4.855199135883779,1.8398742574541822]}]


What does 0-indexed mean here? If scoring in Riak Search is based on the 
vector-space model, as in Lucene, I was expecting the scores to be normalized 
between 0 and 1.

Regarding the position information, I assume the words 'is' and 'the' are 
removed as stopwords. If they are not removed, the position should have been 5; 
if they are removed, it should have been 3. The word "food" occurs only once, so 
shouldn't we be getting just one position?
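A quick check of the positions worked out above, as a minimal sketch assuming plain whitespace tokenization, lower-casing, and a hypothetical stopword list containing only 'is' and 'the'. Riak Search's real analyzer may tokenize and filter differently, so this only illustrates the expected 0-indexed positions, not what Riak actually computes:

```python
# Hypothetical stopword set for illustration; the real analyzer's list may differ.
STOPWORDS = {"is", "the"}


def positions(text, term, drop_stopwords=False):
    """Return the 0-indexed positions of `term` in `text`."""
    tokens = text.lower().split()  # naive whitespace tokenizer
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return [i for i, tok in enumerate(tokens) if tok == term]


text = "momo is the best nepalese food"
print(positions(text, "food"))                       # [5]
print(positions(text, "food", drop_stopwords=True))  # [3]
```

Either way the term appears exactly once, so a single position would be expected.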

Thanks,
Archana



On Aug 5, 2011, at 11:08 AM, Rusty Klophaus wrote:

Hi Archana,

Yes, the 'p' attribute is positional information. That list indicates that the 
term occurs at positions 0 and 43 in the document; positions are 0-indexed. I'm 
not sure why you are getting two positions if the word occurred only once. What 
was the original query?

The scoring information you are seeing is the result of a bug. For now, as a 
workaround, you can add the scores together. This will give you a *relative* 
score, allowing you to rank results for the current query.
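The summing workaround could look roughly like this. It is a minimal sketch, assuming each result's keydata has been captured as the JSON text shown in this thread (a list of entries, each with a "score" list); `relative_score` and `rank` are hypothetical helper names, and the sums are only comparable between results of the same query:

```python
import json


def relative_score(keydata_json):
    """Sum every raw score in one result's keydata (workaround for the scoring bug)."""
    entries = json.loads(keydata_json)
    return sum(sum(entry["score"]) for entry in entries)


def rank(results):
    """Sort (key, keydata_json) pairs by relative score, highest first."""
    return sorted(results, key=lambda kv: relative_score(kv[1]), reverse=True)


sample = '[{"p":[4,0],"score":[4.855199135883779,1.8398742574541822]}]'
print(relative_score(sample))  # about 6.695
```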

To fix this issue, some processing needs to happen within Riak to combine and 
normalize the scores into a final score that can also be used for correct 
ranking against other queries. (This is being done for the Solr interface, but 
not the Map/Reduce interface.) Riak Search models its scoring on Lucene's as 
closely as possible, so the following page is a good reference on scoring, 
especially the final normalization step:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
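For reference, the Similarity page linked above documents Lucene's practical scoring function roughly as follows (transcribed from the Lucene 2.4 javadoc as we understand it; Riak Search's own implementation may differ in details). The queryNorm factor is the query-level normalization that the docs describe as making scores from different queries comparable:

$$
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot \sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d)\Big)
$$

$$
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}},\qquad
\mathrm{sumOfSquaredWeights} = \mathrm{boost}(q)^{2}\cdot \sum_{t \in q}\big(\mathrm{idf}(t)\cdot \mathrm{boost}(t)\big)^{2}
$$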

This issue is tracked in https://issues.basho.com/show_bug.cgi?id=1154

Best,
Rusty


On Thu, Aug 4, 2011 at 3:27 PM, Archana Bhattarai 
<abhatta...@sharecare.com> wrote:
Hi Rusty,

Thanks a lot for the answer. We were able to get some data in keydata, as follows:


[{"p":[43,0],"score":[5.3669048584479,1.7201627119528418]}]

But I couldn't work out exactly what it represents. I believe 'p' gives 
positional information, but why does it contain two values when the word we 
searched for occurred only once in the document? Does the position ignore 
stopwords and count only the other words? Also, why are there two scores? Isn't 
the score normalized? Or am I doing something wrong to get these scores?


Thanks a lot in advance,
Archana


On Jul 22, 2011, at 11:09 AM, Rusty Klophaus wrote:

Hi Archana,

Yes. When you use a search query to initiate a map/reduce job, the scores are 
fed into the first phase as keydata, along with other metadata about the search 
result including positional information and any inline fields.
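As a minimal sketch of reading that keydata on the client side, assuming it has been captured as the JSON text shown elsewhere in this thread (entries with a "p" list of positions and a "score" list); the `TermHit` and `parse_keydata` names are hypothetical, and any other keys (for example inline fields, whose exact names are not shown here) are simply kept in `extra`:

```python
import json
from dataclasses import dataclass, field


@dataclass
class TermHit:
    positions: list  # "p": 0-indexed term positions
    scores: list     # "score": raw, un-normalized scores
    extra: dict = field(default_factory=dict)  # any other keys, left uninterpreted


def parse_keydata(keydata_json):
    """Split each keydata entry into positions, scores, and leftover metadata."""
    hits = []
    for entry in json.loads(keydata_json):
        entry = dict(entry)
        positions = entry.pop("p", [])
        scores = entry.pop("score", [])
        hits.append(TermHit(positions, scores, entry))
    return hits


hits = parse_keydata('[{"p":[4,0],"score":[4.855199135883779,1.8398742574541822]}]')
print(hits[0].positions)  # [4, 0]
```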

More information in the links below:

 *   http://wiki.basho.com/Riak-Search---Querying.html#Querying-Integrated-with-Map-Reduce
 *   http://wiki.basho.com/MapReduce.html (search for "keydata")

Best,
Rusty

On Fri, Jul 22, 2011 at 10:53 AM, Archana Bhattarai 
<abhatta...@sharecare.com> wrote:
Hi,

Is there a way to get back the score when querying via the Solr interface or, 
ideally, via MapReduce over search? It looks like the Solr interface only 
supports sorting.


Thanks in advance,
Archana
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



--
Rusty Klophaus

Basho Technologies, Inc.
11921 Freedom Drive, Suite 550
Reston, VA 20190
www.basho.com
