[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

Diego Ceccarelli (JIRA) Thu, 10 Mar 2016 09:09:10 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189552#comment-15189552
 ]


Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:08 PM:
-----------------------------------------------------------------

[~joel.bernstein] thanks for pointing out the {MergeStrategy}. I uploaded a new 
patch with a first step. I agree that the merge strategy must stay in Solr; 
that's why I wrote "partially moved" :) Just as there are both IndexSearcher and 
SolrIndexSearcher, I moved {RankQuery} into Lucene and created a Solr-side 
{SolrRankQuery}. The reason is that {RankQuery} works by manipulating the 
collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, IndexSearcher searcher) throws IOException;
{code}

Currently, {SolrIndexSearcher} checks whether the query is a {RankQuery} and, if 
so, asks it for the collector:
{code:java}
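  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) throws IOException {

    Query q = cmd.getQuery();
    if (q instanceof RankQuery) {
      // the RankQuery supplies its own (e.g. reranking) collector
      RankQuery rq = (RankQuery) q;
      return rq.getTopDocsCollector(len, cmd, this);
    }
    // ... otherwise the default top-docs collector is created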
{code}

So, instead of simply creating a collector with {TopScoreDocCollector.create}, 
the {TopScoreDocCollector} is wrapped in a collector supplied by the {RankQuery} 
(e.g., a reranking collector).
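To illustrate the wrapping pattern in isolation, here is a minimal plain-Java sketch, not Solr or Lucene code. Hit, TopCollector, TopScoreCollector and ReRankingCollector are hypothetical stand-ins for {ScoreDoc}, {TopDocsCollector}, {TopScoreDocCollector} and the collector a RankQuery returns; the reranking collector delegates collection and then rescores only the first reRankDocs hits with a second-phase scorer:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for Lucene's ScoreDoc: a document id and its score.
record Hit(int doc, double score) {}

// Hypothetical stand-in for TopDocsCollector: collects hits, returns them best-first.
interface TopCollector {
  void collect(int doc, double score);
  List<Hit> topDocs();
}

// Keeps the n highest-scoring hits, in the spirit of TopScoreDocCollector.
class TopScoreCollector implements TopCollector {
  private final int n;
  private final List<Hit> hits = new ArrayList<>();

  TopScoreCollector(int n) { this.n = n; }

  @Override public void collect(int doc, double score) { hits.add(new Hit(doc, score)); }

  @Override public List<Hit> topDocs() {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(Comparator.comparingDouble(Hit::score).reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
  }
}

// Decorator: delegates collection, then rescores only the first reRankDocs hits
// with a (typically more expensive) second-phase scorer.
class ReRankingCollector implements TopCollector {
  private final TopCollector delegate;
  private final int reRankDocs;
  private final ToDoubleFunction<Hit> rescorer;

  ReRankingCollector(TopCollector delegate, int reRankDocs, ToDoubleFunction<Hit> rescorer) {
    this.delegate = delegate;
    this.reRankDocs = reRankDocs;
    this.rescorer = rescorer;
  }

  @Override public void collect(int doc, double score) { delegate.collect(doc, score); }

  @Override public List<Hit> topDocs() {
    List<Hit> first = delegate.topDocs();
    int k = Math.min(reRankDocs, first.size());
    List<Hit> result = new ArrayList<>();
    for (Hit h : first.subList(0, k)) {
      result.add(new Hit(h.doc(), rescorer.applyAsDouble(h)));
    }
    result.sort(Comparator.comparingDouble(Hit::score).reversed());
    // hits beyond reRankDocs keep their first-pass order and scores
    result.addAll(first.subList(k, first.size()));
    return result;
  }
}
```

Because the searcher only ever sees a collector, whoever creates the collector decides whether reranking happens; that is why grouping needs a hook at collector-creation time.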

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a map {group -> score}, where score is the highest score of any document in the group (the map contains only the top-k groups with the highest scores);
   * in the second stage, for each of the top groups, the documents in the group are ranked and the top documents of each group are returned.

This logic is mainly implemented in Lucene, in 
{AbstractFirstPassGroupingCollector} and {AbstractSecondPassGroupingCollector}.
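A simplified sketch of the two stages in plain Java, using hypothetical GroupDoc and TwoPassGrouping names rather than the Lucene API. Unlike {AbstractFirstPassGroupingCollector}, which tracks only the current top groups while collecting, this sketch materializes every group's maximum score first; that is fine for illustration but not for large result sets:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical document: id, group value and first-pass score.
record GroupDoc(int doc, String group, double score) {}

class TwoPassGrouping {
  // First pass: for each group keep its highest document score, then return the top-k groups.
  static List<String> firstPass(List<GroupDoc> docs, int topGroups) {
    Map<String, Double> best = new HashMap<>();
    for (GroupDoc d : docs) {
      best.merge(d.group(), d.score(), Double::max);
    }
    return best.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
        .limit(topGroups)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  // Second pass: for each top group, rank its documents and keep the best maxDocsPerGroup.
  static Map<String, List<Integer>> secondPass(List<GroupDoc> docs, List<String> groups, int maxDocsPerGroup) {
    Map<String, List<Integer>> result = new LinkedHashMap<>(); // preserves group order
    for (String g : groups) {
      result.put(g, docs.stream()
          .filter(d -> d.group().equals(g))
          .sorted(Comparator.comparingDouble(GroupDoc::score).reversed())
          .limit(maxDocsPerGroup)
          .map(GroupDoc::doc)
          .collect(Collectors.toList()));
    }
    return result;
  }
}
```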

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind RankQuery is that you don't want to 
apply the query to all the documents in the collection, so "group reranking" 
should work as follows:

   1. in the first stage, we iterate over the documents, scoring them as usual, 
and keep a map {group -> score};
   2. for each group, the RankQuery is applied to the top documents in the group;
   3. the groups are reranked according to the new scores.
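The three proposed steps can be sketched in plain Java with hypothetical names (Doc, RankedGroup, GroupReRanking) and a rescorer function standing in for the RankQuery. Step 3 is not implemented in this patch yet, so this only illustrates the intended behaviour; for brevity, each group returns just its reranked documents:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical document: id, group value and score.
record Doc(int id, String group, double score) {}

// A group with its documents, best-first.
record RankedGroup(String group, List<Doc> docs) {
  double topScore() { return docs.isEmpty() ? Double.NEGATIVE_INFINITY : docs.get(0).score(); }
}

class GroupReRanking {
  static List<RankedGroup> rerank(List<Doc> docs, int topGroups, int reRankDocs,
                                  ToDoubleFunction<Doc> rescorer) {
    // Step 1: first pass with the original scores; keep the top-k groups by best document.
    Map<String, Double> best = new HashMap<>();
    for (Doc d : docs) {
      best.merge(d.group(), d.score(), Double::max);
    }
    List<String> groups = best.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
        .limit(topGroups)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());

    // Step 2: within each group, apply the rescorer only to the group's top reRankDocs docs.
    List<RankedGroup> result = new ArrayList<>();
    for (String g : groups) {
      List<Doc> reranked = docs.stream()
          .filter(d -> d.group().equals(g))
          .sorted(Comparator.comparingDouble(Doc::score).reversed())
          .limit(reRankDocs)
          .map(d -> new Doc(d.id(), g, rescorer.applyAsDouble(d)))
          .sorted(Comparator.comparingDouble(Doc::score).reversed())
          .collect(Collectors.toList());
      result.add(new RankedGroup(g, reranked));
    }

    // Step 3: reorder the groups by their new best (reranked) score.
    result.sort(Comparator.comparingDouble(RankedGroup::topScore).reversed());
    return result;
  }
}
```

Only reRankDocs documents per top group ever reach the rescorer, so the expensive query is never applied to the whole collection.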

This patch implements step 2. I had to move RankQuery into Lucene because 
{AbstractSecondPassGroupingCollector} creates a separate collector for each 
group:

{code:java}
    for (SearchGroup<GROUP_VALUE_TYPE> group : groups) {
      //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
      TopDocsCollector<?> collector;
      if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
        // Sort by score
        collector = TopScoreDocCollector.create(maxDocsPerGroup);
      ...
{code}

... so there is no way to inject the reranking collector from Solr. After moving 
RankQuery into Lucene, I modified this code as follows:

{code:java}
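        collector = TopScoreDocCollector.create(maxDocsPerGroup);
        if (query instanceof RankQuery) { // instanceof is already false for null
          // let the RankQuery wrap the per-group collector so the group's top docs are reranked
          collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
        }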
{code}
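In isolation, the hook added in the second pass amounts to "create the default per-group collector, then let the query decorate it". A hedged plain-Java sketch with hypothetical types (SketchQuery, CollectorWrappingQuery, GroupCollector, TopN), not the Lucene classes, where CollectorWrappingQuery plays the role RankQuery plays in the patch; it is also one possible shape if RankQuery became an interface:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins, not Lucene classes.
interface SketchQuery {}

interface GroupCollector {
  void collect(int doc, double score);
  List<Integer> topDocIds();
}

// A query that may decorate the per-group collector.
interface CollectorWrappingQuery extends SketchQuery {
  GroupCollector wrap(GroupCollector base);
}

record GroupHit(int doc, double score) {}

// Plain top-n collector, analogous to TopScoreDocCollector.create(maxDocsPerGroup).
class TopN implements GroupCollector {
  private final int n;
  private final List<GroupHit> hits = new ArrayList<>();

  TopN(int n) { this.n = n; }

  @Override public void collect(int doc, double score) { hits.add(new GroupHit(doc, score)); }

  @Override public List<Integer> topDocIds() {
    return hits.stream()
        .sorted(Comparator.comparingDouble(GroupHit::score).reversed())
        .limit(n)
        .map(GroupHit::doc)
        .collect(Collectors.toList());
  }
}

class SecondPassSketch {
  // Mirrors the patched loop body: build the default collector, then let the query wrap it.
  static GroupCollector newGroupCollector(SketchQuery query, int maxDocsPerGroup) {
    GroupCollector collector = new TopN(maxDocsPerGroup);
    // instanceof is false for null, so no separate null check is needed.
    if (query instanceof CollectorWrappingQuery rq) {
      collector = rq.wrap(collector);
    }
    return collector;
  }
}
```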

and now the documents within each group are reranked. Next I'll work on step 3, 
i.e., reordering the groups based on the new rerank scores (I added a new test 
for this, which currently fails). 
Happy to discuss this first change if you have comments.

Minor notes: 
  - At the moment {SolrRankQuery} doesn't extend {ExtendedQueryBase}; I have to 
check whether this is a problem. Maybe RankQuery could become an interface.
  - I made some changes to the signature of {RankQuery.getTopDocsCollector}: 
{QueryCommand} is a Solr class and was used only to get the {Sort}, and len was 
never used. Instead of creating a new {TopScoreDocCollector} inside {RankQuery}, 
the method now takes the previous collector as input. 


> Support RankQuery in grouping
> -----------------------------
>
>                 Key: SOLR-8776
>                 URL: https://issues.apache.org/jira/browse/SOLR-8776
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: master
>            Reporter: Diego Ceccarelli
>            Priority: Minor
>             Fix For: master
>
>         Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (which supports reranking), but i) collapse cannot 
> guarantee that a minimum number of groups will be returned for a query, and 
> ii) in the SolrCloud setting it constrains how the documents are partitioned 
> among the shards (all documents of a group must be on the same shard).
> I'm going to start working on supporting RankQuery in grouping. I'll start by 
> attaching a patch with a test that fails because grouping does not support 
> the rank query, and then I'll try to fix the problem, starting from the 
> non-distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results


