[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping

Diego Ceccarelli (JIRA) Fri, 11 Mar 2016 04:23:05 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189552#comment-15189552
 ]


Diego Ceccarelli edited comment on SOLR-8776 at 3/11/16 12:21 PM:
------------------------------------------------------------------

[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I 
uploaded a new patch with a first step. I agree that the merge strategy must stay 
where it is, that's why I wrote "partially moved" :)   Just as there are 
{{IndexSearcher}} and {{SolrIndexSearcher}}, I moved {{RankQuery}} into Lucene 
and created {{SolrRankQuery}}.  The reason is that {{RankQuery}} works by 
manipulating the collector, through this method:

{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
    IndexSearcher searcher) throws IOException;
{code}

At the moment, {{SolrIndexSearcher}} has a special case for when the query is a 
{{RankQuery}}:
{code:java}
  private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd)
      throws IOException {

    Query q = cmd.getQuery();
    if (q instanceof RankQuery) {
      RankQuery rq = (RankQuery) q;
      return rq.getTopDocsCollector(len, cmd, this);
    }
    ..
{code}

So instead of creating the top collector directly with {{TopScoreDocCollector.create}}, 
we wrap a top-score collector in a 'RankQuery collector'.
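
The wrapping idea can be sketched in plain Java, without Lucene or Solr on the classpath. This is only an illustration under stated assumptions: {{Hit}}, {{TopCollector}}, {{TopScoreCollector}} and {{RerankCollector}} are hypothetical stand-ins invented for the sketch, not the real Lucene or Solr types, and the rescoring function is a placeholder for whatever a {{RankQuery}} would compute.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical stand-ins only: none of these types are Lucene or Solr classes.
final class RerankWrapSketch {

  /** A scored hit. */
  static final class Hit {
    final int doc;
    final double score;
    Hit(int doc, double score) { this.doc = doc; this.score = score; }
  }

  /** Minimal collector contract assumed for this sketch. */
  interface TopCollector {
    void collect(int doc, double score);
    List<Hit> topDocs(); // best hits, highest score first
  }

  /** Keeps the n best hits by their original score. */
  static final class TopScoreCollector implements TopCollector {
    private final int n;
    private final List<Hit> hits = new ArrayList<>();
    TopScoreCollector(int n) { this.n = n; }
    @Override public void collect(int doc, double score) { hits.add(new Hit(doc, score)); }
    @Override public List<Hit> topDocs() {
      List<Hit> sorted = new ArrayList<>(hits);
      sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
      return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
  }

  /** Wraps another collector and rescores only the hits that collector retained. */
  static final class RerankCollector implements TopCollector {
    private final TopCollector inner;
    private final IntToDoubleFunction rescorer; // doc id -> new score (placeholder)
    RerankCollector(TopCollector inner, IntToDoubleFunction rescorer) {
      this.inner = inner;
      this.rescorer = rescorer;
    }
    @Override public void collect(int doc, double score) { inner.collect(doc, score); }
    @Override public List<Hit> topDocs() {
      List<Hit> reranked = new ArrayList<>();
      for (Hit h : inner.topDocs()) {
        reranked.add(new Hit(h.doc, rescorer.applyAsDouble(h.doc)));
      }
      reranked.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
      return reranked;
    }
  }
}
```

The point of the decorator shape is that the expensive rescoring only ever sees the hits the inner collector kept, never the whole collection.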

As a reminder, grouping works in two separate stages:
   * in the first stage, we iterate over the documents, scoring them, and keep a 
map {{<group -> score>}}, where score is the highest score of any document in the 
group (the map contains only the top-k groups with the highest scores);
   * in the second stage, the documents in each of those groups are ranked and the 
top-n documents of each group are returned.

This logic is mainly implemented in 
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene). 
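
For readers less familiar with grouping, the two stages can be sketched in plain Java as follows. This is a simplified in-memory illustration, not the Lucene collectors: {{Doc}}, {{topGroups}} and {{topDocsPerGroup}} are hypothetical names, and scores are given up front rather than computed by a query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration; the real logic lives in Lucene's grouping collectors.
final class TwoPassGroupingSketch {

  /** Hypothetical scored document with a group value. */
  static final class Doc {
    final int id;
    final String group;
    final double score;
    Doc(int id, String group, double score) { this.id = id; this.group = group; this.score = score; }
  }

  /** First pass: track the best score per group and keep the top-k groups. */
  static List<String> topGroups(List<Doc> docs, int k) {
    Map<String, Double> best = new HashMap<>();
    for (Doc d : docs) {
      best.merge(d.group, d.score, Math::max);
    }
    List<String> groups = new ArrayList<>(best.keySet());
    groups.sort(Comparator.comparingDouble((String g) -> best.get(g)).reversed());
    return new ArrayList<>(groups.subList(0, Math.min(k, groups.size())));
  }

  /** Second pass: for each selected group, rank its documents and keep the top-n. */
  static Map<String, List<Doc>> topDocsPerGroup(List<Doc> docs, List<String> groups, int n) {
    Map<String, List<Doc>> result = new LinkedHashMap<>();
    for (String g : groups) {
      result.put(g, new ArrayList<>());
    }
    for (Doc d : docs) {
      List<Doc> bucket = result.get(d.group);
      if (bucket != null) { // documents of non-selected groups are skipped
        bucket.add(d);
      }
    }
    for (Map.Entry<String, List<Doc>> e : result.entrySet()) {
      List<Doc> bucket = e.getValue();
      bucket.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
      e.setValue(new ArrayList<>(bucket.subList(0, Math.min(n, bucket.size()))));
    }
    return result;
  }
}
```

Note that the second pass iterates over the documents again, which is why the per-group collectors are created only after the top groups are known.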

We should probably discuss what reranking means for groups. In my opinion we 
should keep in mind that the idea behind {{RankQuery}} is that you don't want 
to apply the query to all the documents in the collection, so "group-reranking" 
should: 
   * in the first stage, iterate over the documents, scoring them as usual, and 
keep a map {{<group -> score>}};
   * for each group, apply the {{RankQuery}} to the top documents in the group;
   * rerank the groups according to the new scores.
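
A rough sketch of the last two steps of this proposal, again in plain Java with hypothetical names ({{Doc}}, {{rerank}}). It takes the top documents per group as already computed by the normal grouping stages, rescores only those documents, and then reorders the groups. One assumption to flag: the new score of a group is taken to be the highest new score among its documents, mirroring the first-stage definition; other choices are possible and this is open for discussion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Illustration of the proposed semantics only; not the code in the patch.
final class GroupRerankSketch {

  /** Hypothetical scored document. */
  static final class Doc {
    final int id;
    final double score;
    Doc(int id, double score) { this.id = id; this.score = score; }
  }

  /**
   * Rescores the retained documents of each group, then orders the groups by
   * their best new score (assumption: group score = max document score).
   */
  static LinkedHashMap<String, List<Doc>> rerank(
      Map<String, List<Doc>> topDocsPerGroup, ToDoubleFunction<Doc> rescorer) {
    Map<String, List<Doc>> rescored = new HashMap<>();
    Map<String, Double> groupScore = new HashMap<>();
    for (Map.Entry<String, List<Doc>> e : topDocsPerGroup.entrySet()) {
      // Apply the (placeholder) rescorer only to the documents kept for this group.
      List<Doc> docs = new ArrayList<>();
      for (Doc d : e.getValue()) {
        docs.add(new Doc(d.id, rescorer.applyAsDouble(d)));
      }
      docs.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
      rescored.put(e.getKey(), docs);
      groupScore.put(e.getKey(),
          docs.isEmpty() ? Double.NEGATIVE_INFINITY : docs.get(0).score);
    }
    // Reorder the groups according to the new scores.
    List<String> order = new ArrayList<>(rescored.keySet());
    order.sort(Comparator.comparingDouble((String g) -> groupScore.get(g)).reversed());
    LinkedHashMap<String, List<Doc>> out = new LinkedHashMap<>();
    for (String g : order) {
      out.put(g, rescored.get(g));
    }
    return out;
  }
}
```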

In this patch I'm able to perform step 2 (the second bullet above). I had to move 
{{RankQuery}} into Lucene because {{AbstractSecondPassGroupingCollector}} creates 
a collector for each group: 

{code:java}
    for (SearchGroup<GROUP_VALUE_TYPE> group : groups) {
      //System.out.println("  prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
      TopDocsCollector<?> collector;
      if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
        // Sort by score
        collector = TopScoreDocCollector.create(maxDocsPerGroup);
    ...
{code}

... so there is no way to 'inject' the RankQuery collector from Solr. After moving 
{{RankQuery}} into Lucene, I changed that code to: 

{code:java}
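        collector = TopScoreDocCollector.create(maxDocsPerGroup);
        // instanceof is already false for null, so no separate null check is needed
        if (query instanceof RankQuery) {
          // wrap the per-group collector so that the RankQuery can rerank its hits
          collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
        }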
{code}

and now the documents in each group are reranked based on the RankQuery scores. I'll 
now work on step 3, i.e. reordering the groups based on the new RankQuery scores (I 
added a new test that fails at the moment). 
Happy to discuss this first change, if you have comments.

Minor notes: 
  - At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I 
have to check whether that is a problem. If it is, {{RankQuery}} could perhaps 
become an interface.
  - I made some changes to the signature of {{RankQuery.getTopDocsCollector}}: 
{{QueryCommand}} lives in Solr but was used only for getting the {{Sort}}, and 
{{len}} was never used. The method now takes the previous collector as input, 
instead of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}. 
  - Please keep in mind that, as a starting point, I'm trying to solve the issue 
only in the non-distributed setting and only when grouping on a field. 


> Support RankQuery in grouping
> -----------------------------
>
>                 Key: SOLR-8776
>                 URL: https://issues.apache.org/jira/browse/SOLR-8776
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: master
>            Reporter: Diego Ceccarelli
>            Priority: Minor
>             Fix For: master
>
>         Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together 
> (see also [3]). In some situations Grouping can be replaced by Collapse and 
> Expand Results [4] (that supports reranking), but i) collapse cannot 
> guarantee that at least a minimum number of groups will be returned for a 
> query, and ii) in the Solr Cloud setting you will have constraints on how to 
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start 
> by attaching a patch with a test that fails because grouping does not support 
> the rank query, and then I'll try to fix the problem, starting from the 
> non-distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery 
> should be refactored and moved (or partially moved) there. 
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API 
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] 
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results



