[
https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189552#comment-15189552
]
Diego Ceccarelli edited comment on SOLR-8776 at 3/11/16 12:21 PM:
------------------------------------------------------------------
[~joel.bernstein] thanks for pointing out the {{MergeStrategy}}. I uploaded a
new patch with a first step. I agree that the merge strategy must stay in
Solr, which is why I wrote "partially moved" :) Just as there is an
{{IndexSearcher}} and a {{SolrIndexSearcher}}, I moved {{RankQuery}} into
Lucene and created {{SolrRankQuery}}. The reason is that {{RankQuery}} works
by manipulating the collector, through this method:
{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd,
IndexSearcher searcher) throws IOException;
{code}
At the moment, {{SolrIndexSearcher}} has a special case for when the query is
a {{RankQuery}}:
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd)
    throws IOException {
  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}
So instead of creating a top collector directly with
{{TopScoreDocCollector.create}}, we wrap a top-score collector in a
'RankQuery collector'.
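To make the wrapping idea concrete, here is a minimal sketch in plain Java. It
deliberately does not use the Lucene API: {{ScoredDoc}}, {{TopCollector}},
{{SimpleTopCollector}} and {{RerankingCollector}} are hypothetical stand-ins
invented for illustration, and the rescoring function is an assumption. It
only shows the decorator shape: collection is delegated, and just the
delegate's top documents get rescored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for a scored hit; not a Lucene class.
final class ScoredDoc {
    static final Comparator<ScoredDoc> BY_SCORE_DESC =
            Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed();

    final int id;
    final double score;

    ScoredDoc(int id, double score) {
        this.id = id;
        this.score = score;
    }
}

// Hypothetical, much simplified collector contract.
interface TopCollector {
    void collect(int docId, double score);

    List<ScoredDoc> topDocs();
}

// Plays the role of a plain top-score collector: keeps the n best documents.
final class SimpleTopCollector implements TopCollector {
    private final int n;
    private final List<ScoredDoc> docs = new ArrayList<>();

    SimpleTopCollector(int n) {
        this.n = n;
    }

    @Override
    public void collect(int docId, double score) {
        docs.add(new ScoredDoc(docId, score));
    }

    @Override
    public List<ScoredDoc> topDocs() {
        List<ScoredDoc> sorted = new ArrayList<>(docs);
        sorted.sort(ScoredDoc.BY_SCORE_DESC);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}

// Plays the role of the 'RankQuery collector': wraps another collector and
// rescores only the documents that the wrapped collector kept.
final class RerankingCollector implements TopCollector {
    private final TopCollector delegate;
    private final ToDoubleFunction<ScoredDoc> rescorer;

    RerankingCollector(TopCollector delegate, ToDoubleFunction<ScoredDoc> rescorer) {
        this.delegate = delegate;
        this.rescorer = rescorer;
    }

    @Override
    public void collect(int docId, double score) {
        delegate.collect(docId, score); // first-pass scoring is unchanged
    }

    @Override
    public List<ScoredDoc> topDocs() {
        List<ScoredDoc> reranked = new ArrayList<>();
        for (ScoredDoc d : delegate.topDocs()) {
            reranked.add(new ScoredDoc(d.id, rescorer.applyAsDouble(d)));
        }
        reranked.sort(ScoredDoc.BY_SCORE_DESC);
        return reranked;
    }
}
```

For example, {{new RerankingCollector(new SimpleTopCollector(10), myRescorer)}}
applies the expensive {{myRescorer}} to at most 10 documents, however many
documents were collected.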
As a reminder, grouping works in two separate stages:
* in the first stage, we iterate over the documents, scoring them, and keep a
map {{<group -> score>}} where score is the highest score of a document in the
group (the map contains only the TOP-k groups with the highest scores);
* in the second stage, the documents in each of these groups are ranked and
the TOP-n documents of each group are returned.
This logic is mainly implemented in
{{Abstract(First|Second)PassGroupingCollector}} (within Lucene).
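The two stages can be sketched in plain Java as follows. This is only an
illustration over in-memory lists, not the Lucene collectors: the class
{{TwoPassGrouping}}, its nested {{Doc}} type and both method names are
hypothetical, and it assumes every document already carries its score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two-pass grouping; not Lucene code.
final class TwoPassGrouping {

    static final class Doc {
        final int id;
        final String group;
        final double score;

        Doc(int id, String group, double score) {
            this.id = id;
            this.group = group;
            this.score = score;
        }
    }

    // First pass: record the best score seen per group, keep the top-k groups.
    static List<String> topGroups(List<Doc> docs, int k) {
        Map<String, Double> best = new HashMap<>();
        for (Doc d : docs) {
            best.merge(d.group, d.score, Double::max);
        }
        List<String> groups = new ArrayList<>(best.keySet());
        groups.sort(Comparator.comparingDouble((String g) -> best.get(g)).reversed());
        return new ArrayList<>(groups.subList(0, Math.min(k, groups.size())));
    }

    // Second pass: for each selected group, keep its top-n documents by score.
    static Map<String, List<Doc>> topDocsPerGroup(List<Doc> docs, List<String> groups, int n) {
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        for (String g : groups) {
            result.put(g, new ArrayList<>());
        }
        for (Doc d : docs) {
            List<Doc> bucket = result.get(d.group);
            if (bucket != null) { // documents of non-selected groups are skipped
                bucket.add(d);
            }
        }
        for (Map.Entry<String, List<Doc>> e : result.entrySet()) {
            List<Doc> bucket = e.getValue();
            bucket.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
            e.setValue(new ArrayList<>(bucket.subList(0, Math.min(n, bucket.size()))));
        }
        return result;
    }
}
```

Calling {{topGroups(docs, k)}} and then {{topDocsPerGroup(docs, groups, n)}}
mirrors the first and second pass; the returned map keeps the groups in
first-pass order.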
We should probably discuss what reranking means for groups. In my opinion we
should keep in mind that the idea behind {{RankQuery}} is that you don't want
to apply the query to all the documents in the collection, so "group-reranking"
should:
* in the first stage, iterate over the documents, scoring them as usual, and
keep a map {{<group -> score>}};
* for each group, apply the {{RankQuery}} to the top documents in the group;
* rerank the groups according to the new scores.
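A rough sketch of the second and third steps, again in plain Java rather than
against the Lucene API. {{GroupReranking}}, its {{Doc}} type and the
{{rescorer}} function are hypothetical; the input map is assumed to already
hold the top documents of each selected group, as produced by the usual two
passes. Using the best rescored document as the new group score is also an
assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration of "group-reranking"; not Lucene or Solr code.
final class GroupReranking {

    static final class Doc {
        final int id;
        final double score;

        Doc(int id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    static Map<String, List<Doc>> rerank(Map<String, List<Doc>> topDocsPerGroup,
                                         ToDoubleFunction<Doc> rescorer) {
        // Rescore only the documents that already made it into a group's top
        // list, then sort each group by the new scores.
        Map<String, List<Doc>> rescored = new LinkedHashMap<>();
        for (Map.Entry<String, List<Doc>> e : topDocsPerGroup.entrySet()) {
            List<Doc> docs = new ArrayList<>();
            for (Doc d : e.getValue()) {
                docs.add(new Doc(d.id, rescorer.applyAsDouble(d)));
            }
            docs.sort(Comparator.comparingDouble((Doc d) -> d.score).reversed());
            rescored.put(e.getKey(), docs);
        }

        // Reorder the groups by the best new score found in each group.
        List<String> order = new ArrayList<>(rescored.keySet());
        order.sort(Comparator.comparingDouble(
                (String g) -> bestScore(rescored.get(g))).reversed());

        Map<String, List<Doc>> result = new LinkedHashMap<>();
        for (String g : order) {
            result.put(g, rescored.get(g));
        }
        return result;
    }

    // Each list is sorted by descending score, so the first entry is the best.
    private static double bestScore(List<Doc> docs) {
        return docs.isEmpty() ? Double.NEGATIVE_INFINITY : docs.get(0).score;
    }
}
```

The point of the design is that the rescorer never sees documents outside the
per-group top lists, so its cost stays bounded by (number of groups) x (docs
per group).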
In this patch, I'm able to perform the second step. I had to move
{{RankQuery}} into Lucene because {{AbstractSecondPassGroupingCollector}}
creates a collector for each group:
{code:java}
for (SearchGroup<GROUP_VALUE_TYPE> group : groups) {
  //System.out.println(" prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector<?> collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
    ...
{code}
... so there is no way to 'inject' the RankQuery collector from Solr. Having
moved {{RankQuery}} into Lucene, I changed that code to:
{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}
and now the documents in each group are reranked based on the RankQuery
scores. I'll now work on the third step, i.e. reordering the groups based on
the new RankQuery scores (I added a new test that fails at the moment).
Happy to discuss this first change if you have comments.
Minor notes:
- At the moment {{SolrRankQuery}} doesn't extend {{ExtendedQueryBase}}; I
still have to check whether that is a problem. Alternatively, {{RankQuery}}
could become an interface.
- I made some changes to the signature of {{RankQuery.getTopDocsCollector}}:
{{QueryCommand}} lives in Solr but was used only to get the {{Sort}}, and
{{len}} was never used. The method now takes the previous collector as input
instead of creating a new {{TopScoreDocCollector}} inside {{RankQuery}}.
- Please keep in mind that, as a starting point, I'm only trying to solve the
issue in the non-distributed setting and when grouping on a field.
> Support RankQuery in grouping
> -----------------------------
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: master
> Reporter: Diego Ceccarelli
> Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch,
> 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together
> (see also [3]). In some situations Grouping can be replaced by Collapse and
> Expand Results [4] (that supports reranking), but i) collapse cannot
> guarantee that at least a minimum number of groups will be returned for a
> query, and ii) in the Solr Cloud setting you will have constraints on how to
> partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start
> attaching a patch with a test that fails because grouping does not support
> the rank query and then I'll try to fix the problem, starting from the non
> distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery
> should be refactored and moved (or partially moved) there.
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E
> [4] https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results