RE: Re: Custom scores and sort

Claude Lepère Fri, 25 Mar 2022 14:43:10 -0700

Hi Adrien. Thank you for your reply.


Here is a detailed example to clarify what I am trying to do.
The “only once score” field is named “onlyOnce” and its boost is 5.
There are 2 documents:

  1.  doc1 contains the onlyOnce field twice, with the values “2” and “3”,
plus some other fields
  2.  doc2 contains the onlyOnce field once, with the value “2”, plus some
other fields
The query, made of SHOULD clauses, is: custom(onlyOnce:2) custom(onlyOnce:3)

The “onlyOnce” field must be counted only once per document. To this end, I
pass my CustomScoreQuery subclass a map from doc ID to field name as an
argument (the doc ID is my own ID, not the Lucene doc ID):

  1.  doc1:
     *   when the custom score of onlyOnce:2 is calculated, the pair doc1 ID
to “onlyOnce” is added to the map and the returned subscore = 1
     *   when the custom score of onlyOnce:3 is calculated, the map already
contains the key doc1 ID with the value “onlyOnce”, so the returned subscore = 0
  2.  doc2:
     *   when the custom score of onlyOnce:2 is calculated, the pair doc2 ID
to “onlyOnce” is added to the map and the returned subscore = 1

Therefore doc1 and doc2 get the same final subscore for the “onlyOnce” field: 
subscore x boost = 1 x 5 = 5.
Both the TopFieldDocs search and the TopDocs search return the same, correct
final score, so the final score itself is fine.

However, there are fields other than “onlyOnce”, and I use a TopFieldDocs
search to sort by score first, then by a date field, and then by a third field.
The Lucene explanation shows that the TopFieldDocs search does not sort on the
correct final score: for both doc1 and doc2, it uses a score (fields[0]) in
which the contribution of the “onlyOnce” field is 0 instead of 5. I suspect the
reason is that, in order to sort, Lucene passes through the CustomScoreQuery
subclass again when the map already contains the doc1 and doc2 pairs.
As a result, a hit with a lower total final score can sometimes be ranked
before a hit with a higher score.
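
The suspected mechanism can be reproduced without Lucene. The following is a
minimal sketch in plain Java, not the actual CustomScoreQuery subclass: the
class OnlyOnceSketch, its method names and the string doc IDs are hypothetical,
and only the field name, the boost of 5 and the doc ID to field name map come
from the description above. It shows that once a first scoring pass has filled
the map, a second pass over the same document yields 0 for “onlyOnce”.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OnlyOnceSketch {
    static final float BOOST = 5f;

    // Hypothetical stand-in for the map given to the CustomScoreQuery subclass:
    // application doc ID -> field name already counted for that document.
    private final Map<String, String> counted = new HashMap<>();

    // Stateful subscore: 1 the first time the field is seen for a document,
    // 0 on every later call for the same document.
    float subscore(String docId, String field) {
        if (field.equals(counted.get(docId))) {
            return 0f;
        }
        counted.put(docId, field);
        return 1f;
    }

    // One scoring pass: one subscore x boost per matching clause on the field.
    float scorePass(String docId, List<String> matchedValues) {
        float total = 0f;
        for (int i = 0; i < matchedValues.size(); i++) {
            total += subscore(docId, "onlyOnce") * BOOST;
        }
        return total;
    }

    public static void main(String[] args) {
        OnlyOnceSketch sketch = new OnlyOnceSketch();
        // doc1 matches both clauses, onlyOnce:2 and onlyOnce:3.
        float first = sketch.scorePass("doc1", List.of("2", "3"));
        float second = sketch.scorePass("doc1", List.of("2", "3"));
        System.out.println("first pass  = " + first);
        System.out.println("second pass = " + second);
    }
}
```

If the sort path scores a document again after the hit score has been computed,
this is the value it would see, which would match the 0 contribution reported
in fields[0].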

The test with a TopDocs search returns the correct final score of 5, and its
default sort, by relevance only, is correct.

Why is fields[0], which is used to sort the TopFieldDocs hits, not the final
score?

I agree with you: I must conclude that my CustomScoreQuery subclass breaks some
Lucene assumptions.
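
If the assumption in question is that scoring the same document again returns
the same value, one possible direction is to make the “only once” contribution
a pure function of the document's matches. The sketch below is plain Java with
hypothetical names (StatelessOnlyOnce, onlyOnceScore), not a Lucene API; how
the matched values would be collected inside a single Lucene query is left
open. It only illustrates arithmetic that is safe to repeat.

```java
import java.util.List;

final class StatelessOnlyOnce {
    static final float BOOST = 5f;

    // Pure function: the boost is contributed once if at least one clause on
    // the field matched, however many clauses matched and however often the
    // method is called.
    static float onlyOnceScore(List<String> matchedValues) {
        return matchedValues.isEmpty() ? 0f : BOOST;
    }

    public static void main(String[] args) {
        List<String> doc1Matches = List.of("2", "3"); // doc1 matches both clauses
        List<String> doc2Matches = List.of("2");      // doc2 matches one clause
        System.out.println(onlyOnceScore(doc1Matches));
        System.out.println(onlyOnceScore(doc1Matches)); // same value when repeated
        System.out.println(onlyOnceScore(doc2Matches));
    }
}
```

Because no state is kept between calls, the value no longer depends on whether
the document has been scored before.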

As for your last question about LongDistanceFeatureQuery: I am not familiar
with it, since it is not available in Lucene 5, the version I use.


Claude Lepère

From: Adrien Grand <[email protected]>
Sent: Wednesday, March 23, 2022 17:58
To: Lucene Users Mailing List <[email protected]>
Subject: Re: Re: Custom scores and sort

Sorry Claude, but I have some trouble following what you are doing
with your CustomScoreQuery. It feels like your query is doing
something that breaks some assumptions that Lucene makes.

Have you looked at existing ways that Lucene supports boosting
documents by recency, such as putting a LongDistanceFeatureQuery as a
SHOULD clause in a BooleanQuery?
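
For reference, the shape of such a recency score can be sketched in plain Java.
This is not Lucene code: RecencySketch and recencyScore are hypothetical names,
and the formula is the one documented, as far as I know, for the distance
feature query created by LongPoint.newDistanceFeatureQuery in recent Lucene
versions: weight * pivotDistance / (pivotDistance + distance). The timestamps
and the one-week pivot are arbitrary assumptions.

```java
final class RecencySketch {
    // Score is `weight` when value == origin and weight / 2 when the value is
    // exactly pivotDistance away from origin; it keeps decreasing with distance.
    static double recencyScore(double weight, long origin, long pivotDistance, long value) {
        long distance = Math.abs(value - origin);
        return weight * pivotDistance / (double) (pivotDistance + distance);
    }

    public static void main(String[] args) {
        long dayMillis = 24L * 60 * 60 * 1000;
        long now = 1_648_000_000_000L; // arbitrary origin timestamp (assumption)
        long pivot = 7 * dayMillis;    // score halves at one week of age (assumption)
        System.out.println(recencyScore(3.0, now, pivot, now));
        System.out.println(recencyScore(3.0, now, pivot, now - pivot));
    }
}
```

Since the value depends only on the document's own date and fixed query
parameters, it is the same each time the document is scored.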

On Mon, Mar 14, 2022 at 7:00 PM Claude Lepere <[email protected]> wrote:
>
> Adrien, thank you for your answer and sorry for the lack of clarity.
>
> No, the score of a document does not depend on the score of another
> document, the problem lies within a document.
>
> There are several "only once score" fields; to simplify, I will assume there
> is only one;
> a document can contain this "only once score" field several times, with
> different values;
> a query can contain several clauses on the different values of this field
> and these clauses can be SHOULD or MUST.
> But for such a document, the score of this field should only be counted on
> the first pass through my CustomScoreQuery subclass; on subsequent passes,
> the custom score = 0;
> to do so, the constructor of the subclass takes as an argument the map "my
> document id (not Lucene doc!) to the field".
>
> Then, the score of the first pass is multiplied by a date factor which
> depends on the age of the document (age = maximum date of the query results
> - date of the document):
> the score of a document decreases with its age.
>
> The total score (field + date) is correctly calculated, but the explanation
> log shows that the sort score (the first element of fields[]) is not the
> total score but the total score minus the "only once score" or to put it
> another way, a total score where the "only once score" = 0, and that's why
> a hit with a lower total score happens to be ranked before a hit with a
> higher total score.
>
> The log of my CustomScoreQuery subclass shows that even if the document
> contains only one "only once score" field,
> Lucene calls the CustomScoreProvider's customScore method twice, so the
> second score = 0, and it seems to me that this value is retained as the sort
> score.
>
> I could not find out why a TopFieldDocs search (with Sort =
> SortField.FIELD_SCORE and date) uses the "diminished" score and not the
> total score, as TopDocs does.
>
>
> Thanks in advance.
>
>
> Claude Lepère
>
> On 2022/03/14 12:59:45 Adrien Grand wrote:
> > It's a bit hard for me to parse what you are trying to do, but it
> > looks like you are making assumptions about how Lucene works
> > internally that are not correct.
> >
> > Do I understand correctly that your scoring mechanism has dependencies
> > on other documents, i.e. the score of a document could depend on the
> > score of other documents? This is something that Lucene doesn't
> > support.
> >
> > On Thu, Mar 10, 2022 at 12:23 PM Claude Lepere <[email protected]> wrote:
> > >
> > > Hi.
> > > The problem is that, although sorting by score, a match with a lower
> > > score is ranked before a match with a greater score.
> > > The origin of the problem lies in a subclass of CustomScoreQuery which
> > > calculates an "only once" score for each document: on the first pass the
> > > document gets its score and, if the document contains the same field
> > > several times, on the subsequent passes it gets 0.
> > > I wonder if it is possible for Lucene to give a score that depends on a
> > > previous pass in the CustomScoreProvider customScore routine for the
> > > same document.
> > > I ran 2 searches with IndexSearcher: the first one returns a TopDocs,
> > > which is sorted by default by relevance, and the second search - with the
> > > Sort array = [SortField.FIELD_SCORE, a date SortField] argument - returns
> > > a TopFieldDocs.
> > > The TopDocs results are sorted by the score with the first pass value of
> > > the only once method while the TopFieldDocs results are sorted by the
> > > score with the value (= 0) of the next pass, hence the ranking errors.
> > > I could not find out why the TopFieldDocs search does not sort by the
> > > score of the hit, as the TopDocs search does.
> > > I could not find how to tell the TopFieldDocs search to use the hit
> > > score to sort.
> > >
> > > Claude Lepère
> >
> >
> >
> > --
> > Adrien
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >



--
Adrien

