Hi All,
Requesting some assistance with this problem!

On Wed, Nov 20, 2024 at 2:48 PM Saksham Gupta <saksham.gu...@indiamart.com>
wrote:

> Hi All,
>
> *Our Understanding of Duplicate Handling by Solr*
>
> As per an older discussion on the Solr community list [ref mail: *Ranking of
> duplicate documents on solr*], Solr handles duplicate documents
> [documents present in multiple shards] by preferring the copy with the
> oldest indexed date; if the indexed dates are the same, it compares
> _version_ and displays the copy with the higher _version_.
>
> We checked this hypothesis against a few cases, both where the indexed
> dates differed and where they were the same, and it held in all of
> them.
>
>
> *Issue Details*
> Recently, I found a document that does not follow the above hypothesis.
> The indexed date for the document [present on 2 shards] is the same on
> both shards, yet the copy with the lower _version_ is being ranked
> [contrary to the above hypothesis]. To check whether the visible
> _version_ is correct, I filtered for each copy by its
> _version_:
> 1. [query: *fq=id:{document-copy1-id} AND _version_:{document-copy1-version}*],
> 2. [query: *fq=id:{document-copy2-id} AND _version_:{document-copy2-version}*],
> and found that one of the copies is not returned once the fq on
> _version_ is added.
>
>
> *How does Solr set the _version_ field? Is it possible that the displayed
> _version_ is incorrect? Does Solr maintain a different version internally
> that can differ from the visible one? Is this why the above
> hypothesis is failing?*
>
> I would appreciate any help with Solr's duplicate handling and the
> questions above!
>
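
For anyone who wants to reproduce the per-copy check from the quoted message, one option might be to query each shard's core directly with distrib=false, so the request is not fanned out to the other shards and each copy's _version_ can be read on its own. Below is a minimal Python sketch that only builds the request URLs. The host, core names, document id and version value are hypothetical placeholders, and it assumes the unique key field is named id.

```python
from urllib.parse import urlencode


def build_copy_check_url(core_url, doc_id, version=None):
    """Build a /select URL that asks one core only for a given document.

    core_url: base URL of a single shard's core (hypothetical in the demo below).
    doc_id:   value of the unique key field, assumed here to be named "id".
    version:  optional _version_ value to add as a second filter query.
    """
    params = [
        ("q", "*:*"),
        ("fq", f'id:"{doc_id}"'),
        ("fl", "id,_version_"),
        ("distrib", "false"),  # query this core only, no fan-out to other shards
        ("wt", "json"),
    ]
    if version is not None:
        # A separate fq keeps the version filter independent of the id filter.
        params.append(("fq", f"_version_:{version}"))
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core URLs, one per shard that holds a copy of the document.
    cores = (
        "http://host1:8983/solr/coll_shard1_replica_n1",
        "http://host2:8983/solr/coll_shard2_replica_n1",
    )
    for core in cores:
        print(build_copy_check_url(core, "doc-123"))
```

Running the URL without the version filter first, and then again with the _version_ value that each core returned, would show whether the value displayed for a copy is also the value that copy matches on.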
