Hi All,

Requesting some assistance with this problem!

On Wed, Nov 20, 2024 at 2:48 PM Saksham Gupta <saksham.gu...@indiamart.com> wrote:
> Hi All,
>
> *Understanding of Duplicate Handling by Solr*
>
> As per an older discussion on the Solr community list [ref mail: *Ranking of
> duplicate documents on solr*], Solr handles duplicate documents
> [documents present in multiple shards] by preferring the document which is
> oldest according to indexed date, and if the indexed date is the same, it
> compares _version_ and displays the document with the higher _version_.
>
> We verified the aforementioned hypothesis for a few cases where the
> indexed date was different and where it was the same, and the hypothesis
> turned out accurate for all of the cases.
>
> *Issue Details*
>
> Recently, I've found a document which does not follow the above
> hypothesis: the indexed date for the document [present on 2 shards] is the
> same on both shards, yet the copy with the lower _version_ is
> being ranked [contrary to the above hypothesis]. To check whether the
> visible _version_ is correct, I filtered each copy on its
> _version_:
> 1. [query: *fq=id:{document-copy1-id} AND _version_:{document-copy1-version}*],
> 2. [query: *fq=id:{document-copy2-id} AND _version_:{document-copy2-version}*],
> and found that one of the documents is not returned once the fq on
> _version_ is added.
>
> *How does Solr set the _version_ field? Is there a possibility that the
> version displayed is incorrect? Does Solr maintain a different version
> internally which can differ from the one visible? Is this the reason why
> the above hypothesis is failing?*
>
> Would appreciate any help regarding Solr duplicate handling and my
> aforementioned doubts!
>
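
For anyone trying to reproduce this, one way to see what each shard actually stores is to query each core directly with `distrib=false`, so the request is answered by that core alone and no cross-shard merging is involved. The snippet below is only a minimal sketch that builds such request URLs: the host names, the core names and the `indexed_date` field name are assumptions for illustration, not values from our setup.

```python
from urllib.parse import urlencode


def shard_query_url(core_url: str, doc_id: str,
                    indexed_field: str = "indexed_date") -> str:
    """Build a /select URL that asks one core for its own copy of a document.

    `indexed_field` is a hypothetical field name; replace it with whatever
    field stores the indexing timestamp in your schema.
    """
    params = {
        "q": f'id:"{doc_id}"',
        # Return only the fields needed to compare the two copies.
        "fl": f"id,_version_,{indexed_field}",
        # distrib=false: answer from this core only, no fan-out to other shards.
        "distrib": "false",
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical core URLs, one per shard that holds a copy of the document.
cores = [
    "http://host1:8983/solr/mycollection_shard1_replica_n1",
    "http://host2:8983/solr/mycollection_shard2_replica_n1",
]

for core in cores:
    print(shard_query_url(core, "12345"))
```

Fetching the two printed URLs (browser or curl) and comparing the `_version_` and indexed date each core returns should show whether the value seen in the merged, distributed response matches what each shard holds.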