Hi All,

*Understanding of Duplicate Handling by Solr*

As per an older discussion on the Solr community list [ref mail: *Ranking of duplicate documents on solr*], Solr handles duplicate documents [documents present in multiple shards] by preferring the document that is oldest according to indexed date; if the indexed date is the same, it compares _version_ and displays the document with the higher _version_. We verified this hypothesis for a few cases, both where the indexed date differed and where it was the same, and it held in all of them.

*Issue Details*

Recently, I found a document that does not follow the above hypothesis. The document is present on 2 shards and its indexed date is the same on both, yet the copy with the lower _version_ is the one being displayed [contrary to the above hypothesis].

To check whether the visible _version_ is correct, I filtered each copy on its own _version_:

1. [query: *fq=id:{document-copy1-id} AND _version_:{document-copy1-version}*]
2. [query: *fq=id:{document-copy2-id} AND _version_:{document-copy2-version}*]

I found that one of the two copies is no longer returned once the fq on _version_ is added.

*How does Solr set the _version_ field? Is it possible that the displayed _version_ is incorrect? Does Solr maintain a different version internally, which can differ from the visible one? Is this the reason the above hypothesis is failing?*

Would appreciate any help regarding Solr's duplicate handling and the doubts above!
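
For anyone who wants to reproduce the check, here is a minimal sketch of how each copy could be looked up on its own core, so the _version_ stored on each shard can be compared without the distributed merge getting in the way. It only builds the request URLs. The core URLs and the document id are hypothetical placeholders (not our real setup), and the helper name `shard_lookup_url` is made up for illustration; `distrib=false` is the standard Solr parameter that keeps a request on the core that receives it.

```python
from urllib.parse import urlencode


def shard_lookup_url(core_url: str, doc_id: str) -> str:
    """Build a /select URL asking a single core for its own copy of doc_id."""
    params = {
        # Quote the id so characters such as '-' or ':' are not parsed as
        # query syntax (assumes the id itself contains no double quotes).
        "q": f'id:"{doc_id}"',
        # Return only the fields needed to compare the two copies.
        "fl": "id,_version_",
        # Do not fan the request out to other shards.
        "distrib": "false",
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical core URLs, one per shard that holds a copy of the document.
SHARD_CORES = [
    "http://localhost:8983/solr/collection1_shard1_replica_n1",
    "http://localhost:8983/solr/collection1_shard2_replica_n1",
]

for core in SHARD_CORES:
    print(shard_lookup_url(core, "doc-123"))
```

Fetching each printed URL and comparing the returned _version_ values against the one shown in the normal distributed response should show which shard's copy is actually being displayed.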