unsubscribe

On Mon, Mar 22, 2021, 5:33 AM Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
> Hello,
>
> I have a SolrCloud with 5 shards and 2 replicas.
> I tried everything back and forth with LocalStatsCache, ExactStatsCache
> and ExactSharedStatsCache.
> I saw only a minor difference between LocalStatsCache and the Exact...
> variants.
> But when showing 10 search results per page, as soon as I switched to the
> second page (hits 11 to 20) and forced a page reload a couple of times,
> the results changed within the page. A result showing up as hit number 14
> was listed as hit number 16 the next time, and so on. Nothing reliable.
> Only the first page looked good.
> While inspecting the scores I saw minor changes between reloads,
> even with ExactStatsCache and ExactSharedStatsCache.
> Some more checks on the replicas showed that they are never totally
> in sync.
> That means the number of docs and segment count are in sync but nothing
> else.
>
> coll1_shard1_replica1:
> Num Docs: 53576786
> Max Doc: 57506559
> Deleted Docs: 3929773
> Version: 135351
> Master (Searching) 1616078264682 22756
> Master (Replicable) 1616402397518 22844
>
> coll1_shard1_replica2:
> Num Docs: 53576786
> Max Doc: 57494890
> Deleted Docs: 3918104
> Version: 135326
> Master (Searching) 1616078264683 22755
> Master (Replicable) 1616402397521 22843
>
> Only Num Docs is the same (that is why we always get the same number of
> hits and also the same hits) but everything else is different.
> I think this is why we never get the same order of results when using
> ExactStatsCache or ExactSharedStatsCache. We are using CloudSolrj for
> loading.
>
> I once ran a test and forced an optimize of the index:
> first a commit with expungeDeletes=true and then an optimize to
> maxSegments=1.
> After that everything worked fine and the results stayed in order.
> But some weeks later the segment numbers drifted apart and the problem was
> there again.
>
> I think that will never work correctly.
> It might work only if the replicas are totally in sync with each other.
> Just my findings without debugging into the code.
>
> Regards
> Bernd
>
>
> On 19.03.21 at 16:15, Cameron M VandenBerg wrote:
> > Hello,
> >
> > I am using Solr in a distributed environment where I have split my
> > collection into parts, which I have running on different nodes. When I
> > create each part of the collection, I set numShards and
> > replicationFactor to 1. The query speed is most important to us, and we
> > are not worried about load on the system.
> >
> > I want a distributed IDF across all parts of the collection, so I have
> > added this line to my solrconfig.xml:
> > <statsCache class="org.apache.solr.search.stats.ExactStatsCache" />
> >
> > This seems to work about 90% of the time, but if I run the same request
> > over and over again, sometimes I get scores with a local IDF for just
> > one part of the collection. Here is a request example:
> > /solr/collection1,collection2/query?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc
> >
> > I still get documents from both collection1 and collection2, but
> > sometimes I get scores that are the same as when I query only
> > collection1. I believe that in that case it is using only the document
> > frequency from collection1 for the term.
> >
> > Should I use a different configuration? I would like to make sure the
> > IDF is always distributed and the same every time I run the same query.
> > Is there any technique I could use to ensure that this happens?
> >
> > Thank you,
> > Cameron VandenBerg
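
For anyone reading this thread later, here is a minimal sketch of why two replicas with the same Num Docs but different Deleted Docs can produce slightly different scores. It assumes the BM25 form of IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), and relies on the fact that Lucene index statistics still count deleted documents until their segments are merged. The Max Doc and Num Docs values are the ones quoted above; using Max Doc as a stand-in for the document count N, and every document-frequency number, are assumptions made up purely for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Values quoted in the thread for the two replicas of coll1_shard1.
REPLICA1_MAX_DOC = 57506559
REPLICA2_MAX_DOC = 57494890
NUM_DOCS = 53576786

# Hypothetical numbers (not from the thread): how many live documents
# contain the query term, and how many not-yet-merged deleted documents
# containing it each replica still counts in its statistics.
LIVE_DOC_FREQ = 120_000
replica1_doc_freq = LIVE_DOC_FREQ + 900
replica2_doc_freq = LIVE_DOC_FREQ + 850

# Assumption: Max Doc stands in for the document count N on each replica.
idf_replica1 = bm25_idf(REPLICA1_MAX_DOC, replica1_doc_freq)
idf_replica2 = bm25_idf(REPLICA2_MAX_DOC, replica2_doc_freq)

# After expunging deletes and merging to one segment, both replicas
# would see identical statistics and therefore the same IDF.
idf_merged = bm25_idf(NUM_DOCS, LIVE_DOC_FREQ)

print(f"replica1 idf:         {idf_replica1:.6f}")
print(f"replica2 idf:         {idf_replica2:.6f}")
print(f"difference:           {abs(idf_replica1 - idf_replica2):.6f}")
print(f"idf after full merge: {idf_merged:.6f}")
```

The difference is tiny, but it can be enough to swap neighbouring hits whose scores are nearly equal, depending on which replica answers a given request. That would be consistent with the reordering on page two and with the ordering becoming stable after the optimize described above.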