unsubscribe

On Mon, Mar 22, 2021, 5:33 AM Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
wrote:

> Hello,
>
> I have a SolrCloud with 5 shards and 2 replicas.
> I tried everything back and forth with LocalStatsCache, ExactStatsCache
> and ExactSharedStatsCache.
> I saw some minor advantage between LocalStatsCache and the Exact... pieces.
> But with 10 search results per page, as soon as I switched to the second
> page (hits 11 to 20) and forced a page reload a couple of times, the
> results changed within the page. A result showing up as hit number 14 was
> listed as hit number 16 the next time, and so on. Nothing reliable.
> Only the first page looked good.
> While inspecting the scores I saw minor changes between reloads, even
> with ExactStatsCache and ExactSharedStatsCache.
> Some more checks on the replicas showed that they are never totally in
> sync. The number of docs and the segment count are in sync, but nothing
> else is.
>
> coll1_shard1_replica1:
> Num Docs: 53576786
> Max Doc:  57506559
> Deleted Docs: 3929773
> Version:  135351
> Master (Searching)  1616078264682  22756
> Master (Replicable) 1616402397518  22844
>
> coll1_shard1_replica2:
> Num Docs: 53576786
> Max Doc:  57494890
> Deleted Docs: 3918104
> Version:  135326
> Master (Searching)  1616078264683  22755
> Master (Replicable) 1616402397521  22843
>
> Only Num Docs is the same (that is why we always get the same number of
> hits and also the same hits), but everything else is different.
> I think this is why we never get the same order of results when using
> ExactStatsCache or ExactSharedStatsCache. We are using CloudSolrj for
> loading.
>
> I once ran a test and forced an optimize of the index:
> first a commit with expungeDeletes=true and then an optimize to
> maxSegments=1.
> After that everything worked fine and the results stayed in order.
> But some weeks later the segment numbers drifted apart and the problem
> was there again.
>
> I think that will never work correctly.
> It might work only if the replicas are totally in sync with each other.
> These are just my findings, without debugging into the code.
>
> Regards
> Bernd
>
>
> On 19.03.21 at 16:15, Cameron M VandenBerg wrote:
> > Hello,
> >
> > I am using Solr in a distributed environment where I have split my
> > collection into parts, which I have running on different nodes.  When I
> > create each part of the collection, I set numShards and replicationFactor
> > to 1.  The query speed is most important to us, and we are not worried
> > about load on the system.
> >
> > I want a distributed IDF across all parts of the collection, so I have
> > added this line to my solrconfig.xml:
> > <statsCache class="org.apache.solr.search.stats.ExactStatsCache" />
> >
> > This seems to work about 90% of the time, but if I run the same request
> > over and over again, sometimes I get scores with a local IDF for just one
> > part of the collection.  Here is a request example:
> >
> > /solr/collection1,collection2/query?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc
> >
> > I still get documents from both collection1 and collection2, but
> > sometimes I get scores that are the same as when I query only
> > collection1.  I believe that in that case it is using only the document
> > frequency of collection1 for the term.
> >
> > Should I use a different configuration?  I would like to make sure the
> > IDF is always distributed and the same every time I run the same query.  Is
> > there any technique I could use to ensure that this happens?
> >
> > Thank you,
> > Cameron VandenBerg
> >
> >
>
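
The replica statistics quoted above suggest one plausible reason for the shifting scores: Lucene's index statistics still count deleted documents until the segments holding them are merged away, so two replicas with different Max Doc values can report slightly different document counts for the same shard. The sketch below plugs the two quoted Max Doc values into the BM25-style idf formula. It is an illustration under stated assumptions, not a trace of what Solr does internally: the document frequency of 100,000 is hypothetical, and Max Doc is used as a stand-in for the per-field document count.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style idf: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Max Doc values quoted in the thread for the two replicas of coll1_shard1.
MAX_DOC_REPLICA1 = 57_506_559
MAX_DOC_REPLICA2 = 57_494_890

# Hypothetical document frequency for the query term (not from the thread).
DOC_FREQ = 100_000

# Assumption: Max Doc stands in for the document count used in scoring.
idf_replica1 = bm25_idf(DOC_FREQ, MAX_DOC_REPLICA1)
idf_replica2 = bm25_idf(DOC_FREQ, MAX_DOC_REPLICA2)
idf_gap = idf_replica1 - idf_replica2

print(f"idf via replica1: {idf_replica1:.6f}")
print(f"idf via replica2: {idf_replica2:.6f}")
print(f"difference:       {idf_gap:.6f}")
```

The gap is tiny, but it is not zero, and a tiny gap is enough to reorder hits whose scores are nearly tied, depending on which replica happens to answer for the shard.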

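The two-step cleanup described in the thread (a commit with expungeDeletes, then an optimize down to one segment) can be expressed as two requests to the collection's update handler. The sketch below only builds the request URLs and does not send them. The base URL and the helper name `update_url` are hypothetical; the collection name `coll1` is inferred from the replica names quoted above.

```python
from urllib.parse import urlencode


def update_url(base_url: str, collection: str, **params) -> str:
    """Build an update-handler URL with the given query parameters."""
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Hypothetical Solr base URL; adjust to the actual cluster.
BASE_URL = "http://localhost:8983/solr"

# Step 1: commit and ask Solr to merge away segments with deleted docs.
expunge_url = update_url(BASE_URL, "coll1", commit="true", expungeDeletes="true")

# Step 2: force-merge the index down to a single segment.
optimize_url = update_url(BASE_URL, "coll1", optimize="true", maxSegments=1)

print(expunge_url)
print(optimize_url)
```

As the thread notes, this only helps until new updates and deletes arrive, so it is a periodic workaround rather than a fix, and a forced merge on an index of this size is expensive.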