Re: Relevancy debugging - idf score

2021-12-07 Thread Alessandro Benedetti
Hi Markus, won't the problem still be present across shards without distributed IDF? You may have skewed shards, and then each of them will have a different IDF for the same term (and field). Regarding the performance penalty Walter highlighted, I definitely see some room for contribution, bu
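To illustrate the skew point, here is a minimal sketch of how the same term can receive a different IDF on each shard when every shard scores with only its local statistics. It assumes the BM25 idf formula used by Lucene's BM25Similarity; the shard names and the (df, doc count) figures are hypothetical and not taken from the thread.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style idf: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical local statistics for ONE term: shard -> (df, doc count).
# The shards hold the same number of documents, but the term is rare on
# shard1 and common on shard2 (a skewed distribution).
shard_stats = {
    "shard1": (10, 1000),
    "shard2": (400, 1000),
}

# Each shard computes idf from its own numbers only (no distributed IDF).
per_shard_idf = {
    name: bm25_idf(df, n) for name, (df, n) in shard_stats.items()
}

for name, idf in per_shard_idf.items():
    print(f"{name}: idf={idf:.3f}")
```

Two documents matching the same term but living on different shards would therefore show different idf values in the debug output, even within a single query.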

Re: Relevancy debugging - idf score

2021-12-06 Thread Walter Underwood
When we tried exact IDF, it was about 10X slower in our sharded system, so we couldn’t use it. It is possible to calculate IDF when merging results from shards, with no speed penalty. Infoseek was doing that 25 years ago and the patent has expired. You return df from each shard, then calculate
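The message above is cut off, so the exact procedure is not known. One plausible reading, sketched here under that assumption, is that each shard returns its local df and document count along with its results, and the coordinator sums them before computing a single IDF. The function names, the tuple layout, and the BM25 idf formula are illustrative choices, not Solr's or Infoseek's implementation.

```python
import math
from typing import Iterable, Tuple


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style idf: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def merged_idf(shard_stats: Iterable[Tuple[int, int]]) -> float:
    """Compute one collection-wide idf from per-shard (df, doc count) pairs.

    Hypothetical merge step: the coordinator adds up the document
    frequencies and document counts reported by the shards.
    """
    total_df = 0
    total_docs = 0
    for df, doc_count in shard_stats:
        total_df += df
        total_docs += doc_count
    return bm25_idf(total_df, total_docs)


# Hypothetical per-shard statistics for one term: (df, doc count).
stats = [(10, 1000), (400, 1000)]
global_idf = merged_idf(stats)
print(f"global idf={global_idf:.3f}")
```

Because only two integers per term travel with each shard response, a merge of this kind adds no extra round trip; scores would still need to be recomputed or rescaled with the merged value, which this sketch does not cover.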

Re: Relevancy debugging - idf score

2021-12-06 Thread Markus Jelsma
Hello Sjoerd, ExactStatsCache indeed works fine when replicas of the same shard do not share identical term stats, but it comes with some overhead. If you can, upgrade to at least 7.x and change the default NRT replica types to TLOG. You then no longer need to use ExactStatsCache because replicas

Re: Relevancy debugging - idf score

2021-12-06 Thread Alessandro Benedetti
Good to know you solved it! Yes, distributed IDF is definitely a problem if you have skewed document distributions. Cheers -- Alessandro Benedetti Apache Lucene/Solr Committer Director, R&D Software Engineer, Search Consultant www.sease.io On Sun, 5 Dec 2021 at 17:

Re: Relevancy debugging - idf score

2021-12-05 Thread Sjoerd Smeets
Found it! I had to enable the ExactStatsCache. I found a description here: https://solr.pl/en/2019/05/20/distributed-idf/ Thanks for pointing me in the right direction. On Sun, Dec 5, 2021 at 11:09 AM Sjoerd Smeets wrote: > Hi Alessandro, > > Thanks for your reply! Yes, the documents are i

Re: Relevancy debugging - idf score

2021-12-05 Thread Sjoerd Smeets
Hi Alessandro, Thanks for your reply! Yes, the documents are in the same result list. I'm not doing any indexing at the moment, and I executed a commit just to be sure. Still the same result. It is an environment with 4 shards. Perhaps that is a factor? Thanks, Sjoerd On Sun, Dec 5, 2021 at 1

Re: Relevancy debugging - idf score

2021-12-05 Thread Alessandro Benedetti
It seems like the underlying index changed. Are those two documents in the same result set? Is it just one query? It's definitely curious: even if a commit happened, search results are consistent within one searcher. On Sun, 5 Dec 2021, 16:28 Sjoerd Smeets, wrote: > Hi all, > > I'm debugging the re

Relevancy debugging - idf score

2021-12-05 Thread Sjoerd Smeets
Hi all, I'm debugging the relevancy scores of my query and I see the following for two document hits. My question is, why is the idf score not the same for both documents? This is Solr 6.6. Any guidance would be much appreciated. Thanks! *Doc1* "71d72354eea23b9eae934ab616e8ce38de69d760": " 104