Hi Markus,
won't the problem still be present across shards without distributed IDF?
You may have skewed shards and then each of them will have a different IDF
for the same term (and field).
Regarding the performance penalty Walter highlighted, I definitely
see some space for contribution, bu
When we tried exact IDF, it was about 10X slower in our sharded system, so we
couldn’t use it.
It is possible to calculate IDF when merging results from shards, with no speed
penalty. Infoseek was doing that 25 years ago and the patent has expired. You
return df from each shard, then calculate
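A minimal sketch of the merge-time idea described above, not Solr's or Infoseek's actual implementation: each shard reports its document frequency and document count for a term, and the coordinator sums them before computing one IDF. The function names and shard numbers are hypothetical, and the formula is assumed to be the BM25-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5)).

```python
import math

def bm25_idf(doc_freq, doc_count):
    # BM25-style IDF; assumed here for illustration only.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def merged_idf(shard_stats):
    # shard_stats: list of (doc_freq, doc_count) pairs, one per shard.
    # Summing the raw counts first gives the same IDF a single
    # unsharded index would produce for this term.
    total_df = sum(df for df, _ in shard_stats)
    total_docs = sum(n for _, n in shard_stats)
    return bm25_idf(total_df, total_docs)

# Hypothetical skewed distribution: the term is rare on shard 0
# and common on shard 1.
shards = [(5, 1000), (400, 1000), (50, 1000), (45, 1000)]

local_idfs = [bm25_idf(df, n) for df, n in shards]
global_idf = merged_idf(shards)

for i, idf in enumerate(local_idfs):
    print(f"shard {i}: local idf = {idf:.3f}")
print(f"merged idf = {global_idf:.3f}")
```

With local statistics, the same term gets a different IDF on every shard, which is why two hits for the same query can show different idf values in the debug output. The merged value is identical for every document regardless of which shard it lives on.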
Hello Sjoerd,
ExactStatsCache indeed works fine when replicas of the same shard do not
share identical term stats, but it comes with some overhead. If you can,
upgrade to at least 7.x and change the default NRT replica types to TLOG.
You then no longer need to use ExactStatsCache because replicas
Good to know you solved it!
Yes, distributed IDF is definitely a problem if you have skewed
document distributions.
Cheers
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io
On Sun, 5 Dec 2021 at 17:
Found it!
I had to enable the ExactStatsCache.
I found a description at the link below. Thanks for pointing me in the right
direction.
https://solr.pl/en/2019/05/20/distributed-idf/
On Sun, Dec 5, 2021 at 11:09 AM Sjoerd Smeets wrote:
Hi Alessandro,
Thanks for your reply! Yes, the documents are in the same result list, and
I'm not doing any indexing at the moment. I executed a commit just to be
sure. Still the same result. It is an environment with 4 shards. Perhaps
that is a factor?
Thanks,
Sjoerd
On Sun, Dec 5, 2021 at 1
It seems like the underlying index changed.
Are those two documents in the same result set?
Is it just one query?
It's definitely curious: even if a commit happened, search results are
consistent within one searcher.
On Sun, 5 Dec 2021, 16:28 Sjoerd Smeets, wrote:
Hi all,
I'm debugging the relevancy scores of my query and I see the following for
two document hits. My question is: why is the idf score not the same for
both documents? This is Solr 6.6.
Any guidance would be much appreciated.
Thanks!
*Doc1*
"71d72354eea23b9eae934ab616e8ce38de69d760": "
104