Using local IDF is usually not a problem if documents are randomly distributed 
between shards or collections. It can be a problem if terms are clustered in 
one collection/shard.

Assume a news archive with one collection for the current year and one for 
everything else. A recently hot topic, like “fentanyl”, will have a lower IDF 
in the recent collection, so matches there get less credit for that term than 
matches in the older collection. Similar things can happen when each part of a 
company has its own collection. If all the printer documents are in one 
collection, “LaserJet” is a common term there.
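
To put rough numbers on the news archive case, here is a minimal sketch. The 
document counts are invented for illustration, and idf() uses the BM25-style 
formula as an assumption. It is not a claim about how Solr computes or 
combines statistics across shards. The "global" value here is simply the 
formula applied to the summed counts.

```python
import math

def idf(doc_count, doc_freq):
    """BM25-style IDF: the rarer the term, the higher the value."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical counts per collection:
# (documents in the collection, documents containing "fentanyl")
collections = {
    "current_year": (100_000, 5_000),   # hot topic, so the term is common here
    "archive": (2_000_000, 1_000),      # the term is rare in older news
}

# Local IDF: each collection only sees its own statistics.
local_idf = {name: idf(n, df) for name, (n, df) in collections.items()}

# Global IDF: sum the counts across collections, then apply the formula once.
total_docs = sum(n for n, _ in collections.values())
total_df = sum(df for _, df in collections.values())
global_idf = idf(total_docs, total_df)

for name, value in local_idf.items():
    print(f"{name}: local IDF = {value:.2f}")
print(f"global IDF = {global_idf:.2f}")
```

With these made-up counts the local IDF in the current-year collection comes 
out well below the archive's, and the global value falls between the two. 
That gap is the clustering effect described above.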

Global IDF is very slow in Solr right now, but there is a fast method invented 
by Infoseek. That patent expired several years ago, so we should implement it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 28, 2022, at 5:35 AM, Eric Pugh <ep...@opensourceconnections.com> 
> wrote:
> 
> For a very long time, that was what folks always said: “The different IDF” 
> is going to be an issue. My opinion is that there are many other things 
> that REALLY affect your overall relevance a lot more than unbalanced IDF. 
> Folks worry way too much about IDF, and not enough about “what are your crazy 
> synonyms.txt or stopwords.txt doing to you?”.
> 
> You should go use a tool like Quepid (www.quepid.com), set up a baseline 
> relevance test case, and just try the experiment. That way, instead of making 
> decisions based on hunches, you have data!
> 
> 
> 
>> On Dec 28, 2022, at 8:30 AM, Dave <hastings.recurs...@gmail.com> wrote:
>> 
>> Eric, that is super clever.  But how does it affect ranking if you do a 
>> general search, since each collection has its own IDF, etc.?
>> -Dave
>> 
>>> On Dec 28, 2022, at 7:03 AM, Eric Pugh <ep...@opensourceconnections.com> 
>>> wrote:
>>> 
>>> You may find it an easier path forward to just move to SolrCloud.  You can 
>>> run a single Solr server with multiple collections and use the embedded ZK 
>>> to avoid setting up the full ZK ensemble….
>>> 
>>>> On Dec 28, 2022, at 12:04 AM, Mike <mz579...@gmail.com> wrote:
>>>> 
>>>> Yes, it should be the same, it works without basic authentication.
>>>> 
>>>> Thank you
>>>> 
>>>>> On Wed, Dec 28, 2022 at 05:48, Srijan <shree...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-15237/comment/17626195
>>>>> 
>>>>> Same issue?
>>>>> 
>>>>>> On Tue, Dec 27, 2022, 19:59 Mike <mz579...@gmail.com> wrote:
>>>>> 
>>>>>> I get a 401 authentication-required error when I query with &shards=
>>>>>> 
>>>>>> Do you or anyone else have any idea why?
>>>>>> 
>>>>>> On Wed, Dec 28, 2022 at 04:10, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>> 
>>>>>>> On 12/27/22 19:50, Mike wrote:
>>>>>>>> The server is not in cloud mode, it is a standalone server.
>>>>>>>> I don't understand where to put the query line, in the URL, with what
>>>>>>>> query parameter (?=) ?
>>>>>>>> 
>>>>>>>> Do I have to change something in solr.xml or solrconfig?
>>>>>>> 
>>>>>>> If you put it in the URL:
>>>>>>> 
>>>>>>> &shards=server:port/solr/core1,server:port/solr/core2
>>>>>>> 
>>>>>>> The way I did it was to create a special core with no index of its own
>>>>>>> and put the following line in the solrconfig.xml, in the defaults
>>>>>>> section of the search handler:
>>>>>>> 
>>>>>>> <str name="shards">idxb2.example.com:8981/solr/inclive,idxb1.example.com:8981/solr/s0live,idxb1.example.com:8981/solr/s1live,idxb1.example.com:8981/solr/s2live,idxb2.example.com:8981/solr/s3live,idxb2.example.com:8981/solr/s4live,idxb2.example.com:8981/solr/s5live</str>
>>>>>>> 
>>>>>>> Queries never went directly to the cores with data, they only went to
>>>>>>> the special core.  I wrote an indexing system that would ensure
>>>>>>> documents ended up in the correct shard.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> _______________________
>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
>>> http://www.opensourceconnections.com 
>>> <http://www.opensourceconnections.com/> | My Free/Busy 
>>> <http://tinyurl.com/eric-cal>  
>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
>>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>>     
>>> This e-mail and all contents, including attachments, is considered to be 
>>> Company Confidential unless explicitly stated otherwise, regardless of 
>>> whether attachments are marked as such.
>>> 
> 
