Using local IDF is usually not a problem if documents are randomly distributed between shards or collections. It can be a problem if terms are clustered in one collection/shard.
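A rough sketch of the clustering effect, in case numbers help. The shard names and counts below are made up for illustration, and the formula is the BM25-style IDF that Lucene uses by default; this is not a description of how Solr computes distributed stats.

```python
import math


def idf(doc_count, doc_freq):
    """BM25-style IDF as used by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-shard statistics for one term:
# (documents in the shard, documents in the shard containing the term)
shards = {
    "recent": (10_000, 500),      # term is clustered here
    "archive": (1_000_000, 100),  # term is rare here
}

# Local IDF: each shard scores with only its own statistics.
local_idf = {name: idf(n_docs, df) for name, (n_docs, df) in shards.items()}

# Global IDF: sum the statistics across shards first, then compute once.
total_docs = sum(n_docs for n_docs, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_docs, total_df)

for name, value in local_idf.items():
    print(f"{name:8s} local IDF  = {value:.2f}")
print(f"{'all':8s} global IDF = {global_idf:.2f}")
```

With these made-up counts the same term gets a much lower weight in the shard where it is clustered than in the other one, so scores from the two shards are not directly comparable when the results are merged.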
Assume a news archive with one collection for the current year and one for everything else. A recently hot topic, like “fentanyl”, will have a lower IDF in the recent collection. Similar things can happen with collections from each part of a company: say all the printer documents are in one collection, so “LaserJet” is a common term there.

Global IDF is very slow in Solr right now, but there is a fast method invented by Infoseek. That patent expired several years ago, so we should implement it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 28, 2022, at 5:35 AM, Eric Pugh <ep...@opensourceconnections.com> wrote:
>
> For a very long time, that was what folks always said: “The different IDF”
> is going to be an issue. My opinion is that there are many other things
> that REALLY affect your overall relevance a lot more than unbalanced IDF.
> Folks worry way too much about IDF, and not enough about “what are your crazy
> synonyms.txt or stopwords.txt doing to you?”.
>
> You should go use a tool like Quepid (www.quepid.com) and set up a baseline
> relevance test case, and just try the experiment. That way, instead of making
> decisions based on hunches, you have data!
>
>> On Dec 28, 2022, at 8:30 AM, Dave <hastings.recurs...@gmail.com> wrote:
>>
>> Eric, that is super clever. But how does it affect ranking if you do a
>> general search, since each collection has its own IDF etc.?
>> -Dave
>>
>>> On Dec 28, 2022, at 7:03 AM, Eric Pugh <ep...@opensourceconnections.com> wrote:
>>>
>>> You may find it an easier path forward to just move to SolrCloud. You can
>>> run a single Solr server with multiple collections and use the embedded ZK
>>> to avoid setting up the full ZK ensemble….
>>>
>>>> On Dec 28, 2022, at 12:04 AM, Mike <mz579...@gmail.com> wrote:
>>>>
>>>> Yes, it should be the same; it works without basic authentication.
>>>>
>>>> Thank you
>>>>
>>>>> On Wed, Dec 28, 2022 at 05:48, Srijan <shree...@gmail.com> wrote:
>>>>>
>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-15237/comment/17626195
>>>>>
>>>>> Same issue?
>>>>>
>>>>>> On Tue, Dec 27, 2022, 19:59 Mike <mz579...@gmail.com> wrote:
>>>>>>
>>>>>> I get a 401 "require authentication" error when I query with &shards=
>>>>>>
>>>>>> Do you or anyone else have any idea why?
>>>>>>
>>>>>>> On Wed, Dec 28, 2022 at 04:10, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>>>
>>>>>>> On 12/27/22 19:50, Mike wrote:
>>>>>>>> The server is not in cloud mode, it is a standalone server.
>>>>>>>> I don't understand where to put the query line, in the URL, with what
>>>>>>>> query parameter (?=) ?
>>>>>>>>
>>>>>>>> Do I have to change something in solr.xml or solrconfig?
>>>>>>>
>>>>>>> If you put it in the URL:
>>>>>>>
>>>>>>> &shards=server:port/solr/core1,server:port/solr/core2
>>>>>>>
>>>>>>> The way I did it was to create a special core with no index of its own
>>>>>>> and put the following line in the solrconfig.xml, in the defaults
>>>>>>> section of the search handler:
>>>>>>>
>>>>>>> <str name="shards">idxb2.example.com:8981/solr/inclive,idxb1.example.com:8981/solr/s0live,idxb1.example.com:8981/solr/s1live,idxb1.example.com:8981/solr/s2live,idxb2.example.com:8981/solr/s3live,idxb2.example.com:8981/solr/s4live,idxb2.example.com:8981/solr/s5live</str>
>>>>>>>
>>>>>>> Queries never went directly to the cores with data; they only went to
>>>>>>> the special core. I wrote an indexing system that would ensure
>>>>>>> documents ended up in the correct shard.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shawn
>>>
>>> _______________________
>>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
>>> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
>>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>>
>>> This e-mail and all contents, including attachments, is considered to be
>>> Company Confidential unless explicitly stated otherwise, regardless of
>>> whether attachments are marked as such.