This is actually something I experienced when using things like MLT (MoreLikeThis) to get "similar" documents: the corpora have to match, or else it all goes out the window. So yeah, if you have multiple cores/collections with the exact same type of documents you can be pretty safe, but once you start mixing a history book collection with a novel collection and a news archive collection, things get strange pretty quickly. And God forbid you have different languages.
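To put rough numbers on it, here is a minimal sketch of how far per-collection IDF can drift from the IDF of the combined corpus. The collection names and document counts are made up for illustration, and the formula is the BM25-style IDF that, as far as I know, Lucene's BM25Similarity uses; treat it as an approximation of what Solr computes, not a reproduction of it.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical collections: (total docs, docs containing the term "laserjet").
collections = {
    "printer_docs": (10_000, 4_000),   # term is common here
    "news_archive": (1_000_000, 50),   # term is rare here
}

# IDF computed separately inside each collection (what each shard sees).
local_idf = {
    name: bm25_idf(n_docs, doc_freq)
    for name, (n_docs, doc_freq) in collections.items()
}

# IDF computed over the combined statistics (what a global IDF would see).
total_docs = sum(n_docs for n_docs, _ in collections.values())
total_df = sum(doc_freq for _, doc_freq in collections.values())
global_idf = bm25_idf(total_docs, total_df)

for name, idf in local_idf.items():
    print(f"{name}: local idf = {idf:.2f}")
print(f"combined: global idf = {global_idf:.2f}")
```

The same term gets a low weight in the collection where it is common and a high weight where it is rare, so scores merged from both collections are not directly comparable.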
On Wed, Dec 28, 2022 at 1:25 PM Walter Underwood <wun...@wunderwood.org> wrote:

> Using local IDF is usually not a problem if documents are randomly
> distributed between shards or collections. It can be a problem if terms are
> clustered in one collection/shard.
>
> Assume a news archive with one collection for the current year and one for
> everything else. A recently hot topic, like “fentanyl”, will have a lower
> IDF in the recent collection. Similar things can happen with collections
> from each part of a company, say all the printer documents are in one
> collection, so “LaserJet” is a common term there.
>
> Global IDF is very slow in Solr right now, but there is a fast method
> invented by Infoseek. That patent expired several years ago, so we should
> implement it.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Dec 28, 2022, at 5:35 AM, Eric Pugh <ep...@opensourceconnections.com> wrote:
> >
> > For a very long time, that was what folks always said…. “The different
> > IDF” is going to be an issue. My opinion is that there are many other
> > things that REALLY affect your overall relevance a lot more than unbalanced
> > IDF. Folks worry way too much about IDF, and not enough about “what are
> > your crazy synonyms.txt or stop words.txt doing to you?”.
> >
> > You should go use a tool like Quepid (www.quepid.com) and set up a
> > baseline relevance test case, and just try the experiment, that way instead
> > of making decisions based on hunches, you have data!
> >
> >> On Dec 28, 2022, at 8:30 AM, Dave <hastings.recurs...@gmail.com> wrote:
> >>
> >> Eric, that is super clever. But how does it affect ranking if you do a
> >> general search? Since each collection has its own idf etc?
> >> -Dave
> >>
> >>> On Dec 28, 2022, at 7:03 AM, Eric Pugh <ep...@opensourceconnections.com> wrote:
> >>>
> >>> You may find it an easier path forward to just move to SolrCloud.
> >>> You can run a single Solr server with multiple collections and use the
> >>> embedded ZK to avoid setting up the full ZK ensemble….
> >>>
> >>>> On Dec 28, 2022, at 12:04 AM, Mike <mz579...@gmail.com> wrote:
> >>>>
> >>>> Yes, it should be the same, it works without basic authentication.
> >>>>
> >>>> Thank you
> >>>>
> >>>>> On Wed, Dec 28, 2022 at 05:48, Srijan <shree...@gmail.com> wrote:
> >>>>>
> >>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-15237/comment/17626195
> >>>>>
> >>>>> Same issue?
> >>>>>
> >>>>>> On Tue, Dec 27, 2022, 19:59 Mike <mz579...@gmail.com> wrote:
> >>>>>>
> >>>>>> I get a 401 require authentication error when I query with &shards=
> >>>>>>
> >>>>>> Do you or anyone else have any idea why?
> >>>>>>
> >>>>>> On Wed, Dec 28, 2022 at 04:10, Shawn Heisey <apa...@elyograg.org> wrote:
> >>>>>>
> >>>>>>> On 12/27/22 19:50, Mike wrote:
> >>>>>>>> The server is not in cloud mode, it is a standalone server.
> >>>>>>>> I don't understand where to put the query line, in the URL, with what
> >>>>>>>> query parameter (?=) ?
> >>>>>>>>
> >>>>>>>> Do I have to change something in solr.xml or solrconfig?
> >>>>>>>
> >>>>>>> If you put it in the URL:
> >>>>>>>
> >>>>>>> &shards=server:port/solr/core1,server:port/solr/core2
> >>>>>>>
> >>>>>>> The way I did it was to create a special core with no index of its own
> >>>>>>> and put the following line in the solrconfig.xml, in the defaults section
> >>>>>>> of the search handler:
> >>>>>>>
> >>>>>>> <str name="shards">idxb2.example.com:8981/solr/inclive,idxb1.example.com:8981/solr/s0live,idxb1.example.com:8981/solr/s1live,idxb1.example.com:8981/solr/s2live,idxb2.example.com:8981/solr/s3live,idxb2.example.com:8981/solr/s4live,idxb2.example.com:8981/solr/s5live</str>
> >>>>>>>
> >>>>>>> Queries never went directly to the cores with data, they only went to
> >>>>>>> the special core. I wrote an indexing system that would ensure
> >>>>>>> documents ended up in the correct shard.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Shawn
> >>>
> >>> _______________________
> >>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>
> >>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >>> This e-mail and all contents, including attachments, is considered to
> >>> be Company Confidential unless explicitly stated otherwise, regardless of
> >>> whether attachments are marked as such.