Re: Multiple cores

2022-12-28 Thread Eric Pugh
You may find it an easier path forward to just move to SolrCloud.  You can run 
a single Solr server with multiple collections and use the embedded ZK to avoid 
setting up the full ZK ensemble….

> On Dec 28, 2022, at 12:04 AM, Mike wrote:
>
> Yes, it should be the same, it works without basic authentication.
>
> Thank you
>
> On Wed, Dec 28, 2022 at 05:48, Srijan wrote:
>
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-15237/comment/17626195
>>
>> Same issue?
>>
>> On Tue, Dec 27, 2022, 19:59 Mike wrote:
>>
>>> I get a 401 "authentication required" error when I query with &shards=
>>>
>>> Do you or anyone else have any idea why?
>>>
>>> On Wed, Dec 28, 2022 at 04:10, Shawn Heisey <apa...@elyograg.org> wrote:
>>>
>>>> On 12/27/22 19:50, Mike wrote:
>>>>> The server is not in cloud mode, it is a standalone server.
>>>>> I don't understand where to put the query line, in the URL, with what
>>>>> query parameter (?=) ?
>>>>>
>>>>> Do I have to change something in solr.xml or solrconfig?
>>>>
>>>> If you put it in the URL:
>>>>
>>>> &shards=server:port/solr/core1,server:port/solr/core2
>>>>
>>>> The way I did it was to create a special core with no index of its own
>>>> and put the following line in the solrconfig.xml, in the defaults
>>>> section of the search handler:
>>>>
>>>> <str name="shards">idxb2.example.com:8981/solr/inclive,idxb1.example.com:8981/solr/s0live,idxb1.example.com:8981/solr/s1live,idxb1.example.com:8981/solr/s2live,idxb2.example.com:8981/solr/s3live,idxb2.example.com:8981/solr/s4live,idxb2.example.com:8981/solr/s5live</str>
>>>>
>>>> Queries never went directly to the cores with data, they only went to
>>>> the special core.  I wrote an indexing system that would ensure
>>>> documents ended up in the correct shard.
>>>>
>>>> Thanks,
>>>> Shawn
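For anyone following along, the URL form of the shards parameter that Shawn describes can be sketched roughly as below. This is only an illustration: the helper name build_shards_query, the localhost host/port, and the core names are assumptions, not anything from the thread.

```python
from urllib.parse import urlencode


def build_shards_query(base_core_url, shard_urls, q="*:*"):
    """Build a /select URL that asks one core to fan the query out to shards.

    base_core_url and shard_urls are hypothetical values; shard entries use
    the host:port/solr/core form shown in the thread (no scheme).
    """
    params = {"q": q, "shards": ",".join(shard_urls)}
    # Keep ':', '/', ',' and '*' unescaped so the URL stays readable.
    return f"{base_core_url}/select?{urlencode(params, safe=':/,*')}"


url = build_shards_query(
    "http://localhost:8983/solr/core1",
    ["localhost:8983/solr/core1", "localhost:8983/solr/core2"],
)
print(url)
```

Putting the same comma-separated list in the search handler defaults, as Shawn did, means clients never have to build this URL themselves.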

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com | My Free/Busy
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed


This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.



Re: Multiple cores

2022-12-28 Thread Dave
Eric, that is super clever.  But how does it affect ranking if you do a general 
search, since each collection has its own IDF, etc.?
-Dave



Re: Multiple cores

2022-12-28 Thread Eric Pugh
For a very long time, that was what folks always said: “The different IDF is 
going to be an issue.”  My opinion is that there are many other things that 
REALLY affect your overall relevance a lot more than unbalanced IDF.  Folks 
worry way too much about IDF, and not enough about “what are your crazy 
synonyms.txt or stopwords.txt doing to you?”.

You should go use a tool like Quepid (www.quepid.com), set up a baseline 
relevance test case, and just try the experiment. That way, instead of making 
decisions based on hunches, you have data!
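To make the "baseline, then experiment" idea concrete, here is a minimal sketch that compares precision at k for two rankings of the same query. It is not Quepid's API or scoring method; the document IDs, the judgments, and the function name precision_at_k are all made up for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


# Hypothetical human judgments for one query.
relevant = {"d1", "d3", "d4"}

# Hypothetical result orderings: single core vs. multi-collection search.
baseline_ranking = ["d1", "d7", "d3", "d9", "d4"]
experiment_ranking = ["d1", "d3", "d4", "d7", "d9"]

baseline_score = precision_at_k(baseline_ranking, relevant, k=3)
experiment_score = precision_at_k(experiment_ranking, relevant, k=3)
print(f"baseline P@3={baseline_score:.2f} experiment P@3={experiment_score:.2f}")
```

Running the same judged queries before and after a change gives a number to compare rather than a hunch.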






Re: Multiple cores

2022-12-28 Thread Walter Underwood
Using local IDF is usually not a problem if documents are randomly distributed 
between shards or collections. It can be a problem if terms are clustered in 
one collection/shard.

Assume a news archive with one collection for the current year and one for 
everything else. A recently hot topic, like “fentanyl”, will have a lower IDF 
in the recent collection. Similar things can happen with collections from each 
part of a company, say all the printer documents are in one collection, so 
“LaserJet” is a common term there.
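A rough numeric sketch of the news-archive case, using the BM25-style IDF formula ln(1 + (N - n + 0.5) / (n + 0.5)). The document counts below are invented purely to show the effect, and bm25_idf is an illustrative helper, not a Solr call.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical counts for the term "fentanyl".
recent = {"docs": 10_000, "with_term": 500}    # current-year collection
archive = {"docs": 500_000, "with_term": 50}   # everything else

idf_recent = bm25_idf(recent["docs"], recent["with_term"])
idf_archive = bm25_idf(archive["docs"], archive["with_term"])
idf_global = bm25_idf(
    recent["docs"] + archive["docs"],
    recent["with_term"] + archive["with_term"],
)

print(f"recent={idf_recent:.2f} archive={idf_archive:.2f} global={idf_global:.2f}")
```

With these numbers the same term is weighted far lower in the recent collection than in the archive, so scores merged from the two are not directly comparable; a global IDF would sit in between.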

Global IDF is very slow in Solr right now, but there is a fast method invented 
by Infoseek. That patent expired several years ago, so we should implement it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Multiple cores

2022-12-28 Thread David Hastings
This is actually something I experienced when using things like MLT
(MoreLikeThis) to get "similar" documents: the corpus has to match, or else
it all goes out the window.  So yeah, if you have multiple cores/collections
with exactly the same type of documents you can be pretty safe, but once you
start mixing a history book collection with a novel collection and a news
archive collection, things get strange pretty quickly.  And God forbid you
have different languages.


Re: Multiple cores

2022-12-28 Thread Thomas Corthals
For our corpus, term frequency gets in the way of how we want to rank
search results rather than being helpful.

I put this in our schema to effectively turn Okapi BM25 into BM15:

<similarity class="solr.BM25SimilarityFactory">
  <float name="b">0</float>
</similarity>
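As a rough illustration of what setting b to 0 does, here is the classic Okapi BM25 per-term score written out. BM15 is BM25 with b = 0, which switches off document-length normalization. The function name, the default k1 = 1.2 and b = 0.75, and the sample numbers are assumptions for the sketch, and Lucene's actual implementation differs in some details.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Classic Okapi BM25 score for a single term in a single document."""
    # With b = 0 this factor is exactly 1, so document length drops out.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Hypothetical documents: same term frequency, very different lengths.
short_doc = {"tf": 3, "doc_len": 50}
long_doc = {"tf": 3, "doc_len": 5000}
avg_len, idf = 500, 2.0

for b in (0.75, 0.0):
    s = bm25_term_score(short_doc["tf"], short_doc["doc_len"], avg_len, idf, b=b)
    l = bm25_term_score(long_doc["tf"], long_doc["doc_len"], avg_len, idf, b=b)
    print(f"b={b}: short={s:.3f} long={l:.3f}")
```

With b = 0.75 the short document outscores the long one for the same term frequency; with b = 0 the two scores are identical.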


Thomas


