Re: Multiple cores

2022-12-28 Thread Eric Pugh
You may find it an easier path forward to just move to SolrCloud. You can run a single Solr server with multiple collections and use the embedded ZK to avoid setting up the full ZK ensemble…

> On Dec 28, 2022, at 12:04 AM, Mike wrote:
>
> Yes, it should be the same, it works without basic au
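
A minimal sketch of the setup Eric describes. It assumes a local node started in SolrCloud mode with the embedded ZooKeeper (for example `bin/solr start -c`) on the default port 8983, and it only builds the HTTP request URLs: one Collections API `CREATE` call per collection, and one `select` request whose `collection` parameter fans a query out across several collections. The collection names `current` and `archive` and the `_default` configset are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed local SolrCloud node (embedded ZooKeeper, default port).
SOLR_BASE = "http://localhost:8983/solr"

def create_collection_url(name: str, config_name: str = "_default") -> str:
    """Collections API CREATE request for a single-shard, single-replica collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": 1,
        "collection.configName": config_name,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"

def multi_collection_query_url(query: str, collections: list[str]) -> str:
    """Select request sent to the first collection but distributed to all listed ones."""
    params = {"q": query, "collection": ",".join(collections)}
    return f"{SOLR_BASE}/{collections[0]}/select?{urlencode(params)}"

if __name__ == "__main__":
    for name in ("current", "archive"):  # hypothetical collection names
        print(create_collection_url(name))
    print(multi_collection_query_url("title:solr", ["current", "archive"]))
```

Fetching these URLs (for example with `urllib.request.urlopen`) against a running node would create the collections and run the combined search. By default each shard still scores with its own local term statistics, which is the concern raised in the replies.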

2022-12-28 Thread Dave
Eric, that is super clever. But how does it affect ranking if you do a general search, since each collection has its own IDF, etc.?

-Dave

> On Dec 28, 2022, at 7:03 AM, Eric Pugh wrote:
>
> You may find it an easier path forward to just move to SolrCloud. You can
> run a single Solr serve

2022-12-28 Thread Eric Pugh
For a very long time, that is what folks always said: “the different IDF” is going to be an issue. My opinion is that many other things REALLY affect your overall relevance a lot more than unbalanced IDF does. Folks worry way too much about IDF, and not enough about “what are you

2022-12-28 Thread Walter Underwood
Using local IDF is usually not a problem if documents are randomly distributed between shards or collections. It can be a problem if terms are clustered in one collection/shard. Assume a news archive with one collection for the current year and one for everything else. A recently hot topic, lik
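
A worked sketch of the clustered-term case, using hypothetical document counts: a term that appears in 5,000 of 100,000 current-year documents but in only 500 of 1,000,000 archive documents. IDF is computed in the form Lucene's `BM25Similarity` uses, ln(1 + (N − n + 0.5) / (n + 0.5)), once per collection (local) and once over the combined counts (global).

```python
import math

def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF in the form used by Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical counts: the term is clustered in the current-year collection.
CURRENT_DOCS, CURRENT_DF = 100_000, 5_000
ARCHIVE_DOCS, ARCHIVE_DF = 1_000_000, 500

local_current = bm25_idf(CURRENT_DF, CURRENT_DOCS)
local_archive = bm25_idf(ARCHIVE_DF, ARCHIVE_DOCS)
# Global IDF: statistics summed across both collections, as if they were one index.
global_idf = bm25_idf(CURRENT_DF + ARCHIVE_DF, CURRENT_DOCS + ARCHIVE_DOCS)

if __name__ == "__main__":
    print(f"local IDF, current collection: {local_current:.2f}")
    print(f"local IDF, archive collection: {local_archive:.2f}")
    print(f"global IDF, both collections:  {global_idf:.2f}")
```

With these numbers the archive's local IDF for the term is more than twice the current-year collection's, and the global value falls in between. Under local IDF, a match in the archive therefore outweighs an otherwise identical match in the current collection.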

2022-12-28 Thread David Hastings
This is actually something I experienced using things like MLT (MoreLikeThis) to get "similar" documents: the corpus has to match, or else it all goes out the window. So yeah, if you have multiple cores/collections with exactly the same type of documents, you can be pretty safe, but once you start mixing
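
A simplified sketch of why MLT-style results depend on the corpus. It is not Lucene's `MoreLikeThis` implementation, only the same general idea: the terms of a source document are ranked by tf × idf, with idf taken here in the classic form 1 + ln((N + 1) / (df + 1)), so the same document yields different "interesting terms" depending on which corpus supplies the document frequencies. The document, the two corpora, and their counts are hypothetical.

```python
import math
from collections import Counter

def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic TF-IDF style idf: 1 + ln((N + 1) / (df + 1))."""
    return 1.0 + math.log((num_docs + 1) / (doc_freq + 1))

def interesting_terms(text: str, doc_freqs: dict[str, int], num_docs: int, k: int = 2) -> list[str]:
    """Rank the distinct terms of `text` by tf * idf against one corpus's statistics."""
    tf = Counter(text.lower().split())
    scores = {t: n * classic_idf(doc_freqs.get(t, 0), num_docs) for t, n in tf.items()}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical source document and per-corpus document frequencies (1,000 docs each).
DOC = "solr shard replica solr query"
SEARCH_LIST_DF = {"solr": 900, "shard": 200, "replica": 250, "query": 800}
NEWS_DF = {"solr": 2, "shard": 40, "replica": 30, "query": 500}
NUM_DOCS = 1000

if __name__ == "__main__":
    print("search-list corpus:", interesting_terms(DOC, SEARCH_LIST_DF, NUM_DOCS))
    print("news corpus:", interesting_terms(DOC, NEWS_DF, NUM_DOCS))
```

Against the search-list statistics the sketch picks `shard` and `replica`; against the news statistics, where `solr` is rare, it picks `solr` and `replica`, so the "similar documents" query it would build changes with the corpus.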

2022-12-28 Thread Thomas Corthals
For our corpus, term frequency gets in the way of how we want to rank search results rather than being helpful. I put this in our schema to effectively turn Okapi BM25 into BM15: 0

Thomas

On Wed 28 Dec 2022 at 14:35, Eric Pug
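
A minimal sketch of the textbook Okapi BM25 per-term score, assuming the stripped schema setting was the length-normalization parameter `b` (BM15 is conventionally BM25 with b = 0). The defaults k1 = 1.2 and b = 0.75 match Lucene's `BM25Similarity`; the document lengths and IDF value in the usage are hypothetical.

```python
def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float, idf: float,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook Okapi BM25 contribution of a single query term to a document's score."""
    # With b = 0 this factor is always 1, so document length no longer matters (BM15).
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of the term stop adding score.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

if __name__ == "__main__":
    for b in (0.75, 0.0):
        short = bm25_term_score(tf=3, doc_len=50, avg_doc_len=200, idf=2.0, b=b)
        long_ = bm25_term_score(tf=3, doc_len=500, avg_doc_len=200, idf=2.0, b=b)
        print(f"b={b}: short doc {short:.3f}, long doc {long_:.3f}")
```

With b = 0 the short and long documents score identically; with b = 0.75 the shorter one scores higher. In this formula `b` only controls length normalization; the term-frequency component is governed by `k1` and flattens toward a constant as `k1` approaches 0.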