Re: How is fieldNorm calculated when omitNorms is set to true?

2022-09-20 Thread Thomas Corthals
Hi Stian,

You can't upgrade across more than one major version. You'll have to
reindex against a new Solr install.

The ref guide has more on configuring similarity:
https://solr.apache.org/guide/solr/latest/indexing-guide/schema-elements.html#similarity

Wikipedia has an explanation of the BM25 ranking function:
https://en.wikipedia.org/wiki/Okapi_BM25
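
For reference, the standard BM25 per-term contribution (the general Okapi BM25
form, not Solr's exact implementation) is:

  \mathrm{score}(D, q) = \mathrm{IDF}(q) \cdot \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

With b = 0 the length-normalization term |D|/avgdl drops out entirely, which is
why setting b = 0 stops longer documents from being penalized.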

Kind regards,

Thomas

On Mon, 19 Sep 2022 at 08:59, Stian Brattland wrote:

> Hi Thomas,
>
> Yes, it may be about time to upgrade from 3.2.0. I think I will have to
> look into that. Thanks for sharing your insight and configuration example.
>
> Kind regards,
> Stian
>
>
>
> On Sun, 18 Sep 2022 at 16:21, Thomas Corthals wrote:
>
> > Hi Stian,
> >
> > We have the same issue with our documents. I fixed that by setting b = 0
> > in our schema for BM25 similarity.
> >
> > <similarity class="solr.BM25SimilarityFactory">
> >   <float name="b">0</float>
> > </similarity>
> >
> > I don't know if BM25 can be used with your version of Solr. Personally I
> > think it's worth upgrading for.
> >
> > Thomas
> >
> >
> > On Sat, 17 Sep 2022 at 19:30, Stian Brattland wrote:
> >
> > > Hi,
> > >
> > > I have a Solr (3.2.0) instance with omitNorms=true on all fields. This
> > > has been done in an attempt to not penalize documents which have many
> > > terms.
> > >
> > > What puzzles me is that, despite omitNorms=true, the fieldNorm is still
> > > calculated and affects the score.
> > >
> > > ---
> > > 3.7455106 = (MATCH) fieldWeight(track_hierarchynode:pop in 258465),
> > > product of:
> > > 2.236068 = tf(termFreq(track_hierarchynode:pop)=5)
> > > 6.700173 = idf(docFreq=6960, maxDocs=2080776)
> > > 0.25 = fieldNorm(field=track_hierarchynode, doc=258465)
> > > ---
> > >
> > > How is the fieldNorm value calculated when omitNorms=true?
> > >
> > > Kind regards,
> > > Stian Brattland
> > >
> >
>


Re: MoreLikeThis with externally supplied text, and facets?

2022-09-20 Thread Mikhail Khludnev
For reference, the trick above doesn't work at the moment; I'll work on it under
https://issues.apache.org/jira/browse/SOLR-16420.
Note that the fix for facet support in the /mlt handler will be released in 9.1.
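
For context, the kind of request being discussed looks roughly like this. It is
only a sketch: the collection name (articles), field list and facet field are
made-up placeholders, stream.body must be enabled in your configuration, and it
assumes a MoreLikeThisHandler registered at /mlt. Facets on such a request are
what SOLR-7883 / SOLR-16420 are about.

curl 'http://localhost:8983/solr/articles/mlt?mlt.fl=title,body&mlt.mintf=1&mlt.mindf=1&fl=id,title&facet=true&facet.field=category' \
  --data-urlencode 'stream.body=Draft article text supplied externally...'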

On Fri, Sep 9, 2022 at 2:37 PM Mikhail Khludnev  wrote:

> Hold on. The JSON Query DSL lets you pass quite long content via the request
> body. It should support {!mlt}. At least it's worth a try.
>
> On Thu, Sep 8, 2022 at 2:53 PM Mikhail Khludnev  wrote:
>
>> Hello Batanun,
>> I checked the {!mlt} source code. It can't swallow external content. I've
>> found that the Lucene XML query parser
>> https://lucene.apache.org/core/9_1_0/queryparser/org/apache/lucene/queryparser/xml/CorePlusQueriesParser.html
>> is capable of handling it. However, it has diverged and is not available in
>> Solr out-of-the-box:
>> https://solr.apache.org/guide/8_8/other-parsers.html#xml-query-parser
>>
>>
>> On Thu, Sep 8, 2022 at 1:16 PM Batanun B  wrote:
>>
>>> Hi,
>>>
>>> I'm evaluating whether the MoreLikeThis (mlt) feature of Solr can be useful
>>> for our editors when they are creating new content. We want to trigger this
>>> before the content has been inserted into the system, so there is no document
>>> in Solr that we can use as a base for the mlt search. So we want to use the
>>> "externally supplied text" feature, where we provide the article text in
>>> the request body of the search. This works great when we use the mlt
>>> request handler (/mlt). But we would also like to get facets for this
>>> search, and bug SOLR-7883 is stopping us from doing that.
>>>
>>> Some people suggest that we use the mlt query parser instead, as part of
>>> our regular request handler (/select). But I can't get that to work together
>>> with the "externally supplied text". It gives me the error "Bad contentType
>>> for search handler :text/plain".
>>>
>>> So, does anyone know how to do a search that uses MoreLikeThis with
>>> externally supplied text, and facets at the same time?
>>>
>>> Regards
>>>
>>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: How is fieldNorm calculated when omitNorms is set to true?

2022-09-20 Thread Stian Brattland
Hi Thomas,

Yes, I will most likely go ahead and set up new instances for Solr 9 and
rebuild the index. I'd be surprised if the original schema from 3.2.0 is
100% compatible with Solr 9.

Thanks :-)

Regards,
Stian

On Tue, 20 Sep 2022 at 11:04, Thomas Corthals wrote:

> Hi Stian,
>
> You can't upgrade across more than one major version. You'll have to
> reindex against a new Solr install.
>
> The ref guide has more on configuring similarity:
>
> https://solr.apache.org/guide/solr/latest/indexing-guide/schema-elements.html#similarity
>
> Wikipedia has an explanation of the BM25 ranking function:
> https://en.wikipedia.org/wiki/Okapi_BM25
>
> Kind regards,
>
> Thomas
>
> On Mon, 19 Sep 2022 at 08:59, Stian Brattland wrote:
>
> > Hi Thomas,
> >
> > Yes, it may be about time to upgrade from 3.2.0. I think I will have to
> > look into that. Thanks for sharing your insight and configuration example.
> >
> > Kind regards,
> > Stian
> >
> >
> >
> > On Sun, 18 Sep 2022 at 16:21, Thomas Corthals <tho...@klascement.net> wrote:
> >
> > > Hi Stian,
> > >
> > > We have the same issue with our documents. I fixed that by setting b = 0
> > > in our schema for BM25 similarity.
> > >
> > > <similarity class="solr.BM25SimilarityFactory">
> > >   <float name="b">0</float>
> > > </similarity>
> > >
> > > I don't know if BM25 can be used with your version of Solr. Personally I
> > > think it's worth upgrading for.
> > >
> > > Thomas
> > >
> > >
> > > On Sat, 17 Sep 2022 at 19:30, Stian Brattland <st...@octetnest.no> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a Solr (3.2.0) instance with omitNorms=true on all fields. This
> > > > has been done in an attempt to not penalize documents which have many
> > > > terms.
> > > >
> > > > What puzzles me is that, despite omitNorms=true, the fieldNorm is still
> > > > calculated and affects the score.
> > > >
> > > > ---
> > > > 3.7455106 = (MATCH) fieldWeight(track_hierarchynode:pop in 258465),
> > > > product of:
> > > > 2.236068 = tf(termFreq(track_hierarchynode:pop)=5)
> > > > 6.700173 = idf(docFreq=6960, maxDocs=2080776)
> > > > 0.25 = fieldNorm(field=track_hierarchynode, doc=258465)
> > > > ---
> > > >
> > > > How is the fieldNorm value calculated when omitNorms=true?
> > > >
> > > > Kind regards,
> > > > Stian Brattland
> > > >
> > >
> >
>


RE: SolrCloud node fail to connect to another node in the cluster

2022-09-20 Thread Marco D'Ambra
Hello everyone,

I am writing in the hope of getting an answer to this email.
We are still struggling with this problem and have not found a solution.

Thanks in advance,

Marco

-Original Message-
From: Matteo Diarena  
Sent: Monday, 5 September 2022 11:02
To: users@solr.apache.org
Subject: R: SolrCloud node fail to connect to another node in the cluster

Sorry, my fault. I'll rewrite my email without the images:

I’m experiencing a strange behaviour with a SolrCloud cluster.

Cluster description
I have a cluster with a total of 38 nodes. All nodes run the following:
-  OS: Debian GNU/Linux 9.13 (stretch)
-  JRE: openjdk version "11.0.6" 2020-01-14
-  Apache Solr 8.11.2

The cluster nodes are divided as follows:

Nodes used for indexing
solrindex-01
solrindex-02

Nodes used for queries
solrquery-01
solrquery-02

Cluster nodes with collections
solrnode-01
…
solrnode-34

Configuration of the collection
In the cluster I have a collection (e.g. testcollection) split across the various
nodes into different shards (one shard for each month, e.g. shard_202201,
shard_202202, ...).
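
A monthly-sharded collection like this is typically created with the implicit
router; as a rough sketch (collection name, shard names and replication factor
below are placeholders, not necessarily this cluster's actual settings):

http://solrnode-01:8080/solr/admin/collections?action=CREATE&name=testcollection&router.name=implicit&shards=shard_202201,shard_202202&replicationFactor=2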

Problem
From time to time the solrquery-01 node is no longer able to query the entire
collection; in particular, it is unable to contact some replicas of the
collection present on the other nodes of the cluster. The problem does not
resolve itself; we have to restart the Apache Solr service on the
solrquery-01 node.

In particular:
If I try to query a specific replica from the solrquery-01 node, the request 
remains pending until it times out

Query
http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=track&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/

Response
 {
  "response":{"numFound":0,"start":0,"numFoundExact":true,"docs":[]},
  "debug":{
"track":{
  "rid":"solrquery-01.volo.local-232528",
  "EXECUTE_QUERY":{

"http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/":{
  "Exception":"Timeout occured while waiting response from server at: 
http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/select"
}

Executing the same query from another node (e.g. solrnode-01) succeeds.

Query
http://solrnode-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=track&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/

Response:
 {
  
"response":{"numFound":0,"start":0,"maxScore":0.0,"numFoundExact":true,"docs":[]},
  "debug":{
"track":{
  "rid":"solrnode-01.volo.local-1849853",
  "EXECUTE_QUERY":{

"http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/":{
  "QTime":"0",
  "ElapsedTime":"28",
  "RequestPurpose":"GET_TOP_IDS,SET_TERM_STATS",
  "NumFound":"0",

"Response":"{responseHeader={zkConnected=true,status=0,QTime=0},response={numFound=0,numFoundExact=true,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}"
}

The same happens if I try to run the query from the solrquery-01 node to a
different replica.

Query
http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=track&shards=http://solrnode-23.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n573/

Response
 {
  
"response":{"numFound":0,"start":0,"maxScore":0.0,"numFoundExact":true,"docs":[]},
  "debug":{
"track":{
  "rid":"solrquery-01.volo.local-232531",
  "EXECUTE_QUERY":{

"http://solrnode-23.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n573/":{
  "QTime":"0",
  "ElapsedTime":"88",
  "RequestPurpose":"GET_TOP_IDS,SET_TERM_STATS",
  "NumFound":"0",
  
"Response":"{responseHeader={zkConnected=true,status=0,QTime=0},response={numFound=0,numFoundExact=true,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}"
}


Checking the network traffic with tcpdump on the solrquery-01 machine shows no
connection at all, while the same check on the solrnode-01 machine does.
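
The exact tcpdump invocation isn't shown here; a command along these lines
(interface and host filter are assumptions) produces the captures below:

tcpdump -i ens192 host solrnode-24.volo.local and tcp port 8080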

tcpdump from the solrquery-01 machine
 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode 
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes


tcpdump on the solrnode-01 machine

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode 
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
10:57:10.979736 IP solrnode-01.volo.local.39888 > 
solrnode-24.volo.local.http-alt: Flags [P.], seq 881884455:881885148, ack 
1974049136, win 364, options [nop,nop,TS val 561210041 ecr 561833498], length 
693: HTTP
10:57:11.008007 IP solrnode-01.volo.local

Stream Expression Count matches

2022-09-20 Thread Sergio García Maroto
Hi,

I am trying to find a way to get the number of results after running a few
nested streaming expressions, similar to the numFound parameter on a select
response ("numFound":50743918).

I found stats, which does something similar, although it only applies to stream
expressions with queries embedded:
stats(articles, q="CommentTextS:fintech", count(*))

Let's say I have a stream expression as below. Do I need to retrieve the
full list of PersonIDSDV values to count it?

sort(
  rollup(
    merge(
      search(articles, q="CommentTextS:fintech", qt="/export",
        fl="PersonIDSDV", sort="PersonIDSDV asc"),
      merge(
        search(comments, q="CommentTextS:fintech", qt="/export",
          fl="PersonIDSDV", sort="PersonIDSDV asc"),
        search(topics, q="CommentTextS:fintech", qt="/export",
          fl="PersonIDSDV", sort="PersonIDSDV asc"),
        on="PersonIDSDV asc"),
      on="PersonIDSDV asc"),
    over="PersonIDSDV",
    count(*)
  ),
  by="count(*) desc"
)

Thanks for the support
Sergio Maroto


Re: ANN: ApacheCON "BoaF" for Solr

2022-09-20 Thread Alessandro Benedetti
Thanks David for organizing!
See you there!
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github


On Mon, 19 Sept 2022 at 14:44, David Smiley  wrote:

> FYI
> I arranged for a Solr Birds of a Feather (BoaF) session at ApacheCon, on
> Wednesday October 5th, 5:50-6:30 pm.  You will see it in the Expo Pass
> app.  I expect it'll be relatively informal; a good time to meet 'n greet,
> and make dinner arrangements with community members.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


Loading solr.xml from zookeeper

2022-09-20 Thread Jan Høydahl
Hi,

It has been possible to load solr.xml centrally from zookeeper for a long time.
However, I'm considering deprecating and removing this feature.
Please see https://issues.apache.org/jira/browse/SOLR-15959 for motivation.
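
For anyone unfamiliar with the feature: solr.xml is typically pushed to
ZooKeeper with the bin/solr ZK utility, roughly like this (the local path and
ZK connection string are placeholders):

bin/solr zk cp file:/path/to/solr.xml zk:/solr.xml -z localhost:2181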

My question to the users list is thus: are you loading solr.xml from ZooKeeper?
And if so, why is that capability important to you - i.e. could you not
configure it per node?

Jan

Re: Loading solr.xml from zookeeper

2022-09-20 Thread Shawn Heisey

On 9/20/22 14:18, Jan Høydahl wrote:

> It has been possible to load solr.xml centrally from zookeeper for a long time.
> However, I'm considering deprecating and removing this feature.
> Please see https://issues.apache.org/jira/browse/SOLR-15959 for motivation.
>
> My question to the users list is thus: are you loading solr.xml from ZooKeeper?
> And if so, why is that capability important to you - i.e. could you not
> configure it per node?


I have done very little with SolrCloud myself.  I converted my tiny 
little install to cloud with embedded zk, just the one server, one 
collection and one core.  I do not have solr.xml in ZK.  I did this so I 
have access to whatever functionality is cloud-only, should a need ever 
arise.  I fiddle with that install sometimes to try and answer support 
questions.  Rebuilding the index only takes about ten minutes, so if I 
screw something up I just restore the working config, delete the data 
directory, restart, and reindex.


I can see a lot of value in being able to fire up a Solr node with only 
/etc/default/solr.in.sh being provided.  The solr home can then be 
provided completely empty and the node will start, as long as solr.xml 
is in ZK.
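
A minimal sketch of what that solr.in.sh might contain (hostnames and paths are
placeholders):

SOLR_HOME=/var/solr/data
ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"
SOLR_HEAP="2g"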


One thing I think we should do is make it so that Solr starts with some 
defaults, if solr.xml is not found at all.  We can then bikeshed about 
what the defaults should be.


Making it possible for cloud mode to start without solr.xml might remove 
most people's need to have it live in ZK.  It would make things easier 
on docker users ... they would be able to attach a completely empty 
volume for the solr home and Solr would start. They might then go back 
and add a solr.xml to provide custom settings.
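
As a sketch of that Docker scenario, assuming the official solr image with its
default /var/solr home (volume path and tag are placeholders):

docker run -d --name my_solr -p 8983:8983 -v "$PWD/solr-home:/var/solr" solr:9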


Thanks,
Shawn



Which method is more performant? SubQueries or Collapsing the results?

2022-09-20 Thread Daxesh Parmar
Hi,

Can anyone please point me to documentation on the internal workings of Solr
subqueries? Also, if we compare them with collapse and expand queries, which
one is more efficient in terms of response time and performance?
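
For readers comparing the two approaches, rough sketches of each (collection
name, fields and values below are placeholders, not from the original question):

# [subquery] document transformer: attach a related result set to each parent doc
curl 'http://localhost:8983/solr/products/select?q=category:shoes&fl=id,name,variants:[subquery]&variants.q={!terms f=product_id v=$row.id}&variants.rows=5'

# collapse + expand: collapse the result set on a field, then expand each group
curl 'http://localhost:8983/solr/products/select?q=category:shoes&fq={!collapse field=product_id}&expand=true&expand.rows=5'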