Re: Solr 6 vs Solr 8 considerable performance gap

2022-08-31 Thread Sidharth Negi
Update: Solr 6 and Solr 8 parsed the same input query differently. Is
this an expected difference, given that the config and schema are the same?
I suspect this difference could also be a reason for the difference in CPU
performance.

How can I tweak Solr 8 to parse the given query as *BoostedQuery* instead of
*FunctionScoreQuery* to confirm this?

*Solr 6 Debug Information:*
"querystring": "aliases:\"love\"^700 OR tedge1:love^900 OR
title_variation_edge:\"love\"^900 OR titleNgram:\"love\"^10 OR
artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR title:\"love\"^1000 OR
(entity_edge:love^800)","parsedquery":
"BoostedQuery(boost(+((aliases:love)^700.0
(tedge1:love)^900.0 (title_variation_edge:love)^900.0
(titleNgram:love)^10.0 (artistEdge:love)^250.0 (keyword:love)^10.0
(title:love)^1000.0 (entity_edge:love)^800.0) ((keywords:\"700 900 (10 ten)
250 (10 ten) 1000\"~2)^25.0 | (aliases:\"700 900 (10 ten) 250 (10 ten)
1000\"~2)^50.0 | (title:\"700 900 (10 ten) 250 (10 ten)
1000\"~2)^60.0),product(sum(const(1),double(n_c_pop)),product(const(1.0E-4),sum(const(1),sum(product(const(0.01),long(search_click)),product(const(1),long(search_clickone)),product(const(10),long(search_clicktwo)),product(const(100),long(search_clickthree",
*Solr 8 Debug Information:*
"querystring": "aliases:\"love\"^700 OR
tedge:love^900 OR title_variation_edge:\"love\"^900 OR
titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR
title:\"love\"^1000 OR (entity_edge:love^800)","parsedquery": "
FunctionScoreQuery(FunctionScoreQuery(+((aliases:love)^700.0
(tedge:love)^900.0 (title_variation_edge:love)^900.0 (titleNgram:love)^10.0
(artistEdge:love)^250.0 (keyword:love)^10.0 (title:love)^1000.0
(entity_edge:love)^800.0), scored by
boost(product(sum(const(1),double(n_c_pop)),product(const(1.0E-4),sum(const(1),sum(product(const(0.01),long(search_click)),product(const(1),long(search_clickone)),product(const(10),long(search_clicktwo)),product(const(100),long(search_clickthree)"
,
*Solr 6 Query Params:*
"responseHeader": {"status": 0,"QTime": 366,"params": {"q":
"aliases:\"love\"^700
OR tedge:love^900 OR title_variation_edge:\"love\"^900 OR
titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR
title:\"love\"^1000 OR (entity_edge:love^800)","debug": "true","fl":
"doc_id,recording_id,title,type,status,keywords,aliases,score,search_click,thumbnail,year,album,artist,primaryArtist,language,original_title,is_hellotune,isCurated,explicitType"
,"start": "0","boost":
"product(sum(1,n_c_pop),product(0.0001,sum(1,sum(product(0.01,search_click),product(1,search_clickone),product(10,search_clicktwo),product(100,search_clickthree)"
,"fq": ["status:(PUBLISH_PUBLISHED LIVE ENRICHED)","((type:song type:artist
type:playlist type:podcast) OR (type:album AND ((creationTime:[* TO
NOW/DAY-31DAYS ] AND size:[2 TO *]) OR creationTime:[NOW/DAY-31DAYS TO *
])))"],"rows": "40","wt": "json","debug.explain.structured": "true"}
*Solr 8 Query Params:*
"responseHeader": {"status": 0,"QTime": 40,"params": {"q":
"aliases:\"love\"^700
OR tedge:love^900 OR title_variation_edge:\"love\"^900 OR
titleNgram:\"love\"^10 OR artistEdge:\"love\"^250 OR keyword:\"love\"^10 OR
title:\"love\"^1000 OR (entity_edge:love^800)","debug": "true","fl":
"doc_id,recording_id,title,type,status,keywords,aliases,score,search_click,thumbnail,year,album,artist,primaryArtist,language,original_title,is_hellotune,isCurated,explicitType"
,"start": "0","boost":
"product(sum(1,n_c_pop),product(0.0001,sum(1,sum(product(0.01,search_click),product(1,search_clickone),product(10,search_clicktwo),product(100,search_clickthree)"
,"fq": ["status:(PUBLISH_PUBLISHED LIVE ENRICHED)","((type:song type:artist
type:playlist type:podcast) OR (type:album AND ((creationTime:[* TO
NOW/DAY-31DAYS ] AND size:[2 TO *]) OR creationTime:[NOW/DAY-31DAYS TO *
])))"],"rows": "40","wt": "json","debug.explain.structured": "true"}}

On Sat, Aug 27, 2022 at 2:38 AM Shawn Heisey wrote:

> On 8/26/22 14:18, Sidharth Negi wrote:
> > The disk space taken by the index of both Solr versions was about ~35 GB
> > and the number of docs ~30 million in both.
>
> Unless that system is handling insanely complex queries that chew up
> lots of memory, I would not expect it to need more than about 8GB of
> heap with that index size, and quite possibly even less.
>
> 48GB total system memory would probably work if the server is not
> handling anything other than that Solr index.
>
> If you share solr GC logs that cover a day or more of heavy usage
> (several days being better), I might be able to determine a more precise
> heap size recommendation.  Unfortunately there is no generic way to
> calculate heap requirements based on data statistics.  GC logs provide
> the best info, otherwise it requires experimentation.  My guess of 8GB
> is just that -- a guess.  It might be wrong, either too high or too low.
>
> > Let me run an experiment using the same GC settings on both to see if
> > that work

Re: SOLR API - modifying the schema for one collection modifies it for ALL collections ?

2022-08-31 Thread Andreas Mock
Hi Chris,

I struggled with the same thing when I started with Solr.

A ConfigSet is a set of configuration files that one or more cores refer to.
Changing a setting for a core changes the ConfigSet, and therefore the setting
for every core referring to that ConfigSet.

Solution: make a full copy of a ConfigSet when you want to change settings
for one core individually.
We decided to keep one ConfigSet as a kind of default that no core refers to.
It is THE template, holding all ConfigSet changes that are valid for all our cores.
We copy that template and apply as many config changes as possible to the
copy programmatically. If we have to "roll back", we delete the core, delete
the ConfigSet (directory), copy the default ConfigSet, add the core, and
apply all config changes programmatically again.

I hope this helps explain what "cloning" means.
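
The copy step described above can be sketched for standalone mode, where a ConfigSet is just a directory on disk. This is only an illustration under stated assumptions: the names `clone_configset`, `_template` and `core_a` are hypothetical, and deleting and re-adding the core itself is left out.

```python
import shutil
import tempfile
from pathlib import Path


def clone_configset(configsets_dir: Path, template: str, target: str) -> Path:
    """Copy the template ConfigSet directory to a new ConfigSet named `target`."""
    src = configsets_dir / template
    dst = configsets_dir / target
    if not src.is_dir():
        raise FileNotFoundError(f"template ConfigSet not found: {src}")
    if dst.exists():
        # "Roll back": drop the old copy so the clone starts from the template again.
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst


def demo() -> list:
    """Build a throwaway configsets directory and clone the template once."""
    with tempfile.TemporaryDirectory() as tmp:
        configsets = Path(tmp) / "configsets"
        conf = configsets / "_template" / "conf"
        conf.mkdir(parents=True)
        (conf / "managed-schema").write_text("<schema name='template'/>")
        clone = clone_configset(configsets, "_template", "core_a")
        return sorted(p.name for p in (clone / "conf").iterdir())
```

Because the template is never referenced by a core, schema changes made through a core only ever touch that core's own copy.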

Regards
Andreas


-Original Message-
From: Christopher Schultz
Sent: Wednesday, August 31, 2022 00:22
To: users@solr.apache.org; Jan Høydahl
Subject: Re: SOLR API - modifying the schema for one collection modifies it for ALL collections ?

Jan, Serban,

On 5/3/22 19:09, Jan Høydahl wrote:
> Your Collection's schema is part of a ConfigSet
> (https://solr.apache.org/guide/8_11/config-sets.html).
> Multiple collections may use the same ConfigSet.
> The gotcha is that when you modify schema for collection A, what you really
> do is modify the schema in the ConfigSet for collection A.
> IMO the schema API should have been part of configset api to clarify this.
>
> You can get around this by cloning your ConfigSet and creating the new
> collection using the cloned copy.

I've been playing around with trying to programmatically alter the schema for a
Solr core as well, and this explains a LOT of weirdness I was experiencing.

Jan, thanks for explaining the root problem, but your solution of just "cloning
your ConfigSet" doesn't offer much detail. How does one clone a ConfigSet? How
does one USE a ConfigSet? What if you need to delete a ConfigSet to start over?

Thanks,
-chris

>> 3. mai 2022 kl. 17:22 skrev Serban Alexe :
>>
>> Hi all,
>>
>> I have a SOLR Server with several collections.
>>
>> To retrieve the properties of the fields, I ran these requests:
>>
>>- GET http://localhost:8983/solr//schema/fields
>>- GET http://localhost:8983/solr//schema/fields
>>
>> Then I ran a POST http://localhost:8983/solr//schema
>> request, with the content needed to change some fields properties.
>>
>> After this, I noticed that *the same fields properties changes were 
>> also modified for* collection_id_2.
>>
>> Is this normal behaviour ?
>> What did I do wrong ?
>> Thank you.
>>
>> --
>> Şerban Alexe
> 
> 


Re: SOLR API - modifying the schema for one collection modifies it for ALL collections ?

2022-08-31 Thread Shawn Heisey

On 8/30/22 16:22, Christopher Schultz wrote:

> I've been playing around with trying to programmatically alter the
> schema for a Solr core as well, and this explains a LOT of weirdness I
> was experiencing.
>
> Jan, thanks for explaining the root problem, but your solution of just
> "cloning your ConfigSet" doesn't offer much detail. How does one clone
> a ConfigSet? How does one USE a ConfigSet? What if you need to delete
> a ConfigSet to start over?


The answer may be different depending on whether you're in cloud mode or 
standalone mode.


In standalone, the configsets are on the disk, in 
${solr.solr.home}/configsets.  In cloud mode, they are in zookeeper.


If you're using the API to modify something, that distinction probably
doesn't matter.  But it probably does matter when it comes to creating a
new configset.  The configset must be in place before it can be used in a
core/collection creation.  For SolrCloud, configset manipulation is
possible using the API, but I don't think it's possible in standalone.


https://solr.apache.org/guide/solr/latest/configuration-guide/configsets-api.html

Thanks,
Shawn
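
For SolrCloud, the clone/use/delete cycle asked about above maps onto the Configsets API linked in this reply and the Collections API. The sketch below only builds the request URLs and sends nothing; the base URL and the names `products_conf` and `products` are assumptions for illustration.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL of a SolrCloud node


def clone_configset_url(base: str, new_name: str) -> str:
    """Configsets API: create `new_name` as a copy of the existing configset `base`."""
    params = {"action": "CREATE", "name": new_name, "baseConfigSet": base}
    return f"{SOLR}/admin/configs?{urlencode(params)}"


def create_collection_url(collection: str, configset: str, num_shards: int = 1) -> str:
    """Collections API: create a collection that uses the named configset."""
    params = {
        "action": "CREATE",
        "name": collection,
        "numShards": num_shards,
        "collection.configName": configset,
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def delete_configset_url(name: str) -> str:
    """Configsets API: delete a configset in order to start over."""
    params = {"action": "DELETE", "name": name}
    return f"{SOLR}/admin/configs?{urlencode(params)}"


# Hypothetical names: clone "_default", then build a collection on the clone.
clone_url = clone_configset_url("_default", "products_conf")
create_url = create_collection_url("products", "products_conf")
delete_url = delete_configset_url("products_conf")
```

Giving each collection its own clone means a later Schema API call against that collection changes only that clone.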



mod function applied to ms function not working correctly

2022-08-31 Thread gnandre
mod function is not returning correct values when applied to ms(NOW) or
_version_ fields.

"_version_":1697770046865014784, "ms(NOW)":1661979881038, "mod(ms(NOW),10)":
0.0, "mod(_version_,10)":6.0

It should return 8 and 4 respectively. Am I missing something?
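
One possible explanation for these numbers, offered as an assumption and not a confirmed diagnosis: if the values are narrowed to 32-bit floats before the modulo is taken, the low digits are already lost. The sketch below emulates that narrowing in Python; the helper names are hypothetical and this is not Solr's code.

```python
import math
import struct


def to_float32(value: int) -> float:
    """Round an integer to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def mod_via_float32(value: int, modulus: int) -> float:
    """Take the modulo only after both operands have been narrowed to float32."""
    return math.fmod(to_float32(value), to_float32(modulus))


ms_now = 1661979881038
version = 1697770046865014784

# Exact integer arithmetic gives the values the poster expected.
exact = (ms_now % 10, version % 10)

# Narrowing first discards the low-order bits of such large numbers.
narrowed = (mod_via_float32(ms_now, 10), mod_via_float32(version, 10))

print(exact, narrowed)
```

Under this assumption the narrowed results are 0.0 and 6.0, the same values reported above, while exact integer arithmetic gives 8 and 4.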


Re: mod function applied to ms function not working correctly

2022-08-31 Thread Mike Drob
I think this is
https://issues.apache.org/jira/browse/SOLR-16361, which is already being
worked on and should be fixed in the next release. I don’t think we have
a workaround currently.

Mike

On Wed, Aug 31, 2022 at 4:07 PM gnandre wrote:

> mod function is not returning correct values when applied to ms(NOW) or
> _version_ fields.
>
> "_version_":1697770046865014784, "ms(NOW)":1661979881038,
> "mod(ms(NOW),10)":
> 0.0, "mod(_version_,10)":6.0
>
> It should return 8 and 4 respectively. Am I missing something?
>


Log forwarders and Solr popularity

2022-08-31 Thread MOSS, David
The popularity of Solr has waned in recent years with Elasticsearch taking much 
of the "market share".
I believe an important factor in this is the lack of options for forwarding 
logs to Solr.

Most of the log forwarding options target Elasticsearch out of the box. Splunk 
has its own dedicated universal forwarder. That makes it very easy to choose 
Elasticsearch or Splunk. Those who try Solr beyond the tutorial stage
immediately run into problems forwarding log data to Solr.

Logstash is a popular forwarding option that claims to target Solr. When you
try to use it, however, a bug in the Ruby code for RSolr requires
importing an older version and hand editing. It does not work out of the box.
Most of the other log forwarding options suggest sending their logs to Logstash
and having that forward them on to Solr.

So, once a new user has finished testing with the tutorial data and is all 
enthusiastic about Solr, the nightmare begins when they try to send their own 
log data for indexing. I suspect most give up at this point and go to 
Elasticsearch and Splunk. They make ingestion easy.

The solution?
The Solr community could make the effort to fix Logstash, or develop a similar 
log forwarder that does work out of the box with Solr.
Difficulty with log forwarding is a dealbreaker for many people.

So, who is up for it?

Regards,
David Moss
Technology Support Officer, Identity & Access Management,
Information Security Services
Enterprise Technology Services | Information and Technologies Branch
Department of Education

P: 0467441553 | M: 0467441553 | E:
david.m...@qed.qld.gov.au
Level 13| AM60 | 60 Albert Street | Brisbane QLD 4000
PO Box 15033 | City East QLD 4002

 Pronouns He/Him/His




Re: mod function applied to ms function not working correctly

2022-08-31 Thread gnandre
Thanks, that is exactly it.

On Wed, Aug 31, 2022 at 5:58 PM Mike Drob  wrote:

> I think this is
> https://issues.apache.org/jira/browse/SOLR-16361 which is already being
> worked on, and should be fixed in the next release. I don’t think we have
> a workaround currently.
>
> Mike
>
> On Wed, Aug 31, 2022 at 4:07 PM gnandre  wrote:
>
> > mod function is not returning correct values when applied to ms(NOW) or
> > _version_ fields.
> >
> > "_version_":1697770046865014784, "ms(NOW)":1661979881038,
> > "mod(ms(NOW),10)":
> > 0.0, "mod(_version_,10)":6.0
> >
> > It should return 8 and 4 respectively. Am I missing something?
> >
>


Re: Forcing solr to run query on replica Nodes

2022-08-31 Thread Satya Nand
Thank you, Shawn.
Suppose I eliminate this indexing node and create 8 NRT shards on these 8 query
nodes, meaning indexing will happen on all 8 nodes, and queries too.

Will that have any impact on response time? The current commit interval is 15
minutes.

On Tue, Aug 30, 2022 at 8:46 PM Shawn Heisey wrote:

> On 8/30/22 08:08, Satya Nand wrote:
> > For querying, we have used *shards.preference as PULL* so that all queries
> > are returned from pull replicas.
> >
> > How can I force Solr to use only pull replicas? In case one of the pull
> > replicas is not available, I want partial results to be returned from 7
> > replicas, but I never want to query NRT replicas.
>
> With that shards.preference set to a replica type of PULL, it will only
> go to NRT if it has no other choice.
>
> I am not aware of any way to *force* it to only use the preferred type.
> Creating an option for that has the potential to interfere with high
> availability, so I don't know how receptive devs will be to the idea.
> You should open an enhancement issue in Jira.
>
> Thanks,
> Shawn
>
>
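
For reference, the preference discussed in the quoted reply is passed as a request parameter. The sketch below only builds such a query URL; the host, the collection name `products` and the helper names are assumptions. `shards.tolerant=true` asks for partial results when a shard cannot be reached, but, as noted in the reply, nothing here forces Solr to avoid NRT replicas.

```python
from urllib.parse import urlencode


def build_params(query: str, replica_type: str = "PULL", tolerant: bool = True) -> dict:
    """Request parameters that prefer one replica type and tolerate missing shards."""
    return {
        "q": query,
        "shards.preference": f"replica.type:{replica_type}",
        "shards.tolerant": "true" if tolerant else "false",
    }


def select_url(base_url: str, collection: str, query: str) -> str:
    """Build a /select URL carrying the parameters above (values are URL-encoded)."""
    return f"{base_url}/{collection}/select?{urlencode(build_params(query))}"


# Hypothetical host and collection name.
url = select_url("http://localhost:8983/solr", "products", "title:love")
```

The preference is a hint: if no PULL replica of a shard is available, Solr can still fall back to another replica type for that shard.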