Difference in response times between direct to shard vs random shard with route param

2021-03-15 Thread yasoobhaider
Hi

I'm trying to create the right design for a Solr Cloud cluster which is
robust and responsive. I've been playing with different versions.

I'll share details about the two versions that I'm comparing.

Cluster details:

1 collection, 4 shards, 2 replicas each.
8 nodes. So 1 replica on 1 node. Each node is 32G memory, 16 cores. Heap
size is 24G. Using Solr 7.6 with G1GC as that gave better performance over
CMS.
Collection size is small, ~8GB overall (I know, very small for sharding, but
our queries are extremely complex).

The collection is sharded using implicit router - using city id.

The two configurations I'm trying are:

1. Send the query to a loadbalancer with _route_=city_id (or
_route_=shardnum) which sends it to 1 of the 8 boxes. This box gets the
result from the "owner" shard and returns the results.
2. Send the query directly to one of the replicas of the owner shard.

I also add "shards.preference=replica.location:local" to queries in both
versions.

So, if I have Nodes N1, N2 with S1R1 and S1R2, N3, N4 with S2R1 and S2R2 and
so on, then a query for a city which is in shard1, either goes to a LB which
can go to N5, which will then query either of N1 or N2 and return the
results, or it goes directly to one of N1 or N2.
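
To make the two setups concrete, here is a minimal sketch of the two request URLs. The host names, the `q` value and the use of `shard1` as the route value are placeholders, and it assumes the direct call carries the same parameters as the load-balanced one; only port 8080 and the collection name come from the log below.

```python
from urllib.parse import urlencode

COLLECTION = "collection_xx"  # collection name as it appears in the log


def select_url(host, params):
    """Build a /select URL for the given host (port 8080 as in the log)."""
    return f"http://{host}:8080/solr/{COLLECTION}/select?{urlencode(params)}"


# Parameters shared by both configurations (values are illustrative only).
common = {
    "q": "*:*",
    "_route_": "shard1",
    "shards.preference": "replica.location:local",
    "rows": 3000,
}

# 1. Via the load balancer: any of the 8 nodes may receive the request.
via_lb = select_url("lb.example.internal", common)

# 2. Direct: the request goes to a node hosting a replica of shard1.
direct = select_url("n1.example.internal", common)

print(via_lb)
print(direct)
```

The only difference between the two URLs is the host, so any difference in latency comes from what the receiving node has to do with the request.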

The response is pretty big. It fetches ~3000 documents, and a large number
of fields (~30).

I'm measuring the response times at an Ingress Envoy at Solr end.

Results:

Difference in response times between LB and Direct call is ~20-25 ms. The
direct call is significantly faster at ~70ms avg vs ~95ms for LB call.

Checking the logs I noticed that when calling via Loadbalancer, there are
these queries in logs:

2021-03-15 07:47:39.611 INFO  (qtp731870416-2408) [c:collection_xx s:shard1
r:core_node5 x:collection_xx_shard1_replica_n2] o.a.s.c.S.Request
[collection_xx_shard1_replica_n2]  webapp=/solr path=/select
params={vsort=ntp&facet.field=c_ids&facet.field=has_d_offer&facet.field=has_xx_synergy&facet.field=new_ccs_ids&facet.field=new_pd&facet.field=new_pd_dsz_7619&facet.field=new_res_flag&facet.field=primary_category_ids&df=name&distrib=false&aaaq_score_param=7619_6&fl=real_id,name,chain_id,lat,lon,city_id,d_name:display_name,image,new_ccs_ids,cfo:c_for_one,new:if(new_res_flag,1,0),new_on_xx:if(new_on_d,1,0),hygiene_rated:termfreq(new_pd,+100353),pure_veg:termfreq(new_pd,+100354),gold:termfreq(new_pd,+102236),pro:termfreq(new_pd,+166788),otof:termfreq(new_pd,+114445),hyperpure:termfreq(new_pd,+100355),exclusive:if(has_xx_synergy,1,0),has_offer:if(has_d_offer,1,0),trending:termfreq(new_pd_dsz_7619,+1),has_gourmet:termfreq(new_pd,+166253),ncw_offer:termfreq(new_pd,+168125),ncw_brand:termfreq(new_pd,+168176),hygiene_rating,c_ids,avg_commission_per_order,otr_value_um,otr_value_mm,otr_value_la,otr_value_default,primary_category_ids,compliance_level,cgen_embedding,asv,rating_aggregate:rating,votes,is_suspicious,{!key%3Dscore}$raw_score+&fl=id&shards.purpose=64&start=0&fq=(serviceable_cells:(4306215339680591232))+OR+{!geofilt+filter%3Dtrue+sfield%3Dlatlon_location_rpt+pt%3D13.498693941619575,70.84631715107243+d%3D7}&fq=+((d_pondy_5_1_start:[*+TO+2130]++AND+d_pondy_5_1_end:[2130+TO+*])+OR+(d_pondy_5_2_start:[*+TO+2130]++AND+d_pondy_5_2_end:[2130+TO+*])+OR+(d_pondy_5_3_start:[*+TO+2130]++AND+d_pondy_5_3_end:[2130+TO+*]))&fq=+has_online_order_flag:1&fq=+opening_soon_flag:false&fq=+status_id:(1+OR+13)&fq=+temp_closed_flag:false&raw_score=sum(product(def(conversion_score_dsz_v3_final_score_7619_6,+0),+1))&shard.url=http://a.b.c.d:8080/solr/collection_xx_shard1_replica_n2/|http://a.b.c.e:8080/solr/collection_xx_shard1_replica_n17/&rows=3000&version=2&facet.query=dish_score_9d20b49f8cf0c79ce7b44b2ef69f51df_2:[0+TO+*]&facet.limit=1000&q=(*)+AND+_val_:"+sum(+$vsort,+product(400,+0),+-200+)+"&NOW=1615794459480&ids=res_19338624,res_18645801,res_19258866,...[[-->
Some ~2900 ids here  <-]]
,res_18565614,res_19250515,res_19282565,res_18818033,res_18372078,res_19527362,res_18899444&isShard=true&facet.mincount=1&boosted=0&facet=false&wt=javabin}
status=0 QTime=30

Notice the "Some 2900 ids here" part.

Questions:

Is this some inter-node communication happening? Is this what is leading to
the difference in response times?

If not this, then what else could be leading to the difference in response
times?
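
One way to check whether such entries are inter-node sub-requests is to pull a few parameters out of each log line. The sketch below is only an illustration: the sample line is a shortened, made-up stand-in for the entry above, and the helper name is hypothetical. As far as I know, `isShard=true` together with `distrib=false` marks a per-shard sub-request sent by a coordinating node, and `shards.purpose=64` corresponds to the phase that fetches stored fields for a list of ids; treat that reading as an assumption to verify against your Solr version.

```python
import re
from urllib.parse import parse_qs

# Matches the "params={...} status=N QTime=N" tail of a Solr request log entry.
LOG_RE = re.compile(
    r"params=\{(?P<params>.*)\}\s+status=(?P<status>\d+)\s+QTime=(?P<qtime>\d+)",
    re.DOTALL,
)


def summarize_request(line):
    """Return a small summary of one request log entry, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    params = parse_qs(m.group("params"), keep_blank_values=True)

    def first(key):
        return params.get(key, [None])[0]

    ids = first("ids")
    return {
        "is_shard": first("isShard") == "true",
        "distrib": first("distrib"),
        "shards_purpose": first("shards.purpose"),
        "id_count": len(ids.split(",")) if ids else 0,
        "qtime_ms": int(m.group("qtime")),
    }


# Shortened, hypothetical sample modeled on the log entry in this post.
SAMPLE = (
    "o.a.s.c.S.Request [collection_xx_shard1_replica_n2]  webapp=/solr path=/select "
    "params={df=name&distrib=false&fl=id&shards.purpose=64&rows=3000"
    "&ids=res_1,res_2,res_3&isShard=true&wt=javabin} status=0 QTime=30"
)

print(summarize_request(SAMPLE))
```

Running this over the logs of both setups and comparing how many `is_shard` entries each client query produces would show whether the load-balanced path adds extra hops.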







Croatian language support

2021-03-15 Thread Georg Sorst
Hi,

according to https://solr.apache.org/guide/8_8/language-analysis.html Solr
currently does not support Croatian out of the box.

Does anyone have suggestions on how to implement this? I would also be
interested in sponsoring the development of this.

Thanks!
Georg


Re: Croatian language support

2021-03-15 Thread Emir Arnautović
Hi Georg,
I have not used it in a while, but your best bet is to try the Hunspell stemmer.
There are dictionaries for Croatian.
I used to work on a library for morphological analysis of Slavic languages, but
I never made a Solr wrapper for it. It was a Java-based implementation of a PhD
dissertation in which the rules were precisely described. The paper was publicly
available, so you could also find it and use it as a starting point for your stemmer.

HTH,
Emir

--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/






Solr upgrade from version 4.4

2021-03-15 Thread John Nielsen
Hi all,

We are currently running solr 4.4 on a 3 node cluster. We have never had an
incentive to upgrade. That old solr version is heavily integrated into our
infrastructure, so we always considered an upgrade to be a monumental task.
We recently started looking at the new solr features and we are starting to
reconsider if it might be worth it to upgrade after all.

My google-fu has failed me on some points and I was hoping that someone
here might help with them.

We are still using the old static core loading system which was deprecated
in version 4.0. We will obviously need to redo that part.

Apart from that, would it be reasonably safe to assume that our current
schema and configuration would work out of the box with Solr 8.8, or
should we expect to need to redo the configuration?

Phonetic search is the most enticing of the new features for us. It doesn't
look like it has support for Danish, however. Does that mean that this
feature is a no-go for us, or are there other ways of making it work, like
a "generic" language setting? I couldn't find any information regarding
this.

We have looked at the new spell checking and search suggestions. The
documentation references dictionary support, yet I couldn't find anything
for Danish. Are the dictionaries something which is meant to be hand-crafted,
or is there an external source we might use? If not, would an index-based
spellchecker be a better choice for us?



-- 

JOHN NIELSEN

DevOps / Programmør





www.mcb.dk

Telefon: +45 70-1-2-4-8-16

Lægårdvej 86 
B

DK-7500 Holstebro


Re: Solr upgrade from version 4.4

2021-03-15 Thread matthew sporleder
Your old schema will need minor adjustments as the classes for a lot
of core types have changed.

Your old configs will also just mostly need mechanical translation
from one xml to another, but probably nothing major if you were using
the standard /select and friends.

The biggest issue for most people upgrading is that you will need to
re-index all of your data.
If you were doing custom stuff for core loading/replication that can
also be a big hurdle for people but solrcloud is so *easy* once you
get it going that I think it pays off.




Re: Solr upgrade from version 4.4

2021-03-15 Thread Charlie Hull

Hi John,

Solr 4.4 is very old - if you are to upgrade to 8.8 (and you really
should) you'll be able to take very little with you. Your old
configuration can serve as inspiration but not as a direct source. It will
be a big task, but if you approach it the right way you'll be fine.


Solr 8.8 does support Danish in terms of stemming etc. (see
https://solr.apache.org/guide/8_0/language-analysis.html), and there are
plenty of people using Solr in Denmark (e.g. the Danish Library - see
Tove's excellent blog https://sbdevel.wordpress.com/author/eskildsen/
and an old client of mine, Infomedia). There seems to have been some work
on a Danish soundex algorithm (see this old blog post from Findwise:
https://findwise.com/blog/things-to-consider-when-implementing-phonetic-search/),
but it seems it was originally built for English. I'd think about what
soundex will actually help with, though; there may be other ways to help
your users.


Spell suggestion is often done using the index, not a dictionary,
although you can use a dictionary if you like (e.g. via Hunspell). That
way it only corrects to words that you know match items in your index.


Hope this helps. By the way it looks like you're in ecommerce, we've 
been writing a lot about ecommerce and Solr recently as we're 
contributing to a reference implementation, Chorus 
https://opensourceconnections.com/blog/2020/10/29/a-tool-stack-for-open-source-ecommerce-site-search/


Cheers

Charlie







--
Charlie Hull - Managing Consultant at OpenSource Connections Limited 

Founding member of The Search Network  
and co-author of Searching the Enterprise 


tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828


Unrecognized command line option when starting Solr 8.7 with Java 11 Open J9

2021-03-15 Thread Elizaveta Golova
Hello,


We're currently migrating from Java 8 to Java 11.


On Windows, when starting Solr 8.7.0 with OpenJ9 java, we encounter the 
following message:
JVMJ9VM007W Command-line option unrecognised: 
-Xlog:gc*:file="C:\solr\server\logs\solr_gc.log":time,uptime:filecount=9,filesize=20M
JVMJ9GC063E Unable to open file '"C' for writing


The console is then filled with garbage collection output, but Solr appears to
start and run OK.
When changing to Java 11 Hotspot of the same release, we do not see any warning 
or other issues.


This has also been run on a Java 11 Open J9 on Ubuntu. A "Command-line option 
unrecognised" message appears in the Solr logs but not on the console. The 
solr_gc log is created ok and appears to be populated.


This was seen using the OpenJDK 11.0.10+9 release.


Here is an example of our startup command:
C:/solr/bin/solr.cmd start -c -p 8983 -s 
C:/data/solr/8/clusters/is_cluster/nodes/node1 -a 
"-Dcom.ibm.crypto.provider.DoRSATypeChecking=false 
-Djetty.server=com.i2group.disco.search.solr.jetty.wrapper.XmlConfigurationWrapper
 -Dpkiauth.ttl=6 -Dsolr-node-id=node1 
-DzkCredentialsProvider=com.i2group.disco.search.solr.common.zookeeper.auth.internal.EncodedZkCredentialsProvider
 -Djdk.nativeCrypto=false" -m 2g -z "localhost:9983/is_cluster"


Thanks!


Lisa



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



Replication keeps failing for the Repeaters in the Solr legacy setup

2021-03-15 Thread Pushkar Raste
Hi,
We are using the legacy solr setup (I am going to refer to the nodes as
leaders and followers. I think that’s what the Solr community has decided
to call those). Our setup roughly looks like

           Repeater (Remote DC) --- Followers (Remote DC)
              |
              |
Leader --- Repeater (Local DC) --- Followers (Local DC)

For the past couple of days repeaters in the geographically remote data
center are failing to keep up with the repeater in Local DC. As per logs
the files on the Local DC get deleted before getting copied to the repeater
in remote DC. In order for the remote DC repeater to sync up, we have to
disable polling on the repeater in the local DC.

I am trying to figure out how to make the files on the Local DC stick
around longer. I am trying to wrap my head around the suggestions in
https://solr.apache.org/guide/7_7/index-replication.html

I am trying to understand if tuning commitReserveDuration and/or
maxCommitsToKeep would help. Currently we have commitReserveDuration set
to 1 hr and maxCommitsToKeep to 1.

We are also looking into enabling compression for replication between the
Local DC repeater to Remote DC repeater.

Can people provide other suggestions? I understand that not many of us are
using the legacy setup, but I am wondering whether Pull Replicas use the same
mechanism as the legacy setup, and whether anyone using Pull Replicas has
faced similar issues.


RE: Unrecognized command line option when starting Solr 8.7 with Java 11 Open J9

2021-03-15 Thread Subhajit Das
Hi Elizaveta,

I have also encountered a VerifyError on OpenJ9, on an unrelated thing. It 
seems that OpenJ9 is not a good choice for stability.
Prefer using Hotspot either way.





Re: Unrecognized command line option when starting Solr 8.7 with Java 11 Open J9

2021-03-15 Thread dmitri maziuk

On 2021-03-15 12:32 PM, Subhajit Das wrote:


JVMJ9VM007W Command-line option unrecognised: 
-Xlog:gc*:file="C:\solr\server\logs\solr_gc.log":time,uptime:filecount=9,filesize=20M


Windows to unix path translation's weird. Things like 
'\\c\\solr\\server\\logs\\solr_gc.log' work in some cases.


Dima



Re: Unrecognized command line option when starting Solr 8.7 with Java 11 Open J9

2021-03-15 Thread Chris Hostetter


I know nothing about OpenJ9, but some quick googling suggests that OpenJ9
changed their -Xlog parsing rules at some point -- the current docs don't
mention anything about some of the options you are using (other than
'file')

https://www.eclipse.org/openj9/docs/xlog/
https://www.eclipse.org/openj9/docs/xxlegacyxlogoption/






-Hoss
http://www.lucidworks.com/


RE: Unrecognized command line option when starting Solr 8.7 with Java 11 Open J9

2021-03-15 Thread Elizaveta Golova
Hi,
 
Yes Chris, Open J9 have an issue on their backlog for this:
https://github.com/eclipse/openj9/issues/8195
 
This is related to the GC_LOG_OPTS in solr.cmd. 
Unfortunately, this can't be set in solr.in.cmd due to:
"For Java 9 or higher: GC_LOG_OPTS is currently not supported. If you set it, 
the startup script will exit with failure."
 
As the -verbose:gc option works, we've found that having OpenJ9 use the
pre-Java 9 GC_LOG_OPTS in solr.cmd appears to work, and doesn't change how the
HotSpot options are set.
 
REM Use the Java 9+ -Xlog syntax only on non-OpenJ9 JVMs.
REM USE_XLOG_GC is a helper variable added here: chaining "if A if B (...) else (...)"
REM would skip both branches whenever A is false, e.g. on Java 8.
set USE_XLOG_GC=false
if !JAVA_MAJOR_VERSION! GEQ 9 if NOT "%JAVA_VENDOR%" == "IBM J9" set USE_XLOG_GC=true
REM instead of: if !JAVA_MAJOR_VERSION! GEQ 9 IF "%JAVA_VENDOR%"(
if "!USE_XLOG_GC!" == "true" (
  IF NOT "%GC_LOG_OPTS%"=="" (
    echo ERROR: On Java 9 you cannot set GC_LOG_OPTS, only default GC logging is available. Exiting
    GOTO :eof
  )
  set GC_LOG_OPTS="-Xlog:gc*:file=\"!SOLR_LOGS_DIR!\solr_gc.log\":time,uptime:filecount=9,filesize=20M"
) else (
  IF "%GC_LOG_OPTS%"=="" (
    rem Set defaults for Java 8
    set GC_LOG_OPTS=-verbose:gc ^
     -XX:+PrintHeapAtGC ^
     -XX:+PrintGCDetails ^
     -XX:+PrintGCDateStamps ^
     -XX:+PrintGCTimeStamps ^
     -XX:+PrintTenuringDistribution ^
     -XX:+PrintGCApplicationStoppedTime
  )
  if "%JAVA_VENDOR%" == "IBM J9" (
    set GC_LOG_OPTS=!GC_LOG_OPTS! "-Xverbosegclog:!SOLR_LOGS_DIR!\solr_gc.log" -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
  ) else (
    set GC_LOG_OPTS=!GC_LOG_OPTS! "-Xloggc:!SOLR_LOGS_DIR!\solr_gc.log" -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
  )
)

 
Any chance that something like this can be put in a patch/future release?
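
For anyone reviewing the change, the selection logic of the modified block can be restated as a small Python function. This is only a sketch of the decision table, not part of solr.cmd: the function name is hypothetical, and it leaves out the handling of a user-supplied GC_LOG_OPTS, always using the defaults from the script.

```python
def gc_log_opts(java_major, java_vendor, logs_dir):
    """Return the GC logging options the modified solr.cmd block is meant to choose."""
    log_file = f"{logs_dir}\\solr_gc.log"

    # Java 9+ on a non-OpenJ9 JVM: unified logging syntax.
    if java_major >= 9 and java_vendor != "IBM J9":
        return [f'-Xlog:gc*:file="{log_file}":time,uptime:filecount=9,filesize=20M']

    # Java 8, or OpenJ9 on any version: pre-Java 9 defaults.
    opts = [
        "-verbose:gc",
        "-XX:+PrintHeapAtGC",
        "-XX:+PrintGCDetails",
        "-XX:+PrintGCDateStamps",
        "-XX:+PrintGCTimeStamps",
        "-XX:+PrintTenuringDistribution",
        "-XX:+PrintGCApplicationStoppedTime",
    ]
    if java_vendor == "IBM J9":
        opts.append(f"-Xverbosegclog:{log_file}")
    else:
        opts.append(f"-Xloggc:{log_file}")
    opts += [
        "-XX:+UseGCLogFileRotation",
        "-XX:NumberOfGCLogFiles=9",
        "-XX:GCLogFileSize=20M",
    ]
    return opts


for major, vendor in [(8, "Oracle"), (11, "Oracle"), (11, "IBM J9")]:
    print(major, vendor, gc_log_opts(major, vendor, r"C:\solr\server\logs"))
```

The three cases printed at the end are the ones that matter here: HotSpot on Java 8, HotSpot on Java 11 (unchanged behaviour), and OpenJ9 on Java 11 (now falling back to the older options).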
 
Lisa
 
 
 