Re: Solr 8.10.1 performance degradation vs Solr 6.6.1
After testing different options, it seems the scale() function has a performance issue; other functions are fine. I had to replace it with log() to get similar functionality.

Regards,
Sergio Maroto

On Wed, 6 Apr 2022 at 17:18, Sergio García Maroto wrote:
> Thanks Mike. It seems like Solr 8 is using different parsing.
>
> *Solr 6* (rawquerystring and querystring are identical):
>
>   querystring:
>   {!boost b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}((CompanyTypeSFD:("Private Company") OR ((CompNameFreeTextS:(BASF))^0.5 OR (CompAliasFreeTextS:(BASF))^0.5) OR NationalitySFD:(Algeria) OR CompWebS:(http\:\/\/newCompanyWebsite)) AND ((*:* -CompanyStatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))
>
>   parsedquery:
>   BoostedQuery(boost(+(CompanyTypeSFD:Private Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5) NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:* -CompanyStatusSFD:***System Delete***) +type_level:parent),sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0))))
>
>   parsedquery_toString:
>   boost(+(CompanyTypeSFD:Private Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5) NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:* -CompanyStatusSFD:***System Delete***) +type_level:parent),sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0)))
>
> *Solr 8* (same querystring as above):
>
>   parsedquery:
>   FunctionScoreQuery(FunctionScoreQuery(+(CompanyTypeSFD:Private Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5) NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:* -CompanyStatusSFD:***System Delete***) +type_level:parent), scored by boost(sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0)))))
>
>   parsedquery_toString:
>   FunctionScoreQuery(+(CompanyTypeSFD:Private Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5) NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:* -CompanyStatusSFD:***System Delete***) +type_level:parent), scored by boost(sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0))))
>
> On Wed, 6 Apr 2022 at 16:31, Mike Drob wrote:
>> Can you try running with debug=query to see if the two are getting parsed differently?
>>
>> On Wed, Apr 6, 2022 at 8:26 AM Sergio García Maroto wrote:
>>> Forgot to mention:
>>>
>>> Solr 8 = 5 seconds
>>> Solr 6 = 1 second
>>>
>>> On Wed, 6 Apr 2022 at 14:58, Sergio García Maroto wrote:
>>>> Hi,
>>>>
>>>> I am in the process of upgrading Solr 6.6.1 to Solr 8.10.1. In general, performance is almost the same, or even a bit better, when running performance load testing.
>>>>
>>>> There is one particular scenario where I see an important degradation: when I boost results based on a function of two fields. If I take this part out, Solr 8 and Solr 6 perform the same:
>>>>
>>>>   {!boost b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}
>>>>
>>>> Both servers have identical machines and data; the result counts coming back are in fact the same.
>>>>
>>>>   q={!boost b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}((CompanyTypeSFD:("Private Company") OR ((CompNameFreeTextS:(kaiku))^0.5 OR (CompAliasFreeTextS:(kaiku))^0.5) OR NationalitySFD:(Algeria) OR CompWebS:(http\:\/\/newCompanyWebsite)) AND ((*:* -CompanyStatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))&start=0&rows=7&fl=CompanyID&sort=score desc
>>>>
>>>> Any idea what the reason might be?
>>>>
>>>> Regards,
>>>> Sergio
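[Editor's note: the exact log() expression Sergio used is not shown in the thread, so the replacement below is only a sketch of the kind of substitution described. It reuses the field names from the original query; the inner sum(...,1) is an added guard against log(0). The likely reason the change helps is that scale() has to traverse every document's value to find the field min and max on each request, while log() is a cheap per-document function.]

    Original boost, using scale() (slow on Solr 8 in this report):
      {!boost b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}<main query>

    Hypothetical log()-based replacement:
      {!boost b=sum(log(sum(PeopleTotalSD,1)),log(sum(AssignmentsTotalSD,1)))}<main query>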
Re: Solr as a dedicated data store?
On 2022-04-07 11:51 PM, Shawn Heisey wrote:
> As I understand it, ES offers reindex capability by storing the entire input document into a field in the index. Which means that the index will be a lot bigger than it needs to be, which is going to affect performance. If the field is not indexed, then the performance impact may not be huge, but it will not be zero. And it wouldn't really improve the speed of a full reindex; it just makes it possible to do a reindex without an external data source.
>
> The same thing can be done with Solr, and it is something I would definitely say needs to be part of any index design where Solr will be a primary data store. That capability should be available in Solr, but I do not think it should be enabled by default.

What would be the advantage over dumping the documents into a text file (XML, JSON) and doing a full re-import? In principle you could dump everything Solr needs into the file and only check that it's all there during the import; that, plus the protocol overhead, would be the only downside. And deleting the existing index will take a little extra time.

The upside is that we can stick the files into git and have versions; they should compress really well, we can clone them to off-site storage, etc. etc.

Dima
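[Editor's note: a dump of that sort can be scripted against Solr's own cursorMark paging. Below is a minimal sketch, assuming a collection named "mycoll" with a uniqueKey of "id" (both names are placeholders) and a schema that stores every field needed to rebuild the index.]

    #!/bin/sh
    # Page through all documents with cursorMark and write a JSON-lines dump.
    SOLR=http://localhost:8983/solr/mycoll
    CURSOR='*'
    while true; do
      RESP=$(curl -s -G "$SOLR/select" \
        --data-urlencode 'q=*:*' --data-urlencode 'rows=1000' \
        --data-urlencode 'sort=id asc' --data-urlencode 'wt=json' \
        --data-urlencode "cursorMark=$CURSOR")
      echo "$RESP" | jq -c '.response.docs[]' >> dump.jsonl
      NEXT=$(echo "$RESP" | jq -r '.nextCursorMark')
      # Solr signals the end of the result set by returning the same cursor back
      [ "$NEXT" = "$CURSOR" ] && break
      CURSOR=$NEXT
    done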
Verifying the replica.type parameter behavior
Hello,

I've been trying out shards.preference=replica.type:PULL as a parameter appended onto queries, as well as trying out including it in the search request handler. For context, we have a collection that is 2 shards, 2 TLOGs per shard, and n PULLs (n can change if we wish to add more replicas during periods of higher traffic). This is being tested on Solr 8.8.2.

In an effort to verify that the queries were being handled by only the PULL replicas, I've been looking at our Solr request logs, expecting to see only pull replica types handling our queries. Yet I am seeing a number of "replica": "x:collectionname_shardx_replica_tx" entries in the request logs, which seems to suggest that the TLOG replicas are still serving queries.

I have two questions: 1.) Am I right in assuming that setting the replica.type preference should send requests exclusively to PULL replicas? 2.) If so, why would I still be seeing TLOG types in the Solr request logs? Is there some type of routing done behind the scenes that is not visible in the request logs?

Thank you in advance for any guidance you can provide.

Olivia Crusoe
Software Engineer Lead - Search
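[Editor's note: shards.preference is a documented Solr parameter; the request-handler variant Olivia mentions would look roughly like the sketch below in solrconfig.xml. The handler name and the local-node fallback are illustrative choices, not taken from her setup.]

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <!-- Prefer PULL replicas; if none are available, prefer a replica on the local node -->
        <str name="shards.preference">replica.type:PULL,replica.location:local</str>
      </lst>
    </requestHandler>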
launched solr 8.10 fails to "get system information"/create core, error: 'Cannot invoke "String.contains(java.lang.CharSequence)" because the return value of "org.apache.http.client.ClientProtocolException.getMessage()" is null'
I'm re-installing an instance of Solr 8.10.1 on a dedicated box. It's launched:

    ps ax | grep solr
    85664 ?  Sl  1:26 /usr/lib/jvm/java-18-openjdk/bin/java -server -Xms512m -Xmx512m
      -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
      -XX:MaxGCPauseMillis=250 -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent
      -Xlog:gc*:file=/var/log/solr/solr_gc.log:time,uptime:filecount=9,filesize=20M
      -Dsolr.jetty.inetaccess.includes=10.1.1.50,127.0.0.1, -Dsolr.jetty.inetaccess.excludes=
      -Dsolr.log.level=DEBUG -Dsolr.log.dir=/var/log/solr -Djetty.port=8984 -DSTOP.PORT=7984
      -DSTOP.KEY=solrrocks -Dhost=solr.example.com -Duser.timezone=America/New_York
      -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError=/srv/solr/bin/oom_solr.sh 8984
      /var/log/solr -Djetty.home=/srv/solr/server -Dsolr.solr.home=/data/solr/data
      -Dsolr.data.home= -Dsolr.install.dir=/srv/solr/solr
      -Dsolr.default.confdir=/srv/solr/server/solr/configsets/_default/conf
      -Dlog4j.configurationFile=/data/solr/log4j2.xml -Djetty.host=solr.example.com -Xss256k
      -Dsolr.jetty.keystore=/srv/ssl/solr/solr.example.com.server.EC.pfx
      -Dsolr.jetty.keystore.type=PKCS12
      -Dsolr.jetty.truststore=/srv/ssl/solr/solr.example.com.server.EC.pfx
      -Dsolr.jetty.truststore.type=PKCS12 -Dsolr.jetty.ssl.needClientAuth=false
      -Dsolr.jetty.ssl.wantClientAuth=false
      -Djavax.net.ssl.keyStore=/srv/ssl/solr/solr.example.com.server.EC.pfx
      -Djavax.net.ssl.keyStoreType=PKCS12 -Dsolr.ssl.checkPeerName=false
      -Djavax.net.ssl.trustStore=/srv/ssl/solr/solr.example.com.server.EC.pfx
      -Djavax.net.ssl.trustStoreType=PKCS12 -Dsolr.jetty.https.port=8984
      -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory
      -Dbasicauth=solradm:solrRocks -Dsolr.log.muteconsole -jar start.jar --module=https
      --lib=/srv/solr/server/solr-webapp/webapp/WEB-INF/lib/* --module=gzip

and responds, as usual/expected, in a browser at https://solr.example.com:8984/solr, where

    host solr.example.com
    solr.example.com has address 10.1.1.50

as well as at the shell:

    sudo -u solr /srv/solr/bin/solr version
    8.10.1

But checking status fails:

    sudo -u solr /srv/solr/bin/solr status
    Found 1 Solr nodes:
    Solr process 85664 running on port 8984
    ERROR: Failed to get system information from http://localhost:8984/solr due to:
    java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)"
    because the return value of "org.apache.http.client.ClientProtocolException.getMessage()" is null

It's looking for http://localhost:8984/solr which, in a browser, is non-responsive, even though

    cat /etc/default/solr.in.sh
    ...
    SOLR_HOST="solr.example.com"
    SOLR_OPTS="$SOLR_OPTS -Djetty.host=solr.example.com"
    ...

and checking at https://solr.example.com:8984/solr/#/ shows

    ...
    -Dhost=solr.example.com
    ...
    -Djetty.host=solr.example.com
    -Djetty.port=8984
    ...

Also, and more problematic, creating a new core fails:

    sudo -u solr /srv/solr/bin/solr create -c test -p 8984
    WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
             To turn off: bin/solr config -c dovecot -p 8984 -action set-user-property -property update.autoCreateFields -value false
    ERROR: Cannot invoke "String.contains(java.lang.CharSequence)" because the return value of
    "org.apache.http.client.ClientProtocolException.getMessage()" is null

    java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)" because the return value of "org.apache.http.client.ClientProtocolException.getMessage()" is null
        at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:761)
        at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:673)
        at org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:2170)
        at org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:196)
        at org.apache.solr.util.SolrCLI.main(SolrCLI.java:304)

I suspect I need to tell the instance config to "get system information" NOT from http://localhost:8984/solr but instead from https://solr.example.com:8984/solr. Where is that set/configured? Or is there a different cause at work here?
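[Editor's note: the symptoms suggest the bin/solr CLI is defaulting to plain http on localhost while the node only answers https. The documented way to make the CLI itself use https is to set the SSL variables in solr.in.sh, as sketched below; SOLR_SSL_ENABLED is what switches the CLI's URL scheme. Passwords are placeholders, and this is offered as a likely remedy rather than a confirmed diagnosis of this exact setup.]

    # /etc/default/solr.in.sh (paths taken from the ps output above)
    SOLR_SSL_ENABLED=true
    SOLR_SSL_KEY_STORE=/srv/ssl/solr/solr.example.com.server.EC.pfx
    SOLR_SSL_KEY_STORE_TYPE=PKCS12
    SOLR_SSL_KEY_STORE_PASSWORD=changeit
    SOLR_SSL_TRUST_STORE=/srv/ssl/solr/solr.example.com.server.EC.pfx
    SOLR_SSL_TRUST_STORE_TYPE=PKCS12
    SOLR_SSL_TRUST_STORE_PASSWORD=changeit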
Re: Verifying the replica.type parameter behavior
`shards.preference` only affects the backend routing of requests to individual cores/shards. These backend requests carry an additional `distrib=false` param, and they are the requests that are generally the most resource-intensive, in that they do the initial per-shard domain-narrowing. I'm fairly certain that "top-level" requests are logged as being associated with some arbitrary core (of the associated collection) on whatever node the external request happens to hit. I suspect the requests you're seeing that appear to be served by an unexpected replica are top-level requests (without a `distrib=false` param). If so, then `shards.preference` is likely working as intended.

I'm curious whether you're able to confirm that all `distrib=false` requests are indeed associated with PULL replicas?

On Fri, Apr 8, 2022 at 2:22 PM Olivia Crusoe wrote:
> Hello,
>
> I've been trying out shards.preference=replica.type:PULL as a parameter appended onto queries, as well as trying out including it in the search request handler. For context, we have a collection that is 2 shards, 2 TLOGs per shard, and n PULLs. This is being tested on Solr 8.8.2.
>
> [...]
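[Editor's note: a quick way to run the check suggested above, sketched for a default log location (the path and exact log format vary by installation). Replica cores follow Solr's naming convention, where the suffix encodes the type: ..._replica_p<N> is PULL, ..._replica_t<N> is TLOG, ..._replica_n<N> is NRT.]

    # Isolate the backend (per-shard) requests:
    grep 'distrib=false' /var/log/solr/solr.log > backend.log
    # Count how many were served by PULL vs TLOG cores:
    grep -c '_replica_p' backend.log   # expected: (almost) all of them
    grep -c '_replica_t' backend.log   # expected: near zero while PULL replicas are healthy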
Re: Need help with DIH plugin SOLR
Thanks Dominique... will look into it.

On 06/04/2022 22:56, Dominique Bejean wrote:
> Hi,
>
> I suggest taking a look at Apache NiFi as an ETL to replace DIH. It can read from and write to Solr.
>
> Dominique
>
> Le mer. 6 avr. 2022 à 12:44, Jan Høydahl a écrit :
>> Hi,
>>
>> The upcoming 9.0 release does not have DIH, and it is unclear whether the plugin on GitHub will be updated to work with 9.0; if it is, you may of course use it. But the common recommendation is to replace DIH with some other DB indexing tool outside of Solr. Either find some tool that supports Solr OOTB, like Apache ManifoldCF, or, perhaps better, roll your own client code reading from the DB and pushing documents to Solr.
>>
>> Jan
>>
>>> 6. apr. 2022 kl. 11:29 skrev Neha Gupta:
>>>
>>> OK, thanks a lot.
>>>
>>> I have one more question: even with Solr 8.11, the data import handler is included, though there is a message in the GUI that it is deprecated and will be removed in a future version.
>>>
>>> Can we use that for indexing a relational database? I am able to index part of my data with the default one that comes with Solr 8.11, without any DIH package installed.
>>>
>>> PS: Our data is static and won't change in the future. We need to re-index only if the indexed data in Solr becomes corrupt.
>>>
>>> Thanks and Regards
>>> Neha Gupta
>>>
>>> On 05/04/2022 15:27, James Greene wrote:
>>>> Standalone mode does not use ZooKeeper; you do not need to upload configs using zkcli.sh.
>>>>
>>>> On Tue, Apr 5, 2022, 6:55 AM Neha Gupta wrote:
>>>>> Dear Solr Community,
>>>>>
>>>>> Need your help. I am running Solr (8.11) standalone (on Windows) and want to index from a relational database (Postgres), so I tried to install the DIH plugin by following the instructions given at:
>>>>> https://github.com/rohitbemax/dataimporthandler
>>>>>
>>>>> I am stuck at the step "Add the configurations and reload the collection":
>>>>>
>>>>>     sh zkcli.sh -z localhost:9983 -cmd putfile "/configs/products.AUTOCREATED/data-config.xml" data-config.xml
>>>>>
>>>>> I am getting the error:
>>>>>
>>>>>     Error: Could not find or load main class org.apache.solr.cloud.ZkCLI
>>>>>
>>>>> Request you to please help me with this.
>>>>>
>>>>> Thanks and Regards
>>>>> Neha Gupta
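[Editor's note: following James's point above, in standalone mode the DIH configuration is simply placed on disk rather than uploaded to ZooKeeper. A minimal sketch for the deprecated-but-shipped DIH in Solr 8.x, with the core name and paths illustrative: put data-config.xml in the core's conf/ directory, register the handler in solrconfig.xml, then reload the core. A Postgres JDBC driver jar must also be on the classpath.]

    <!-- server/solr/<core>/conf/solrconfig.xml -->
    <!-- Load the DIH jars shipped in dist/ -->
    <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>

    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <!-- data-config.xml sits next to this file in the core's conf/ directory -->
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>

Then reload, e.g.: curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=<core>"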
Re: Need help with DIH plugin SOLR
Hi Dominique,

Are there any guides available on using NiFi as an ETL with Solr? What do you consider to be good references for it?

Thanks,
Mike

On Wed, Apr 6, 2022 at 3:56 PM Dominique Bejean wrote:
> Hi,
>
> I suggest taking a look at Apache NiFi as an ETL to replace DIH. It can read from and write to Solr.
>
> Dominique
>
> [...]
Re: Need help with DIH plugin SOLR
On 2022-04-08 4:45 PM, Mike Drob wrote: Hi Dominique, Are there any guides available on using Nifi ETL with Solr? What do you consider to be good references for it? ETL is likely an overkill for DIH, about the only reason you'd use it is if you can use a scripting language. Dima
Re: Solr as a dedicated data store?
I think you are speaking to the point that the requirement to have all your data rebuildable from source isn't a hard requirement, as there are ways to re-index without having access to the original source (you still need the full docs stored in Solr, just not indexed). Looking at Solr from that point of view, it becomes more approachable as a primary data store.

On Fri, Apr 8, 2022, 1:53 PM dmitri maziuk wrote:
> On 2022-04-07 11:51 PM, Shawn Heisey wrote:
>> As I understand it, ES offers reindex capability by storing the entire input document into a field in the index. [...]
>
> What would be the advantage over dumping the documents into a text file (XML, JSON) and doing a full re-import? [...]
>
> Dima
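[Editor's note: a minimal sketch of the "stored but not indexed" approach James describes. The field name _src_ is invented for illustration; the raw source document (e.g. the original JSON) is kept for retrieval and re-indexing but never searched, while the searchable fields are defined separately in the schema.]

    <!-- managed-schema -->
    <field name="_src_" type="string" indexed="false" stored="true" docValues="false"/>

Re-indexing then means reading _src_ back out of the old index, re-running whatever transformation produces the searchable fields, and re-submitting the documents.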
Re: Solr as a dedicated data store?
As long as your documents are simple in structure (a key/value or an array for any given field), you're good to go. Anything multi-level, you're out of luck. Not sure how relevant this link still is, but:
https://stackoverflow.com/questions/22192904/is-solr-support-complex-types-like-structure-for-multivalued-fields

It's from 2017, but I believe it still holds true. However, there are possibilities with nested documents:
https://solr.apache.org/guide/8_1/indexing-nested-documents.html

Admittedly I have not gotten too in-depth myself with child documents for more complex data structures. And yeah, you could just store the complex data structure as JSON in a single large stored, non-indexed text field and only index what you will be searching on.

Another option I've experimented with is two completely different cores, or even completely different Solr servers (I use standalone a lot): use one for searching, and use the result to pull the raw data from the other "storage server" by an identifier. This is actually surprisingly fast. It's a hack, you're using the wrong tool for the job, but it can be done if you REALLY want to and get creative.

Good luck. Curious to hear what you come up with.

-dave

On Fri, Apr 8, 2022 at 8:36 PM James Greene wrote:
> I think you are speaking to the point that the requirement to have all your data rebuildable from source isn't a hard requirement, as there are ways to re-index without having access to the original source (you still need the full docs stored in Solr, just not indexed). [...]
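[Editor's note: a sketch of the two-core pattern Dave describes. Core names and the id field are illustrative; the second lookup could equally be a real-time /get request if the storage core has an update log enabled.]

    # 1. Search the lightweight "search" core, fetching only identifiers:
    curl -s -G 'http://localhost:8983/solr/search_core/select' \
      --data-urlencode 'q=title:solr' --data-urlencode 'fl=id' --data-urlencode 'rows=10'

    # 2. Pull the full stored documents from the "storage" core by those ids:
    curl -s -G 'http://localhost:8983/solr/storage_core/select' \
      --data-urlencode 'q=id:(doc1 OR doc2 OR doc3)' --data-urlencode 'fl=*'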
Re: Solr as a dedicated data store?
On 2022-04-08 7:36 PM, James Greene wrote:
> [...] Looking at Solr from that point of view, it becomes more approachable as a primary data store.

I may have a different definition of "primary data store": one in which it's a store for primary data.

Dima