Re: Partial Updates for EnumFields in Solr 8.8.1
Hi,

On 2021/03/30 14:59:40, Shawn Heisey wrote:
> On 3/30/2021 4:07 AM, Daniel Exner wrote:
>> When updating some different field, like say "stock", for a document
>> using a partial update, *all* enum fields with stored=false vanish from
>> the document. There is no error or warning about this anywhere in the
>> error log.
>
> If you have fields that are not stored, and do not have docValues with
> the flag to use docValues as stored, you will lose those fields when you
> do an atomic update. No error is logged when this happens because the
> behavior isn't an error. It's inherent to the way that the Atomic Update
> feature works. The documentation states this requirement pretty clearly,
> though it doesn't mention the need for useDocValuesAsStored.

This is known, and the fields in question all have the attribute
docValues=true. I may not have stated it clearly: the exact same mechanism
works flawlessly with Solr 8.2.

From reading the documentation, useDocValuesAsStored is a query-time
parameter that changes the behaviour when the fl parameter is omitted or
set to * for non-stored fields; it has nothing to do with whether those
fields are actually in the document or not.

https://solr.apache.org/guide/8_8/updating-parts-of-documents.html#field-storage

> Can you share your full schema? You will need to either paste the whole
> thing into the email text or put it on an info-sharing site and give us a
> link to it. If you try attaching a file to an email, the mailing list
> will most likely remove the attachment before any of us receive the email.

Sure. I uploaded the schema to: https://pastebin.com/v6NwWnkx
The enumsConfig.xml is here: https://pastebin.com/RZw1xPN6

Test case we used so far:
* fill the index
* read the mxlang field for some id
* try an in-place update of the "oxvarstock" field for that id
* try re-reading the mxlang field --> nothing

Hope someone can make some sense of that. If not, I'm going to open a bug
report.

Greetings
Daniel Exner
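To make the failure mode easy to try, here is a minimal sketch of the kind
of request involved. The field names mxlang and oxvarstock come from the
test case above; the core name, document id, and value are placeholders,
and the command is illustrative rather than taken from the thread:

    # atomic update that only touches oxvarstock; whether a non-stored,
    # docValues-only field such as mxlang survives this depends on it
    # being readable as "docValues as stored"
    curl -X POST -H 'Content-Type: application/json' \
      'http://localhost:8983/solr/mycore/update?commit=true' \
      --data-binary '[{"id":"12345","oxvarstock":{"set":7}}]'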
Possible bug in internal SolR communication when the CertAuthPlugin is active
Hi all,

while I was testing out the CertAuthPlugin for the new SolR 9 it came to my attention that various internal HTTP calls in SolR fail. For example, when I try to add a BinaryResponseWriter via curl it fails with lots of authentication errors (HTTP status code 401). Other actions (like creating schema fields for collections) via curl work fine.

To reproduce the problem, the following steps have to be taken (on Linux):

- git clone https://github.com/apache/solr.git (I used commit caf8cbc0aa11e32f894a90531e3e9f20edf75efa)
- cd solr
- ./gradlew assemble
- cd solr/packaging/build/solr-9.0.0-SNAPSHOT/
- keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 -keypass secret -storepass secret -validity -keystore solr-ssl.keystore.p12 -storetype PKCS12 -ext SAN=DNS:localhost,IP:127.0.0.1 -dname "CN=localhost, OU=Organizational Unit, O=Organization, L=Location, ST=State, C=Country"
- openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.keystore.key -nodes -nocerts
- openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.keystore.crt -nodes -nokeys
- echo 'SOLR_SSL_ENABLED=true' >> bin/solr.in.sh
- echo 'SOLR_SSL_KEY_STORE=../solr-ssl.keystore.p12' >> bin/solr.in.sh
- echo 'SOLR_SSL_KEY_STORE_PASSWORD=secret' >> bin/solr.in.sh
- echo 'SOLR_SSL_TRUST_STORE=../solr-ssl.keystore.p12' >> bin/solr.in.sh
- echo 'SOLR_SSL_TRUST_STORE_PASSWORD=secret' >> bin/solr.in.sh
- echo 'SOLR_SSL_NEED_CLIENT_AUTH=true' >> bin/solr.in.sh
- echo 'SOLR_SSL_WANT_CLIENT_AUTH=false' >> bin/solr.in.sh
- echo 'SOLR_SSL_CHECK_PEER_NAME=false' >> bin/solr.in.sh
- echo '{ "authentication": { "class": "org.apache.solr.security.CertAuthPlugin" }, "authorization": { "class": "solr.RuleBasedAuthorizationPlugin", "permissions": [ { "name": "all", "role": [ "admin-role" ] } ], "user-role": { "CN=localhost,OU=Organizational Unit,O=Organization,L=Location,ST=State,C=Country": [ "admin-role"] } } }' > /tmp/security.json
- ./bin/solr start -v -c
- server/scripts/cloud-scripts/zkcli.sh -z localhost:9983 -cmd clusterprop -name urlScheme -val https
- ./bin/solr zk cp file:///tmp/security.json zk:/security.json -z localhost:9983
- ./bin/solr stop
- ./bin/solr start -v -c
- ./bin/solr create -c testcollection
- curl --cacert ./solr-ssl.keystore.crt --key ./solr-ssl.keystore.key --cert ./solr-ssl.keystore.crt "https://localhost:8983/api/collections/testcollection/config" -H "Content-Type: application/json" --data-binary '{ "add-queryresponsewriter":{ "class":"solr.BinaryResponseWriter", "name":"test" }}'

After the last curl command (which takes about 30 seconds) the following error message is printed:

{
  "responseHeader":{
    "status":500,
    "QTime":30017},
  "errorMessages":["1 out of 2 the property overlay to be of version 0 within 30 seconds! Failed cores: [https://localhost:8983/solr/testcollection_shard1_replica_n1/]\n"],
  "WARNING":"This response format is experimental. It is likely to change in the future.",
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"1 out of 2 the property overlay to be of version 0 within 30 seconds! Failed cores: [https://localhost:8983/solr/testcollection_shard1_replica_n1/]",
    "trace":"org.apache.solr.common.SolrException: 1 out of 2 the property overlay to be of version 0 within 30 seconds! Failed cores: [https://localhost:8983/solr/testcollection_shard1_replica_n1/]\n\tat org.apache.solr.handler.SolrConfigHandler.waitForAllReplicasState(SolrConfigHandler.java:829)\n\tat org.apache.solr.handler.SolrConfigHandler$Command.handleCommands(SolrConfigHandler.java:549)\n\tat org.apache.solr.handler.SolrConfigHandler$Command.handlePOST(SolrConfigHandler.java:381)\n\tat org.apache.solr.handler.SolrConfigHandler.handleRequestBody(SolrConfigHandler.java:140)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)\n\tat org.apache.solr.api.ApiBag$ReqHandlerToApi.call(ApiBag.java:269)\n\tat org.apache.solr.api.V2HttpCall.execute(V2HttpCall.java:354)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:518)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:432)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:
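A hypothetical follow-up check, not part of the original report: since the
error is only a 30-second wait for the replicas to confirm the overlay, it
can be worth reading the config overlay directly to see whether the change
was actually written despite the 401s. This sketch reuses the certificates
from the steps above:

    # read the current config overlay for the collection
    curl --cacert ./solr-ssl.keystore.crt --key ./solr-ssl.keystore.key \
         --cert ./solr-ssl.keystore.crt \
         "https://localhost:8983/solr/testcollection/config/overlay"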
NRT Merge Load on NAS SSD (Cloud) Advice
Hi all. I'm looking for some opinions on how to best configure merges to run optimally on GCP SSDs (network attached).

For context: we have a 9-node NRT 8.8.1 SolrCloud cluster, and each node has an index between 25 and 35 GB in size, depending on the current merge state / deleted docs. The index is both heavy write and heavy read, so we're typically always merging (which is somewhat fine). The SSDs we have are 512 GB, and on GCP their performance scales with the number of CPUs and the amount of RAM. The disks we have are therefore rated for:

* Sustained read IOPS: 15k
* Sustained write IOPS: 15k
* Sustained read throughput: 250 MB/s
* Sustained write throughput: 250 MB/s

Both read and write can be sustained in parallel at those peaks.

What we observe, as you can see from this graph, is that we typically have a mean write throughput of 16-20 MB/s (way below our peak), but we are also spiking above 250 MB/s, which is causing us to get write throttled:

[graph: write throughput over time]

So really what I believe we need (if possible) is a configuration that is less "bursty" and more sustained over perhaps a longer duration. As they are network-attached disks, they suffer from initial IOPS latency, but sustained throughput is high.

I've graphed the merge statistics out here; as you can see, at any given time we have a maximum of 3 concurrent minor merges running, with the occasional major merge. P95 on the minor merges is typically around 2 minutes, but occasionally (correlating with a throttle on the above graphs) we can see a minor merge taking 12-15 minutes.

[graph: concurrent merge counts and durations]

Our index policy looks like this: 512 10 10 5000 30 10 15 true

I feel like I'd be guessing which of these settings may help the scenario I describe above, which is somewhat fine – I can experiment and measure. But the feedback loop is relatively slow, so I wanted to lean on others' experience/input first. My instinct is to perhaps lower `maxThreadCount`, but seeing as we only ever peak at 3 in-progress merges, it feels like I'd have to go low (2, or even 1), which is on par with spindle disks, which these aren't.

Thanks in advance for any help.
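In case a concrete starting point helps, this is a minimal sketch of
pinning the merge scheduler down explicitly in solrconfig.xml instead of
relying on the spinning/SSD auto-detection. The numbers are placeholders
to experiment with, not a recommendation:

    <!-- solrconfig.xml, inside <indexConfig>; illustrative values only -->
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>
      <int name="maxThreadCount">2</int>
    </mergeScheduler>

With explicit values the scheduler no longer derives its limits from the
spindle/SSD heuristic, which makes experiments easier to reason about.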
Re: Unified Highlighter and Fuzzy Searches
I tried both the Standard and DisMax query parsers and the issue is easily
reproducible. I forgot to mention earlier: I am trying this on Solr 8.6.3.

Just to add more clarity, this is what I am doing. I have a field called
File_Content_Field with the following values indexed and stored:
"runnings, running, run, runs"

My query is something like this:

q=File_Content_Field:runnings~0&hl.fl=File_Content_Field&hl=on

With the original highlighter, I see the following response:

"highlighting": {"document_id": {"File_Content_Field": ["\n \nTest Dataset 1 Running and Runnings and Runs and R\n \n "]

With the unified highlighter, no highlighting is returned:

q=File_Content_Field:runnings~0&hl.fl=File_Content_Field&hl=on&hl.method=unified

"highlighting": {"document_id": {"File_Content_Field": []}

However, runnings~1 works as expected (highlights both running and runnings):

q=File_Content_Field:runnings~1&hl.fl=File_Content_Field&hl=on&hl.method=unified

"highlighting": { "document_id": {"File_Content_Field": [ "\n \nTest Dataset 1 Running and Runnings and Runs and R\n \n ] }

On Thu, Apr 1, 2021 at 12:32 AM David Smiley wrote:

> I tried this in tests both at the Lucene layer and Solr layer and I'm not
> seeing the failure to highlight for the UH. What query parser are you
> using?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Wed, Mar 31, 2021 at 11:39 AM seez wrote:
>
> > Hello,
> >
> > I have the following fuzzy search criteria:
> >
> > runnings~0
> >
> > Search itself returns expected results and I see documents that have the
> > exact term "runnings". However the same query criteria is not honored by
> > the unified highlighter. It gives back no matching results. Although
> > "runnings~1" works (with the added caveat of also honoring the "1" edit
> > distance).
> >
> > So it appears the unified highlighter only supports edit distance > 0 for
> > fuzzy searches. And this is not an issue with the original or fastVector
> > highlighters. Is this a real problem or am I missing something?
> >
> > --
> > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
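If it helps whoever reproduces this, one way to rule out the query parser
(a sketch, not something from the original thread; the collection name is a
placeholder) is to run both edit distances with debug=query and compare the
parsedquery output, which should show how the fuzzy term is represented in
each case:

    curl "http://localhost:8983/solr/mycollection/select?q=File_Content_Field:runnings~0&debug=query"
    curl "http://localhost:8983/solr/mycollection/select?q=File_Content_Field:runnings~1&debug=query"

If both parse as expected, the difference is down to how the unified
highlighter extracts terms from the fuzzy query rather than to the query
itself.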
Re: Possible bug in internal SolR communication when the CertAuthPlugin is active
Hello Dominik,

The mailing list strips attachments, so we're not able to see your Admin UI
errors. If you can create a Jira issue to track this, that would be great.
I don't remember testing adding a response writer when working on the
plugin, so it's very possible that there is a bug. If you can get the
reproduction into a unit test, that would be even more helpful, but it is
by no means required.

Thanks,
Mike

On Thu, Apr 1, 2021 at 5:58 AM Dresel, Dominik wrote:

> [reproduction steps and error output as quoted in the original message above]
Re: Missing one record in Streaming gather nodes expression
Hi Joel,

It's a simple use case: I have two collections, products and related
products. product_id_l is a unique key in the products collection and a
reference field in the related products collection. Please find sample
documents below.

product:
{
  "id":"12345",
  "id_l":12345,
  "product_name":"product1",
  "related_product_count_l":2
}

related products:
{
  "id":"56789",
  "id_l":56789,
  "main_product_id_l":12345,
  "product_name":"product10",
  "relation_type_code_i":4
},
{
  "id":"98765",
  "id_l":98765,
  "main_product_id_l":12345,
  "product_name":"product11",
  "relation_type_code_i":2
}

gatherNodes(relatedproduct,
            search(product, q="related_product_count_l:[1 TO *]", fl="id_l", qt="/export", sort="id_l desc"),
            walk=id_l->main_product_id_l,
            gather="related_product_id_l",
            fq="relation_type_code_i:4",
            scatter="leaves, branches")

I am trying to fetch all products which have more than one related product,
then joining with the related products collection to get all related
products, then filtering for the relation type 4 ones. Please let me know
if you need any other details.

Thanks in advance!

Best Regards,
Deepu

On Tue, Mar 30, 2021 at 1:54 PM Joel Bernstein wrote:

> You're going to have to provide more detail here. I'll have to be able to
> reproduce the error to fix it.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Mar 30, 2021 at 2:08 PM Deepu wrote:
>
> > Hi All,
> >
> > Any update on this issue in streaming expressions, which is blocking our
> > solr migration?
> >
> > Thanks in advance!
> >
> > Regards,
> > Deepu
> >
> > On Mon, Mar 29, 2021 at 2:12 PM Deepu wrote:
> >
> > > Hi All,
> > >
> > > We are using solr streaming expressions to join two collections. We
> > > have product and similar product indexes, and are joining these two
> > > collections based on product id like below.
> > >
> > > gatherNodes(relatedproduct,
> > >             search(product, q="related_product_count_l:[1 TO *]", fl="id_l", qt="/export", sort="id_l desc"),
> > >             walk=id_l->main_product_id_l,
> > >             gather="related_product_id_l",
> > >             fq="relation_type_code_i:4",
> > >             scatter="leaves, branches")
> > >
> > > Observed that always one record is missing from the outer gatherNodes
> > > function.
> > >
> > > Solr version: 8.4.1
> > >
> > > Please let me know if I am missing anything or if it is a known issue.
> > >
> > > Best Regards,
> > > Deepu
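For whoever picks this up, a hypothetical way to narrow it down (the
collection name and expression come from the message above, but this check
is not part of the original thread) is to run the inner search() on its own
against the /stream handler and compare its tuple count with the
gatherNodes() output:

    # run only the inner search() to see whether the record is already
    # missing before the graph walk
    curl --data-urlencode 'expr=search(product, q="related_product_count_l:[1 TO *]", fl="id_l", qt="/export", sort="id_l desc")' \
         "http://localhost:8983/solr/product/stream"

If the record is present here but missing from the gatherNodes() result,
the problem is in the walk/gather step rather than in the /export search.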
Re: Partial Updates for EnumFields in Solr 8.8.1
On 4/1/2021 2:05 AM, Daniel Exner wrote:
> Sure. I uploaded the schema to: https://pastebin.com/v6NwWnkx

Your schema version is 1.5. This means that useDocValuesAsStored is false.
See if anything changes if you update the schema version to 1.6, at which
point useDocValuesAsStored will default to true.

Thanks,
Shawn
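For reference, the version in question is the attribute on the root schema
element; a minimal sketch (the schema name stays whatever the existing
schema already uses):

    <schema name="..." version="1.6">

Alternatively, individual fields can opt in without bumping the version by
setting useDocValuesAsStored="true" on the field or field type.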
Re: Unable to force Lucene spinning disk detection
: Lucene is detecting our SSDs as spinning disks, as seen through the admin
: metrics endpoint:
:
: * CONTAINER.fs.coreRoot.spins: true
: * CONTAINER.fs.spins: true

The spins metric is always going to return whatever the Lucene IOUtils
think the disk is -- there is no setting you can use to override that and
make the metric "lie" (from its perspective) about what Lucene thinks.

*BUT* ... what Lucene thinks only matters in terms of the *DEFAULT*
behavior when configuring ConcurrentMergeScheduler...

: However we can see in the `system.properties` endpoint we are overriding it:
:
: * lucene.cms.override_spins: "false"

...with that property set, and if you use ConcurrentMergeScheduler, then
its default behavior will be to pick "MaxMergesAndThreads" based on the
heuristics accordingly. If you don't use ConcurrentMergeScheduler, or have
explicitly configured the MaxMergesAndThreads, then that system property
won't matter. (But either way, the *.spins metrics are going to tell you
what Lucene thinks the disk is.)

FWIW: Instead of focusing on setting lucene.cms.override_spins to override
the spins/SSD heuristics, you may want to just focus on explicitly setting
the max merges and threads, since that gives you more direct control ... in
9.x the spins heuristic will go away entirely and the same defaults are
used regardless.

-Hoss
http://www.lucidworks.com/
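For anyone trying the property route anyway, it is typically passed to the
JVM at startup; a sketch assuming the stock bin/solr.in.sh is in use:

    # bin/solr.in.sh
    SOLR_OPTS="$SOLR_OPTS -Dlucene.cms.override_spins=false"

As noted above, this only influences the ConcurrentMergeScheduler defaults;
explicit maxMergeCount/maxThreadCount values in the <mergeScheduler> element
of solrconfig.xml make the heuristic irrelevant.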