Re: [PR] Cleanup JDK version related logic in scripts [solr]
epugh commented on code in PR #2792:
URL: https://github.com/apache/solr/pull/2792#discussion_r1815533533

## solr/bin/solr:

@@ -1054,31 +1054,13 @@ fi
 # Establish default GC logging opts if no env var set (otherwise init to sensible default)
 if [ -z "${GC_LOG_OPTS}" ]; then
-  if [[ "$JAVA_VER_NUM" -lt "9" ]] ; then
-GC_LOG_OPTS=('-verbose:gc' '-XX:+PrintHeapAtGC' '-XX:+PrintGCDetails' \
-  '-XX:+PrintGCDateStamps' '-XX:+PrintGCTimeStamps' '-XX:+PrintTenuringDistribution' \
-  '-XX:+PrintGCApplicationStoppedTime')
-  else
-GC_LOG_OPTS=('-Xlog:gc*')
-  fi
-else
-  # TODO: Should probably not overload GC_LOG_OPTS as both string and array, but leaving it be for now
-  # shellcheck disable=SC2128
-  GC_LOG_OPTS=($GC_LOG_OPTS)
+  GC_LOG_OPTS=('-Xlog:gc*')
 fi

 # if verbose gc logging enabled, setup the location of the log file and rotation
 if [ "${#GC_LOG_OPTS[@]}" -gt 0 ]; then
-  if [[ "$JAVA_VER_NUM" -lt "9" ]] || [ "$JAVA_VENDOR" == "OpenJ9" ]; then
-gc_log_flag="-Xloggc"
-if [ "$JAVA_VENDOR" == "OpenJ9" ]; then
-  gc_log_flag="-Xverbosegclog"
-fi
-if [ -z ${JAVA8_GC_LOG_FILE_OPTS+x} ]; then

Review Comment:
What does line 1077 mean? Is it possible that it is saying that if the variable "JAVA8_GC_LOG_FILE_OPTS" isn't set, then we do the GC_LOG_OPTS?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Commented] (SOLR-17161) Separate out a solrj-jetty artifact (10.0)
[ https://issues.apache.org/jira/browse/SOLR-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892565#comment-17892565 ]

Jason Gerlowski commented on SOLR-17161:

I guess I have a potential concern about moving the Jetty based clients into their own artifact. Sorry to bring it to the table so late. (To be clear - it's a "concern" and "request for info", and not a veto or anything like that.)

In short: it goes without saying how important defaults are in software, and making the JDK-based client the only client available in 'solrj-core' will make it the "effective default" for a lot of folks. Most users will start by just grabbing 'solrj-core', and then their IDE's autocomplete will suggest HttpJdkSolrClient (and only HttpJdkSolrClient).

That's a big deal! And I worry about making that sort of change without a discussion about whether it's our best option, holistically. The JDK-based client is a clear winner on some concerns, e.g. dependency footprint. But that's not the only concern users are likely to have:

- is there any difference in the perf characteristics of the two underlying HttpClients?
- what sort of hooks does each offer into the request/response or client lifecycle?
- do the clients differ in how much they let users customize threading or connection-pooling behavior?
- is there a popularity gap, or are folks already pretty familiar with one HttpClient in particular?
- any logging or tracing differences?

Has this sort of holistic discussion happened somewhere that I just missed? If not, maybe we could have that here?
> Separate out a solrj-jetty artifact (10.0)
> ------------------------------------------
>
> Key: SOLR-17161
> URL: https://issues.apache.org/jira/browse/SOLR-17161
> Project: Solr
> Issue Type: Sub-task
> Components: clients - java
> Reporter: Jan Høydahl
> Priority: Blocker
> Fix For: main (10.0)
>
>
> Given we have a native JDK based client in SOLR-599, we can separate out all
> {{Http2SolrClient}} and friends with their jetty-client dependencies into a
> separate artifact {{{}solrj-jetty{}}}.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] SOLR-17504: CoreContainer calls UpdateHandler.commit. [solr]
dsmiley commented on code in PR #2786:
URL: https://github.com/apache/solr/pull/2786#discussion_r1815544031

## solr/core/src/java/org/apache/solr/core/CoreContainer.java:

@@ -2061,13 +2066,16 @@ public void reload(String name, UUID coreId) {
 RefCounted<IndexWriter> iwRef = core.getSolrCoreState().getIndexWriter(null);
 if (iwRef != null) {
   IndexWriter iw = iwRef.get();
-// switch old core to readOnly
-core.readOnly = true;

Review Comment:
As an aside, I don't like that CoreContainer is doing SolrCore internal manipulations... like this should be a method on SolrCore like core.commit()
[jira] [Commented] (SOLR-17497) Pull replicas throws AlreadyClosedException
[ https://issues.apache.org/jira/browse/SOLR-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892684#comment-17892684 ] Sanjay Dutt commented on SOLR-17497: {code:java} @Test public void test(){ ExecutorService fsyncService = ExecutorUtil.newMDCAwareSingleThreadExecutor(new SolrNamedThreadFactory("fsyncService")); try { fsyncService.submit(() -> { throw new AlreadyClosedException("Directory is already closed!"); }); } catch (Exception e) { System.out.println(e); } finally { fsyncService.shutdown(); } }{code} In [https://github.com/apache/solr/pull/2707], we have basically replaced ExecutorService#submit with ExecutorService#execute, and now execute throws exception rather than suppressing it. Same can be tested with the above example where running it won't fail, on the other hand If you use execute it will fail immediately. > Pull replicas throws AlreadyClosedException > - > > Key: SOLR-17497 > URL: https://issues.apache.org/jira/browse/SOLR-17497 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Sanjay Dutt >Priority: Major > Attachments: Screenshot 2024-10-23 at 6.01.02 PM.png > > > Recently, a common exception (org.apache.lucene.store.AlreadyClosedException: > this Directory is closed) seen in multiple failed test cases. 
> FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica > FAILED: > org.apache.solr.cloud.SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull > FAILED: org.apache.solr.cloud.TestPullReplica.testAddDocs > > > {code:java} > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=10271, > name=fsyncService-6341-thread-1, state=RUNNABLE, > group=TGRP-SplitShardWithNodeRoleTest] > at > __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4:E5DB3E97188A8EB9]:0) > Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is > closed > at __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4]:0) > at > app//org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:50) > at > app//org.apache.lucene.store.ByteBuffersDirectory.sync(ByteBuffersDirectory.java:237) > at > app//org.apache.lucene.tests.store.MockDirectoryWrapper.sync(MockDirectoryWrapper.java:214) > at > app//org.apache.solr.handler.IndexFetcher$DirectoryFile.sync(IndexFetcher.java:2034) > at > app//org.apache.solr.handler.IndexFetcher$FileFetcher.lambda$fetch$0(IndexFetcher.java:1803) > at > app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$1(ExecutorUtil.java:449) > at > java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base@11.0.24/java.lang.Thread.run(Thread.java:829) > {code} > > Interesting thing about these test cases is that they all share same kind of > setup where each has one shard and two replicas – one NRT and another is PULL. > > Going through one of the test case execution step. > FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica > > Test flow > 1. Create a collection with 1 NRT and 1 PULL replica > 2. waitForState > 3. waitForNumDocsInAllActiveReplicas(0); // *Name says it all* > 4. 
Index another document. > 5. waitForNumDocsInAllActiveReplicas(1); > 6. Stop Pull replica > 7. Index another document > 8. waitForNumDocsInAllActiveReplicas(2); > 9. Start Pull Replica > 10. waitForState > 11. waitForNumDocsInAllActiveReplicas(2); > > As per the logs the whole sequence executed successfully. Here is the link to > the logs: > [https://ge.apache.org/s/yxydiox3gvlf2/tests/task/:solr:core:test/details/org.apache.solr.cloud.TestPullReplica/testKillPullReplica/1/output] > (link may stop working in the future) > > Last step where they are making sure that all the active replicas should have > two documents each has logged a info which is another proof that it completed > successfully. > > {code:java} > 616575 INFO > (TEST-TestPullReplica.testKillPullReplica-seed#[F30CC837FDD0DC28]) [n: c: s: > r: x: t:] o.a.s.c.TestPullReplica Replica core_node3 > (https://127.0.0.1:35647/solr/pull_replica_test_kill_pull_replica_shard1_replica_n1/) > has all 2 docs 616606 INFO (qtp1091538342-13057-null-11348) > [n:127.0.0.1:38207_solr c:pull_replica_test_kill_pull_replica s:shard1 > r:core_node4 x:pull_replica_test_kill_pull_replica_shard1_replica_p2 > t:null-11348] o.a.s.c.S.Request webapp=/solr path=/select > params={q=*:*&wt=javabin&version=2} rid=null-11348 hits=2 status=0 QTime=
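
The submit-versus-execute difference raised in the SOLR-17497 comment above can be sketched with plain `java.util.concurrent`, without any Solr classes. This is only an illustration under stated assumptions: the class and method names (`SubmitVsExecute`, `failureSeenViaSubmit`, `failureSeenViaExecute`) are hypothetical, a stock single-thread executor stands in for the MDC-aware `fsyncService`, and `IllegalStateException` stands in for `AlreadyClosedException`. With `submit`, the failure is stored in the returned `Future` and stays invisible unless someone calls `get()`. With `execute`, the failure escapes the worker thread and reaches that thread's uncaught-exception handler, which is presumably what the randomized test runner reports as a test failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class SubmitVsExecute {

  // Stand-in for the fsync task that fails because the directory was closed.
  static final Runnable FAILING_TASK = () -> {
    throw new IllegalStateException("Directory is already closed!");
  };

  /** submit(): the exception is captured in the Future and only surfaces on get(). */
  static Throwable failureSeenViaSubmit() throws InterruptedException, TimeoutException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<?> future = pool.submit(FAILING_TASK);
      try {
        future.get(5, TimeUnit.SECONDS);
        return null; // the task completed without failing
      } catch (ExecutionException e) {
        return e.getCause(); // the original exception, wrapped by the Future
      }
    } finally {
      pool.shutdown();
    }
  }

  /** execute(): the exception escapes the worker and hits its uncaught-exception handler. */
  static Throwable failureSeenViaExecute() throws InterruptedException {
    AtomicReference<Throwable> uncaught = new AtomicReference<>();
    CountDownLatch handlerRan = new CountDownLatch(1);
    ThreadFactory factory = runnable -> {
      Thread worker = new Thread(runnable, "demo-worker");
      worker.setUncaughtExceptionHandler((thread, error) -> {
        uncaught.set(error);
        handlerRan.countDown();
      });
      return worker;
    };
    ExecutorService pool = Executors.newSingleThreadExecutor(factory);
    try {
      pool.execute(FAILING_TASK);
      handlerRan.await(5, TimeUnit.SECONDS);
      return uncaught.get();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("submit:  " + failureSeenViaSubmit());
    System.out.println("execute: " + failureSeenViaExecute());
  }
}
```

Note that in neither case does the calling thread see the exception directly. If the exception should be logged but not propagated, as suggested for IndexFetcher, one option is to catch it inside the task itself.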
[jira] [Updated] (SOLR-17515) Recovery fails in Solr 9.7.0 if basic-auth is enabled
[ https://issues.apache.org/jira/browse/SOLR-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski updated SOLR-17515: --- Description: Several reporters on the users@ list, recently shared a bug they noticed on upgrading to Solr 9.7. Replicas would try to recover, but fail with a NullPointerException: {code} 2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Error while trying to recover. core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:97) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:85) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at 
org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.httpClientBuilderSetup(Http2SolrClient.java:1093) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:1062) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:907) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:633) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] ... {code} It turns out that the issue isn't specific to upgrading clusters: *any 9.7.0 cluster (new or existing/upgrading) that uses basic-auth will hit this NPE on during replica recovery*. The result is that replicas will fail to recover, and sit marked as "recovering" indefinitely. 
The issue can be reproduced locally in a source-checkout using the following steps: {code} git checkout branch_9_7 ./gradlew clean assemble cd solr/packaging/build/solr-9.7.0-SNAPSHOT # At prompts, I chose: 4 nodes, "gettingstarted", 1 shard, 2 replicas, "_default" configset bin/solr start -e cloud bin/solr post -c gettingstarted example/exampledocs/books.json # Stop the node containing the non-leader replica bin/solr stop -p bin/solr post -c gettingstarted example/exampledocs/books.csv # Enable auth and trigger recovery by turning the node back on bin/solr auth enable -type basicAuth -credentials solr:solrRocks -blockUnknown true # This line will need tweaked based on which Solr node was previously stopped "bin/solr" start --cloud -p -s "example/cloud//solr" -z 127.0.0.1:9983 {code} was: Several reporters on the users@ list, recently shared a bug they noticed on upgrading to Solr 9.7. Replicas would try to recover, but fail with a NullPointerException: {code} 2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Error while trying to recover. core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.
[jira] [Commented] (SOLR-17515) Recovery fails in Solr 9.7.0 if basic-auth is enabled
[ https://issues.apache.org/jira/browse/SOLR-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892831#comment-17892831 ] Jason Gerlowski commented on SOLR-17515: [~sanjaydutt] pointed me at the likely culprit: At various points, the RecoveryStrategy code bootstraps a new Http2SolrClient based on an existing one. But this bootstrapping overlooks the 'authenticationStore' object from the existing client, which results in a NPE when code later on expects it to be set. The place to fix this is _probably_ in the "withHttpClient" builder method used by RecoveryStrategy (see the calling snippet below): {code:title=RecoveryStrategy#recoverySolrClientBuilder} private Http2SolrClient.Builder recoverySolrClientBuilder(String baseUrl, String leaderCoreName) { final UpdateShardHandlerConfig cfg = cc.getConfig().getUpdateShardHandlerConfig(); return new Http2SolrClient.Builder(baseUrl) .withDefaultCollection(leaderCoreName) .withHttpClient(cc.getUpdateShardHandler().getRecoveryOnlyHttpClient()); } {code} > Recovery fails in Solr 9.7.0 if basic-auth is enabled > - > > Key: SOLR-17515 > URL: https://issues.apache.org/jira/browse/SOLR-17515 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 9.7 >Reporter: Jason Gerlowski >Priority: Major > > Several reporters on the users@ list, recently shared a bug they noticed on > upgrading to Solr 9.7. Replicas would try to recover, but fail with a > NullPointerException: > {code} > 2024-09-18 09:36:31.238 ERROR > (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr > dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts > s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] > o.a.s.c.RecoveryStrategy Error while trying to recover. 
> core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot > invoke > "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" > because "this.authenticationStore" is null > at > org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) > java.lang.NullPointerException: Cannot invoke > "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" > because "this.authenticationStore" is null > at > org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:97) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:85) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.httpClientBuilderSetup(Http2SolrClient.java:1093) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:1062) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:907) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > 
org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:633) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum > - 2024-09-03 15:05:20] > at > org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum > - 2024-09-03 15:05:20] > ... > {code} > It turns out that the issue isn't specific to upgrading clusters: *any 9.7.0 > cluster (new or existing/upgrading) that uses basic-auth will hit this NPE on > during replica recovery*. The result is that replicas will fail to recover, > and sit marked as "recovering" indefinitely. > T
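
The failure mode suggested in the SOLR-17515 comment above (a client bootstrapped from an existing one without carrying over its 'authenticationStore') can be modelled with a small, self-contained sketch. This is not Solr's code: every class and method here (`AuthHolder`, `Client`, `Builder`, `withExistingBuggy`, `withExistingFixed`) is hypothetical and only illustrates how a copy-style builder method that skips one field can lead to a NullPointerException later. How the real fix is shaped in `Http2SolrClient` is an open question in the thread.

```java
public class BuilderCopySketch {

  /** Hypothetical stand-in for a per-client holder of authentication state. */
  static final class AuthHolder {
    private String credentials;

    void update(String newCredentials) {
      this.credentials = newCredentials;
    }

    String credentials() {
      return credentials;
    }
  }

  /** Hypothetical client that assumes its holder is always present. */
  static final class Client {
    final String baseUrl;
    final AuthHolder authHolder; // null reproduces the failure mode

    Client(String baseUrl, AuthHolder authHolder) {
      this.baseUrl = baseUrl;
      this.authHolder = authHolder;
    }

    void setCredentials(String credentials) {
      authHolder.update(credentials); // throws NullPointerException if the holder was never set
    }
  }

  /** Hypothetical builder with a buggy and a fixed "derive from existing client" method. */
  static final class Builder {
    private final String baseUrl;
    private AuthHolder authHolder;
    private boolean derivedFromExisting;

    Builder(String baseUrl) {
      this.baseUrl = baseUrl;
    }

    // Buggy variant: marks the client as derived but forgets to copy the holder.
    Builder withExistingBuggy(Client existing) {
      this.derivedFromExisting = true;
      return this;
    }

    // Fixed variant: the derived client shares the existing client's holder.
    Builder withExistingFixed(Client existing) {
      this.derivedFromExisting = true;
      this.authHolder = existing.authHolder;
      return this;
    }

    Client build() {
      // Fresh clients create their own holder; derived clients reuse whatever was copied.
      AuthHolder holder = derivedFromExisting ? authHolder : new AuthHolder();
      return new Client(baseUrl, holder);
    }
  }

  public static void main(String[] args) {
    Client original = new Builder("http://localhost:8983/solr").build();

    Client fixed = new Builder("http://localhost:8983/solr").withExistingFixed(original).build();
    fixed.setCredentials("solr:solrRocks");
    System.out.println("fixed client credentials: " + fixed.authHolder.credentials());

    Client buggy = new Builder("http://localhost:8983/solr").withExistingBuggy(original).build();
    try {
      buggy.setCredentials("solr:solrRocks");
    } catch (NullPointerException e) {
      System.out.println("buggy client failed with a NullPointerException");
    }
  }
}
```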
[PR] SOLR-10654: Prometheus regex cloud pattern fix for core names [solr]
mlbiscoc opened a new pull request, #2795:
URL: https://github.com/apache/solr/pull/2795

https://issues.apache.org/jira/browse/SOLR-10654

# Description

The regex pattern for Solr cloud mode assumed that all core names end with `replica_n[0-9]+`, which is incorrect. Core names should be allowed to have any single letter before the numbers.

# Solution

Change the regex pattern to use `.` instead of `n`, so that it matches any single character.

# Tests

`testCloudCorePattern` and `testBadCloudCorePattern` test the regex cloud pattern.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [x] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
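
The effect of replacing `n` with `.` in the cloud core pattern from PR #2795 can be sketched as follows. The two patterns are simplified assumptions written for illustration, not the exporter's actual regex, and the class and method names are hypothetical. The core names are taken from logs quoted elsewhere in this digest.

```java
import java.util.regex.Pattern;

public class CloudCorePatternSketch {

  // Assumed "before" shape: only accepts the letter 'n' ahead of the replica number.
  static final Pattern OLD_PATTERN = Pattern.compile("^(.+)_(shard\\d+)_replica_n(\\d+)$");

  // Assumed "after" shape: '.' accepts any single character ahead of the replica number.
  static final Pattern NEW_PATTERN = Pattern.compile("^(.+)_(shard\\d+)_replica_.(\\d+)$");

  static boolean matchesOld(String coreName) {
    return OLD_PATTERN.matcher(coreName).matches();
  }

  static boolean matchesNew(String coreName) {
    return NEW_PATTERN.matcher(coreName).matches();
  }

  public static void main(String[] args) {
    String nrtCore = "dovecot_fts_shard5_replica_n61";
    String pullCore = "pull_replica_test_kill_pull_replica_shard1_replica_p2";
    System.out.println(nrtCore + " old=" + matchesOld(nrtCore) + " new=" + matchesNew(nrtCore));
    System.out.println(pullCore + " old=" + matchesOld(pullCore) + " new=" + matchesNew(pullCore));
  }
}
```

One trade-off of `.` is that it also accepts non-letter characters in that position; a character class such as `[a-z]` would be stricter, if that turned out to be desirable.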
Re: [PR] Bump up Java version to 21 [solr]
iamsanjay merged PR #2682:
URL: https://github.com/apache/solr/pull/2682
[jira] [Commented] (SOLR-17511) CLI: Resole -i conflicts (async-id, cluster-id)
[ https://issues.apache.org/jira/browse/SOLR-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892748#comment-17892748 ]

Eric Pugh commented on SOLR-17511:

I think instead of ExportTool you mean SolrExporter ;)

> CLI: Resole -i conflicts (async-id, cluster-id)
> -----------------------------------------------
>
> Key: SOLR-17511
> URL: https://issues.apache.org/jira/browse/SOLR-17511
> Project: Solr
> Issue Type: Sub-task
> Components: cli
> Affects Versions: 9.7, 9.6.1
> Reporter: Christos Malliaridis
> Priority: Minor
> Labels: cli
>
> The CLI flag {{\-i}} is currently used in two options:
> - for {{async-id}} in SnapshotExportTool for specifying an asynchronous
> request identifier
> - for {{cluster-id}} in ExportTool for specifying a unique cluster identifier
> Since both short options are not obvious and the letter {{i}} may be used in
> another context in the future, we should reserve it and deprecate (9.8) /
> remove (10.0) it from the above options.
[jira] [Updated] (SOLR-17511) CLI: Resole -i conflicts (async-id, cluster-id)
[ https://issues.apache.org/jira/browse/SOLR-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christos Malliaridis updated SOLR-17511:

Description:
The CLI flag {{\-i}} is currently used in two options:
- for {{async-id}} in SnapshotExportTool for specifying an asynchronous request identifier
- for {{cluster-id}} in SolrExporter for specifying a unique cluster identifier

Since both short options are not obvious and the letter {{i}} may be used in another context in the future, we should reserve it and deprecate (9.8) / remove (10.0) it from the above options.

was:
The CLI flag {{\-i}} is currently used in two options:
- for {{async-id}} in SnapshotExportTool for specifying an asynchronous request identifier
- for {{cluster-id}} in ExportTool for specifying a unique cluster identifier

Since both short options are not obvious and the letter {{i}} may be used in another context in the future, we should reserve it and deprecate (9.8) / remove (10.0) it from the above options.

> CLI: Resole -i conflicts (async-id, cluster-id)
> -----------------------------------------------
>
> Key: SOLR-17511
> URL: https://issues.apache.org/jira/browse/SOLR-17511
> Project: Solr
> Issue Type: Sub-task
> Components: cli
> Affects Versions: 9.7, 9.6.1
> Reporter: Christos Malliaridis
> Priority: Minor
> Labels: cli
>
> The CLI flag {{\-i}} is currently used in two options:
> - for {{async-id}} in SnapshotExportTool for specifying an asynchronous
> request identifier
> - for {{cluster-id}} in SolrExporter for specifying a unique cluster
> identifier
> Since both short options are not obvious and the letter {{i}} may be used in
> another context in the future, we should reserve it and deprecate (9.8) /
> remove (10.0) it from the above options.
Re: [PR] SOLR-17511: Deprecate -i CLI usages [solr]
epugh commented on PR #2794:
URL: https://github.com/apache/solr/pull/2794#issuecomment-2437644101

Thanks for the review! I think if the tests pass this is ready for merging!
Re: [PR] SOLR-17511: Deprecate -i CLI usages [solr]
malliaridis commented on code in PR #2794:
URL: https://github.com/apache/solr/pull/2794#discussion_r1816622405

## solr/prometheus-exporter/src/java/org/apache/solr/prometheus/scraper/SolrScraper.java:

@@ -184,7 +184,7 @@ protected MetricSamples request(SolrClient client, MetricsQuery query) throws IO
 labelValues.add(zkHostLabelValue);
 }
- // Add the unique cluster ID, either as specified on cmdline -i or baseUrl/zkHost
+ // Add the unique cluster ID, either as specified on cmdline --cluster-id or baseUrl/zkHost

Review Comment:
```suggestion
// Add the unique cluster ID, either as specified on cmdline --cluster-id or
// baseUrl/zkHost
```
[jira] [Comment Edited] (SOLR-17497) Pull replicas throws AlreadyClosedException
[ https://issues.apache.org/jira/browse/SOLR-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892798#comment-17892798 ] Sanjay Dutt edited comment on SOLR-17497 at 10/25/24 1:29 PM: -- Yeah you are right, I have to look into this subject (execute vs submit) more and how this whole things works. +1 to this. {quote}For the case here in IndexFetcher, as long as the exception that is thrown is logged, I think we should suppress its propagation further. {quote} I was also looking why we are getting "User aborted replication" messages. RecoveryStrategy in case of PULL replicas cancel the replication. Here is the explanation from the old JIRA. https://issues.apache.org/jira/browse/SOLR-10233 {quote} h3. Passive replica dies (or is unreachable) Replica won’t be query-able. On restart, replica will recover from the leader, following the same flow as _realtime_ replicas: set state to DOWN, then RECOVERING, and finally ACTIVE. _Passive_ replicas will use a different {{RecoveryStrategy}} implementation, that omits *preparerecovery,* and peer sync attempt, it will jump to replication . If the leader didn't change, or if the other replicas are of type “append”, replication should be incremental. Once the first replication is done, passive replica will declare itself active and start serving traffic. {quote} *RecoveryStrategy.java* {noformat} log.info("Stopping background replicate from leader process"); zkController.stopReplicationFromLeader(coreName); replicate(zkController.getNodeName(), core, leaderprops);{noformat} My own theory: 1. RecoveryStrategy cancel replication. 2. FileFetcher#fetchPackets throws ReplicationHandlerException {code:java} if (stop) { stop = false; aborted = true; throw new ReplicationHandlerException("User aborted replication"); }{code} 3. 
FileFetcher#fetch runs finally block where the sync is executed in async {code:java} fsyncService.submit(() -> { try { file.sync(); } catch (IOException e) { fsyncException = e; } catch (InterruptedException e) { throw new RuntimeException(e); } });{code} 4. At the same time the control gets back to fetchLatestIndex that performs cleanup and closed the directory {code:java} finally { if (!cleanupDone) { cleanup(solrCore, tmpIndexDir, indexDir, deleteTmpIdxDir, tmpTlogDir, successfulInstall); } }{code} And basically there is race condition between step 3 and 4 that's what I believe. Not able to reproduce on my system yet. was (Author: JIRAUSER305513): Yeah you are right, I have to look into this subject (execute vs submit) more and how this whole things works. +1 to this. {quote}For the case here in IndexFetcher, as long as the exception that is thrown is logged, I think we should suppress its propagation further. {quote} I was also looking why we are getting "User aborted replication" messages. RecoveryStrategy in case of PULL replicas cancel the replication. Here is the explanation from the old JIRA. https://issues.apache.org/jira/browse/SOLR-10233 {quote} h3. Passive replica dies (or is unreachable) Replica won’t be query-able. On restart, replica will recover from the leader, following the same flow as _realtime_ replicas: set state to DOWN, then RECOVERING, and finally ACTIVE. _Passive_ replicas will use a different {{RecoveryStrategy}} implementation, that omits *preparerecovery,* and peer sync attempt, it will jump to replication . If the leader didn't change, or if the other replicas are of type “append”, replication should be incremental. Once the first replication is done, passive replica will declare itself active and start serving traffic. 
{quote} *RecoveryStrategy.java* {noformat} log.info("Stopping background replicate from leader process"); zkController.stopReplicationFromLeader(coreName); replicate(zkController.getNodeName(), core, leaderprops);{noformat} My own theory: # RecoveryStrategy cancel replication. # FileFetcher#fetchPackets throws ReplicationHandlerException {code:java} if (stop) { stop = false; aborted = true; throw new ReplicationHandlerException("User aborted replication"); }{code} # FileFetcher#fetch runs finally block where the sync is executed in async {code:java} fsyncService.submit(() -> { try { file.sync(); } catch (IOException e) { fsyncException = e; } catch (InterruptedException e) { throw new RuntimeException(e); } });{code} # At the same time the control gets back to fetchLatestIndex that performs cleanup and closed the directory {code:java} finally { if (!cleanupDone) { cleanup(solrCore, tmpIndexDir, indexDir, deleteTmpIdxDir, tmpTlogDir, successfulInstall); } }{code} And basically there is race condition between step 3 and 4 that's what I believe. Not able to reproduce on my system yet. > Pull replicas throws AlreadyClosedException > - > >
[jira] [Commented] (SOLR-17515) Recovery fails in Solr 9.7.0 if basic-auth is enabled
[ https://issues.apache.org/jira/browse/SOLR-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892821#comment-17892821 ] Jason Gerlowski commented on SOLR-17515: Credit and thanks to Patrik Peng and Endika Posadas for [reporting this on the users list|https://lists.apache.org/thread/jhs7lkg942nxg2hlb879k6tc832yhm06]! This seems like a pretty serious bug: perhaps worth a 9.7.1? > Recovery fails in Solr 9.7.0 if basic-auth is enabled > - > > Key: SOLR-17515 > URL: https://issues.apache.org/jira/browse/SOLR-17515 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 9.7 >Reporter: Jason Gerlowski >Priority: Major > > Several reporters on the users@ list, recently shared a bug they noticed on > upgrading to Solr 9.7. Replicas would try to recover, but fail with a > NullPointerException: > {code} > 2024-09-18 09:36:31.238 ERROR > (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr > dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts > s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] > o.a.s.c.RecoveryStrategy Error while trying to recover. 
> core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot > invoke > "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" > because "this.authenticationStore" is null > at > org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) > java.lang.NullPointerException: Cannot invoke > "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" > because "this.authenticationStore" is null > at > org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:97) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:85) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.httpClientBuilderSetup(Http2SolrClient.java:1093) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:1062) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:907) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > 
org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:633) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum > - 2024-09-03 15:05:20] > at > org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) > ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum > - 2024-09-03 15:05:20] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) > ~[metrics-core-4.2.26.jar:4.2.26] > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) > ~[?:?] > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > ~[?:?] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$1(ExecutorUtil.java:449) > ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - > anshum - 2024-09-03 15:05:20] > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > ~[?:?] > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > ~[?:?] > at java.base/java.lang.Thread.run(Thread.java:840) [?:?] > 2024-09-18 09:36:31.238 ERROR > (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8
[jira] [Commented] (SOLR-17497) Pull replicas throws AlreadyClosedException
[ https://issues.apache.org/jira/browse/SOLR-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892780#comment-17892780 ] David Smiley commented on SOLR-17497: - bq. execute throws exception rather than suppressing it Maybe I'm nitpicking but execute() definitely isn't throwing the exception; it's impossible that it could even do so since the Runnable that does throw an exception happens asynchronously after execute() returns. The change from before is that the thrown exception (from the Runnable) is no longer captured into a Future; it bubbles up to the Thread uncaughtExceptionHandler where our test infrastructure notices it and reports it via com.carrotsearch.randomizedtesting.UncaughtExceptionError. CC [~andreybozhko]. For the case here in IndexFetcher, as long as the exception that is thrown is logged, I think we should suppress its propagation further. > Pull replicas throws AlreadyClosedException > - > > Key: SOLR-17497 > URL: https://issues.apache.org/jira/browse/SOLR-17497 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Sanjay Dutt >Priority: Major > Attachments: Screenshot 2024-10-23 at 6.01.02 PM.png > > > Recently, a common exception (org.apache.lucene.store.AlreadyClosedException: > this Directory is closed) seen in multiple failed test cases. 
> FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica > FAILED: > org.apache.solr.cloud.SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull > FAILED: org.apache.solr.cloud.TestPullReplica.testAddDocs > > > {code:java} > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=10271, > name=fsyncService-6341-thread-1, state=RUNNABLE, > group=TGRP-SplitShardWithNodeRoleTest] > at > __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4:E5DB3E97188A8EB9]:0) > Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is > closed > at __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4]:0) > at > app//org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:50) > at > app//org.apache.lucene.store.ByteBuffersDirectory.sync(ByteBuffersDirectory.java:237) > at > app//org.apache.lucene.tests.store.MockDirectoryWrapper.sync(MockDirectoryWrapper.java:214) > at > app//org.apache.solr.handler.IndexFetcher$DirectoryFile.sync(IndexFetcher.java:2034) > at > app//org.apache.solr.handler.IndexFetcher$FileFetcher.lambda$fetch$0(IndexFetcher.java:1803) > at > app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$1(ExecutorUtil.java:449) > at > java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base@11.0.24/java.lang.Thread.run(Thread.java:829) > {code} > > Interesting thing about these test cases is that they all share same kind of > setup where each has one shard and two replicas – one NRT and another is PULL. > > Going through one of the test case execution step. > FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica > > Test flow > 1. Create a collection with 1 NRT and 1 PULL replica > 2. waitForState > 3. waitForNumDocsInAllActiveReplicas(0); // *Name says it all* > 4. 
Index another document. > 5. waitForNumDocsInAllActiveReplicas(1); > 6. Stop Pull replica > 7. Index another document > 8. waitForNumDocsInAllActiveReplicas(2); > 9. Start Pull Replica > 10. waitForState > 11. waitForNumDocsInAllActiveReplicas(2); > > As per the logs the whole sequence executed successfully. Here is the link to > the logs: > [https://ge.apache.org/s/yxydiox3gvlf2/tests/task/:solr:core:test/details/org.apache.solr.cloud.TestPullReplica/testKillPullReplica/1/output] > (link may stop working in the future) > > Last step where they are making sure that all the active replicas should have > two documents each has logged a info which is another proof that it completed > successfully. > > {code:java} > 616575 INFO > (TEST-TestPullReplica.testKillPullReplica-seed#[F30CC837FDD0DC28]) [n: c: s: > r: x: t:] o.a.s.c.TestPullReplica Replica core_node3 > (https://127.0.0.1:35647/solr/pull_replica_test_kill_pull_replica_shard1_replica_n1/) > has all 2 docs 616606 INFO (qtp1091538342-13057-null-11348) > [n:127.0.0.1:38207_solr c:pull_replica_test_kill_pull_replica s:shard1 > r:core_node4 x:pull_replica_test_kill_pull_replica_shard1_replica_p2 > t:null-11348] o.a.s.c.S.Request webapp=/solr path=/select > params={q=*:*&wt=javabin&version=2} rid=null-11348 hits=
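The execute() vs submit() distinction described in the comment above can be illustrated with a minimal sketch. It uses plain java.util.concurrent executors rather than Solr's MDCAwareThreadPoolExecutor, and the class and method names (ExecuteVsSubmit, viaExecute, viaSubmit) are hypothetical, chosen only for this illustration. Under those assumptions, a task that fails after execute() ends up in the worker thread's uncaught-exception handler, while the same failure after submit() is held in the returned Future until someone calls get().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: plain JDK executors, not Solr's ExecutorUtil classes.
public class ExecuteVsSubmit {

  /** Runs a failing task via execute(); returns what the uncaught-exception handler received. */
  static Throwable viaExecute() {
    AtomicReference<Throwable> seenByHandler = new AtomicReference<>();
    CountDownLatch handlerRan = new CountDownLatch(1);
    ThreadFactory factory = r -> {
      Thread t = new Thread(r, "demo-execute");
      t.setUncaughtExceptionHandler((thread, ex) -> {
        seenByHandler.set(ex);
        handlerRan.countDown();
      });
      return t;
    };
    ExecutorService pool = Executors.newSingleThreadExecutor(factory);
    try {
      Runnable failing = () -> { throw new IllegalStateException("boom from execute"); };
      pool.execute(failing); // returns at once; the task fails later on the worker thread
      handlerRan.await(5, TimeUnit.SECONDS); // wait for the handler, not for pool termination
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdownNow();
    }
    return seenByHandler.get();
  }

  /** Runs a failing task via submit(); returns the cause stored in the Future. */
  static Throwable viaSubmit() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Runnable failing = () -> { throw new IllegalStateException("boom from submit"); };
      Future<?> future = pool.submit(failing); // the exception is captured inside the Future
      future.get(5, TimeUnit.SECONDS);
      return null; // not reached for a failing task
    } catch (ExecutionException e) {
      return e.getCause(); // only visible because get() was called
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } catch (TimeoutException e) {
      return null;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("execute(): handler saw " + viaExecute());
    System.out.println("submit():  Future held " + viaSubmit());
  }
}
```

If nobody ever calls get() on the Future, the submit() failure stays silent, which is why a switch from submit() to execute() can surface exceptions that a test framework's uncaught-exception check then reports.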
[jira] [Commented] (SOLR-17497) Pull replicas throws AlreadyClosedException
[ https://issues.apache.org/jira/browse/SOLR-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892798#comment-17892798 ]

Sanjay Dutt commented on SOLR-17497:

Yeah, you are right; I have to look into this subject (execute vs submit) more, and into how the whole thing works.

+1 to this.
{quote}For the case here in IndexFetcher, as long as the exception that is thrown is logged, I think we should suppress its propagation further.
{quote}
I was also looking into why we are getting "User aborted replication" messages. In the case of PULL replicas, RecoveryStrategy cancels the replication. Here is the explanation from the old JIRA, https://issues.apache.org/jira/browse/SOLR-10233:
{quote}
h3. Passive replica dies (or is unreachable)

Replica won’t be query-able. On restart, replica will recover from the leader, following the same flow as _realtime_ replicas: set state to DOWN, then RECOVERING, and finally ACTIVE. _Passive_ replicas will use a different {{RecoveryStrategy}} implementation, that omits *preparerecovery,* and peer sync attempt, it will jump to replication. If the leader didn't change, or if the other replicas are of type “append”, replication should be incremental. Once the first replication is done, passive replica will declare itself active and start serving traffic.
{quote}
*RecoveryStrategy.java*
{noformat}
log.info("Stopping background replicate from leader process");
zkController.stopReplicationFromLeader(coreName);
replicate(zkController.getNodeName(), core, leaderprops);
{noformat}
My own theory:
# RecoveryStrategy cancels the replication.
# FileFetcher#fetchPackets throws ReplicationHandlerException:
{code:java}
if (stop) {
  stop = false;
  aborted = true;
  throw new ReplicationHandlerException("User aborted replication");
}
{code}
# FileFetcher#fetch runs its finally block, where the sync is submitted to run asynchronously:
{code:java}
fsyncService.submit(() -> {
  try {
    file.sync();
  } catch (IOException e) {
    fsyncException = e;
  } catch (InterruptedException e) {
    throw new RuntimeException(e);
  }
});
{code}
# At the same time, control returns to fetchLatestIndex, which performs cleanup and closes the directory:
{code:java}
finally {
  if (!cleanupDone) {
    cleanup(solrCore, tmpIndexDir, indexDir, deleteTmpIdxDir, tmpTlogDir, successfulInstall);
  }
}
{code}
I believe there is a race condition between steps 3 and 4. I have not been able to reproduce it on my system yet.
> Pull replicas throws AlreadyClosedException > - > > Key: SOLR-17497 > URL: https://issues.apache.org/jira/browse/SOLR-17497 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Sanjay Dutt >Priority: Major > Attachments: Screenshot 2024-10-23 at 6.01.02 PM.png > > > Recently, a common exception (org.apache.lucene.store.AlreadyClosedException: > this Directory is closed) seen in multiple failed test cases.
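The suspected race between the asynchronous fsync and the directory cleanup can be sketched in isolation. This is not IndexFetcher code: FakeDirectory is a hypothetical stand-in for a Lucene Directory, the latch is there only to force the unlucky interleaving deterministically, and waiting on the pending Future before closing is one possible remedy assumed for illustration, not the fix chosen in the issue. The task also records its failure instead of rethrowing, in the spirit of the "log it and suppress further propagation" suggestion quoted above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: a toy model of "async sync vs. cleanup", not Solr's IndexFetcher.
public class FsyncBeforeCleanup {

  /** Hypothetical stand-in for a Directory whose sync() fails once it has been closed. */
  static final class FakeDirectory {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void sync() {
      if (closed.get()) {
        throw new IllegalStateException("this Directory is closed");
      }
    }

    void close() {
      closed.set(true);
    }
  }

  /**
   * Submits an async sync, then closes the directory.
   * Returns the error recorded by the sync task, or null if the sync succeeded.
   */
  static Throwable fetchThenCleanup(boolean waitForSync) {
    FakeDirectory dir = new FakeDirectory();
    ExecutorService fsyncService = Executors.newSingleThreadExecutor();
    CountDownLatch directoryClosed = new CountDownLatch(1);
    AtomicReference<Throwable> syncError = new AtomicReference<>();
    try {
      Future<?> pendingSync = fsyncService.submit(() -> {
        try {
          if (!waitForSync) {
            // Force the bad ordering: the sync runs only after cleanup closed the directory.
            directoryClosed.await(5, TimeUnit.SECONDS);
          }
          dir.sync();
        } catch (Throwable t) {
          syncError.set(t); // record (a real system would log) instead of propagating
        }
      });
      if (waitForSync) {
        pendingSync.get(5, TimeUnit.SECONDS); // drain the pending sync before cleanup
      }
      dir.close(); // the "cleanup" step
      directoryClosed.countDown();
      pendingSync.get(5, TimeUnit.SECONDS);
      return syncError.get();
    } catch (Exception e) {
      throw new IllegalStateException("demo did not complete", e);
    } finally {
      fsyncService.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("cleanup without waiting: " + fetchThenCleanup(false));
    System.out.println("cleanup after waiting:   " + fetchThenCleanup(true));
  }
}
```

In the real code the ordering is decided by thread scheduling rather than a latch, which would be consistent with the failure being intermittent and hard to reproduce locally.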
[jira] [Created] (SOLR-17515) Recovery fails in Solr 9.7.0 if basic-auth is enabled
Jason Gerlowski created SOLR-17515: -- Summary: Recovery fails in Solr 9.7.0 if basic-auth is enabled Key: SOLR-17515 URL: https://issues.apache.org/jira/browse/SOLR-17515 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 9.7 Reporter: Jason Gerlowski Several reporters on the users@ list, recently shared a bug they noticed on upgrading to Solr 9.7. Replicas would try to recover, but fail with a NullPointerException: {code} 2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Error while trying to recover. core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:97) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:85) ~[solr-solrj-9.7.0.jar:9.7.0 
675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.httpClientBuilderSetup(Http2SolrClient.java:1093) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:1062) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:907) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:633) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) ~[metrics-core-4.2.26.jar:4.2.26] at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$1(ExecutorUtil.java:449) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] 
at java.base/java.lang.Thread.run(Thread.java:840) [?:?] 2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Recovery failed - trying again... (0) 2024-09-18 09:36:31.238 INFO (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Wait [4] seconds before trying to recover again (attempt=1) {code} It turns out that the issue isn't specific
[jira] [Commented] (SOLR-17497) Pull replicas throws AlreadyClosedException
[ https://issues.apache.org/jira/browse/SOLR-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893024#comment-17893024 ] David Smiley commented on SOLR-17497: - I'm confused; is this one JIRA issue about two different exceptions? > Pull replicas throws AlreadyClosedException > - > > Key: SOLR-17497 > URL: https://issues.apache.org/jira/browse/SOLR-17497 > Project: Solr > Issue Type: Task >Reporter: Sanjay Dutt >Priority: Major > Attachments: Screenshot 2024-10-23 at 6.01.02 PM.png > > > Recently, a common exception (org.apache.lucene.store.AlreadyClosedException: > this Directory is closed) seen in multiple failed test cases. > FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica > FAILED: > org.apache.solr.cloud.SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull > FAILED: org.apache.solr.cloud.TestPullReplica.testAddDocs > > > {code:java} > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=10271, > name=fsyncService-6341-thread-1, state=RUNNABLE, > group=TGRP-SplitShardWithNodeRoleTest] > at > __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4:E5DB3E97188A8EB9]:0) > Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is > closed > at __randomizedtesting.SeedInfo.seed([3F7DACB3BC44C3C4]:0) > at > app//org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:50) > at > app//org.apache.lucene.store.ByteBuffersDirectory.sync(ByteBuffersDirectory.java:237) > at > app//org.apache.lucene.tests.store.MockDirectoryWrapper.sync(MockDirectoryWrapper.java:214) > at > app//org.apache.solr.handler.IndexFetcher$DirectoryFile.sync(IndexFetcher.java:2034) > at > app//org.apache.solr.handler.IndexFetcher$FileFetcher.lambda$fetch$0(IndexFetcher.java:1803) > at > app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$1(ExecutorUtil.java:449) > at > 
java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base@11.0.24/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base@11.0.24/java.lang.Thread.run(Thread.java:829) > {code} > > Interesting thing about these test cases is that they all share same kind of > setup where each has one shard and two replicas – one NRT and another is PULL. > > Going through one of the test case execution step. > FAILED: org.apache.solr.cloud.TestPullReplica.testKillPullReplica > > Test flow > 1. Create a collection with 1 NRT and 1 PULL replica > 2. waitForState > 3. waitForNumDocsInAllActiveReplicas(0); // *Name says it all* > 4. Index another document. > 5. waitForNumDocsInAllActiveReplicas(1); > 6. Stop Pull replica > 7. Index another document > 8. waitForNumDocsInAllActiveReplicas(2); > 9. Start Pull Replica > 10. waitForState > 11. waitForNumDocsInAllActiveReplicas(2); > > As per the logs the whole sequence executed successfully. Here is the link to > the logs: > [https://ge.apache.org/s/yxydiox3gvlf2/tests/task/:solr:core:test/details/org.apache.solr.cloud.TestPullReplica/testKillPullReplica/1/output] > (link may stop working in the future) > > Last step where they are making sure that all the active replicas should have > two documents each has logged a info which is another proof that it completed > successfully. 
> > {code:java} > 616575 INFO > (TEST-TestPullReplica.testKillPullReplica-seed#[F30CC837FDD0DC28]) [n: c: s: > r: x: t:] o.a.s.c.TestPullReplica Replica core_node3 > (https://127.0.0.1:35647/solr/pull_replica_test_kill_pull_replica_shard1_replica_n1/) > has all 2 docs 616606 INFO (qtp1091538342-13057-null-11348) > [n:127.0.0.1:38207_solr c:pull_replica_test_kill_pull_replica s:shard1 > r:core_node4 x:pull_replica_test_kill_pull_replica_shard1_replica_p2 > t:null-11348] o.a.s.c.S.Request webapp=/solr path=/select > params={q=*:*&wt=javabin&version=2} rid=null-11348 hits=2 status=0 QTime=0 > 616607 INFO > (TEST-TestPullReplica.testKillPullReplica-seed#[F30CC837FDD0DC28]) [n: c: s: > r: x: t:] o.a.s.c.TestPullReplica Replica core_node4 > (https://127.0.0.1:38207/solr/pull_replica_test_kill_pull_replica_shard1_replica_p2/) > has all 2 docs{code} > > *Where is the issue then?* > In the logs it has been observed, that after restarting the PULL replica. The > recovery process started and after fetching all the files info from the NRT, > the replication aborted and logged "User aborted replication" > > {code:java} > o.a.s.h.IndexFetcher User aborted Replication => > org.apache.solr.handler.IndexFetcher$ReplicationHandlerException: User > aborted replication at > org.apache.so
[PR] Fix SolrJmxReporterTest#testClosedCore [solr]
iamsanjay opened a new pull request, #2797:
URL: https://github.com/apache/solr/pull/2797

This PR addresses a race condition in the code where a separate thread continuously retrieves attributes from an MBean, while the main thread may unload the MBean before the retrieval thread has fully terminated.

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
Re: [PR] SOLR-16962: Restore ability to configure tlog directory [solr]
iamsanjay commented on PR #1895:
URL: https://github.com/apache/solr/pull/1895#issuecomment-2439282497

git bisect points to this PR. It needs a bit more attention to determine whether this PR is actually causing the failure.

**org.apache.solr.search.TestCollapseQParserPlugin.testMultiSort (:solr:core)**

```
Test history: https://ge.apache.org/scans/tests?search.rootProjectNames=solr-root&tests.container=org.apache.solr.search.TestCollapseQParserPlugin&tests.test=testMultiSort http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.search.TestCollapseQParserPlugin.testMultiSort
Test output: /Users/sanjaydutt/Documents/solr/solr/core/build/test-results/test/outputs/OUTPUT-org.apache.solr.search.TestCollapseQParserPlugin.txt
Reproduce with: ./gradlew :solr:core:test --tests "org.apache.solr.search.TestCollapseQParserPlugin.testMultiSort" -Ptests.jvms=4 -Ptests.haltonfailure=false "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=F8EF1414D2733583 -Ptests.multiplier=2 -Ptests.badapples=false -Ptests.file.encoding=US-ASCII
```
Re: [PR] SOLR-16116: Use apache curator to manage the Solr Zookeeper interactions [solr]
HoustonPutman commented on code in PR #760:
URL: https://github.com/apache/solr/pull/760#discussion_r1817369402

## solr/test-framework/build.gradle: ##
@@ -43,6 +43,17 @@ dependencies {
   var zkExcludes = {
     exclude group: "org.apache.yetus", module: "audience-annotations"
   }
+  api('org.apache.curator:curator-client', {

Review Comment:
I've changed all curator deps here to "implementation"

## solr/solrj-zookeeper/build.gradle: ##
@@ -32,6 +32,13 @@ dependencies {
   implementation project(':solr:solrj')
+  api('org.apache.curator:curator-client', {

Review Comment:
Actually implementation should be ok for the `curator-client`. For `curator-framework`, that wouldn't be great because of the `SolrZkClient.multi()` function parameters.

## gradle/testing/randomization/policies/solr-tests.policy: ##
@@ -50,6 +50,7 @@ grant {
   permission java.net.SocketPermission "127.0.0.1:4", "connect,resolve";
   permission java.net.SocketPermission "127.0.0.1:6", "connect,resolve";
   permission java.net.SocketPermission "127.0.0.1:8", "connect,resolve";
+  permission java.net.SocketPermission "--", "connect,resolve";

Review Comment:
It's a fake ZK host used in a test, just like all of the other ones. But I added a comment
Re: [PR] SOLR-16470: Create V2 equivalent of V1 Replication: Get files/{filePath} [solr]
gerlowskija commented on code in PR #2734:
URL: https://github.com/apache/solr/pull/2734#discussion_r1817490345

## solr/core/src/java/org/apache/solr/handler/admin/api/CoreReplicationAPI.java: ##
@@ -68,6 +71,45 @@ public FileListResponse fetchFileList(
     return doFetchFileList(gen);
   }
+  @GET

Review Comment:
OK, I'll leave you to it, but if you have any questions or get stuck, lmk and I'll try to help out! (Hope you had a great vacation!)
[jira] [Commented] (SOLR-17497) Pull replicas throws AlreadyClosedException
[ https://issues.apache.org/jira/browse/SOLR-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893025#comment-17893025 ] Sanjay Dutt commented on SOLR-17497: Sorry, initially I had no idea what was going on, so I shared whatever I could find here in this JIRA. There is only one exception that is relevant – AlreadyClosedException. The other one, "User aborted Replication", is expected and is observed whenever replication is aborted. Even when you run org.apache.solr.cloud.TestPullReplica.testKillPullReplica, you will see this exception in the logs, and that's fine IMO. > Pull replicas throws AlreadyClosedException > - > > Key: SOLR-17497 > URL: https://issues.apache.org/jira/browse/SOLR-17497 > Project: Solr > Issue Type: Task >Reporter: Sanjay Dutt >Priority: Major > Attachments: Screenshot 2024-10-23 at 6.01.02 PM.png > > > Recently, a common exception (org.apache.lucene.store.AlreadyClosedException: > this Directory is closed) seen in multiple failed test cases.
[jira] [Updated] (SOLR-16962) updateLog tlog dir location config is silently ignored
[ https://issues.apache.org/jira/browse/SOLR-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SOLR-16962: -- Labels: pull-request-available (was: ) > updateLog tlog dir location config is silently ignored > --- > > Key: SOLR-16962 > URL: https://issues.apache.org/jira/browse/SOLR-16962 > Project: Solr > Issue Type: Bug >Affects Versions: main (10.0), 9.2.1 >Reporter: Michael Gibney >Assignee: Michael Gibney >Priority: Minor > Labels: pull-request-available > Fix For: 9.7 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > If you follow the > [instructions|https://solr.apache.org/guide/solr/latest/configuration-guide/commits-transaction-logs.html#transaction-log] > on configuring a non-default tlog location, solr currently silently ignores > explicit configuration and uses the default location > {{[instanceDir]/data/tlog/}}. > Afaict this has been the case for some time, with several layers of faithful > refactorings now somewhat obscuring the initial intent. > This issue proposes to restore the initial intent, and also shore up some of > the nuances of handling this (now that the config actually has an effect): > # resolve relative "dir" spec relative to core instanceDir > # disallow relative "dir" spec that escapes core instanceDir (e.g., > {{dir=../../some_path}}) > # for absolute "dir" spec outside of the core instanceDir, scope the tlog dir > by core name -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
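The three path-handling rules proposed in SOLR-16962 above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Solr's UpdateLog code: `TlogDirResolver` and its `resolve` method are hypothetical names, the unset case falls back to the `[instanceDir]/data/tlog/` default quoted in the issue, and an absolute "dir" that already lies inside the instanceDir is assumed to be used unchanged (the issue does not say).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper illustrating the proposed rules; not part of Solr. */
final class TlogDirResolver {
  private TlogDirResolver() {}

  static Path resolve(Path instanceDir, String coreName, String dirSpec) {
    Path base = instanceDir.toAbsolutePath().normalize();
    if (dirSpec == null || dirSpec.isEmpty()) {
      // No explicit config: the default location named in the issue.
      return base.resolve("data").resolve("tlog");
    }
    Path spec = Paths.get(dirSpec);
    if (!spec.isAbsolute()) {
      // Rule 1: a relative spec is resolved against the core instanceDir.
      Path resolved = base.resolve(spec).normalize();
      // Rule 2: a relative spec must not escape the core instanceDir.
      if (!resolved.startsWith(base)) {
        throw new IllegalArgumentException(
            "tlog dir '" + dirSpec + "' escapes instanceDir " + base);
      }
      return resolved;
    }
    Path absolute = spec.normalize();
    // Rule 3: an absolute spec outside instanceDir is scoped by core name.
    // Assumption: an absolute spec inside instanceDir is used as-is.
    return absolute.startsWith(base) ? absolute : absolute.resolve(coreName);
  }
}
```

With an instanceDir of `/var/solr/core1`, a spec of `logs/tlog` resolves inside the core directory, `../../some_path` is rejected, and `/mnt/tlogs` becomes `/mnt/tlogs/core1`, so two cores pointed at the same absolute directory would not share one tlog directory.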
[jira] [Commented] (SOLR-17511) CLI: Resole -i conflicts (async-id, cluster-id)
[ https://issues.apache.org/jira/browse/SOLR-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892754#comment-17892754 ] Christos Malliaridis commented on SOLR-17511: - That is right, thanks for the correction. 😅 > CLI: Resole -i conflicts (async-id, cluster-id) > --- > > Key: SOLR-17511 > URL: https://issues.apache.org/jira/browse/SOLR-17511 > Project: Solr > Issue Type: Sub-task > Components: cli >Affects Versions: 9.7, 9.6.1 >Reporter: Christos Malliaridis >Priority: Minor > Labels: cli > > The CLI flag {{\-i}} is currently used in two options: > - for {{async-id}} in SnapshotExportTool for specifying an asynchronous > request identifier > - for {{cluster-id}} in SolrExporter for specifying a unique cluster > identifier > Since both short options are not obvious and the letter {{i}} may be used in > another context in the future, we should reserve it and deprecate (9.8) / > remove (10.0) it from the above options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Updated] (SOLR-17511) CLI: Resole -i conflicts (async-id, cluster-id)
[ https://issues.apache.org/jira/browse/SOLR-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SOLR-17511: -- Labels: cli pull-request-available (was: cli) > CLI: Resole -i conflicts (async-id, cluster-id) > --- > > Key: SOLR-17511 > URL: https://issues.apache.org/jira/browse/SOLR-17511 > Project: Solr > Issue Type: Sub-task > Components: cli >Affects Versions: 9.7, 9.6.1 >Reporter: Christos Malliaridis >Priority: Minor > Labels: cli, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The CLI flag {{\-i}} is currently used in two options: > - for {{async-id}} in SnapshotExportTool for specifying an asynchronous > request identifier > - for {{cluster-id}} in SolrExporter for specifying a unique cluster > identifier > Since both short options are not obvious and the letter {{i}} may be used in > another context in the future, we should reserve it and deprecate (9.8) / > remove (10.0) it from the above options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Resolved] (SOLR-17488) CLI: Resolve -d conflicts
[ https://issues.apache.org/jira/browse/SOLR-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Pugh resolved SOLR-17488.
-----------------------------
    Fix Version/s: 9.8
       Resolution: Fixed

> CLI: Resolve -d conflicts
> -------------------------
>
> Key: SOLR-17488
> URL: https://issues.apache.org/jira/browse/SOLR-17488
> Project: Solr
> Issue Type: Sub-task
> Components: cli
> Affects Versions: 9.7, 9.6.1
> Reporter: Christos Malliaridis
> Assignee: Eric Pugh
> Priority: Major
> Labels: cli, pull-request-available
> Fix For: 9.8
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The CLI flag {{\-d}} is currently used in four options:
> - {{conf-dir}} for providing the configuration directory in CreateTool, ConfigSetDownloadTool, ConfigSetUploadTool, ZKCLI
> - {{delete-config}} (with argument) for deleting configurations together with collections in DeleteTool, defaults to {{true}}
> - {{server-dir}} for defining the Solr root / server directory in RunExampleTool
> - {{delay}} for delaying recursive posts in PostTool
>
> *Proposed Resolution*
> To avoid confusion over the {{\-d}} flag, the following changes are proposed:
> - Keep {{\-d}} for {{conf-dir}} in CreateTool, ConfigSetDownloadTool, ConfigSetUploadTool, ZKCLI
> - Deprecate (9.8) and remove (10.0) the {{delete-config}} option by replacing it with {{keep}} ({{\-\-keep}} without arguments), to simplify and improve the user experience and avoid the {{\-d}} conflict. "{{\-\-keep}}" should behave equivalently to "{{\-\-delete-config false}}".
> - Deprecate (9.8) and remove (10.0) {{\-d}} for {{server-dir}} in RunExampleTool. Note that {{\-\-server-dir}} may be removed or renamed to use better wording in the future.
> - Support {{\-\-server-dir}} in {{bin/solr}} and, if necessary, {{bin/solr.cmd}} in versions 9.8 and 10.0
> - Deprecate (9.8) and remove (10.0) {{\-d}} for {{delay}} in PostTool to avoid any confusion with {{conf-dir}}.
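The proposed equivalence between {{\-\-keep}} and {{\-\-delete-config false}} can be made concrete with a small sketch. It assumes plain argument scanning rather than Solr's real option parsing; `DeleteConfigFlag` and `shouldDeleteConfig` are hypothetical names, and "the last flag on the command line wins" is an assumption the issue does not state.

```java
import java.util.List;

/** Hypothetical illustration of the proposed flag semantics; not Solr's DeleteTool. */
final class DeleteConfigFlag {
  private DeleteConfigFlag() {}

  static boolean shouldDeleteConfig(List<String> args) {
    boolean deleteConfig = true; // the issue says delete-config defaults to true
    for (int i = 0; i < args.size(); i++) {
      String arg = args.get(i);
      if (arg.equals("--keep")) {
        // New argument-less flag: same effect as "--delete-config false".
        deleteConfig = false;
      } else if (arg.equals("--delete-config") || arg.equals("-d")) {
        // Deprecated form that takes an explicit true/false argument.
        if (i + 1 >= args.size()) {
          throw new IllegalArgumentException(arg + " requires a value");
        }
        deleteConfig = Boolean.parseBoolean(args.get(++i));
      }
    }
    return deleteConfig;
  }
}
```

Under these assumptions, `shouldDeleteConfig(List.of("--keep"))` and `shouldDeleteConfig(List.of("--delete-config", "false"))` both return `false`, while an empty argument list keeps the default of `true`.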
[jira] [Updated] (SOLR-17511) CLI: Resolve -i conflicts (async-id, cluster-id)
[ https://issues.apache.org/jira/browse/SOLR-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-17511: - Summary: CLI: Resolve -i conflicts (async-id, cluster-id) (was: CLI: Resole -i conflicts (async-id, cluster-id)) > CLI: Resolve -i conflicts (async-id, cluster-id) > > > Key: SOLR-17511 > URL: https://issues.apache.org/jira/browse/SOLR-17511 > Project: Solr > Issue Type: Sub-task > Components: cli >Affects Versions: 9.7, 9.6.1 >Reporter: Christos Malliaridis >Assignee: Eric Pugh >Priority: Minor > Labels: cli, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The CLI flag {{\-i}} is currently used in two options: > - for {{async-id}} in SnapshotExportTool for specifying an asynchronous > request identifier > - for {{cluster-id}} in SolrExporter for specifying a unique cluster > identifier > Since both short options are not obvious and the letter {{i}} may be used in > another context in the future, we should reserve it and deprecate (9.8) / > remove (10.0) it from the above options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Assigned] (SOLR-17511) CLI: Resole -i conflicts (async-id, cluster-id)
[ https://issues.apache.org/jira/browse/SOLR-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh reassigned SOLR-17511: Assignee: Eric Pugh > CLI: Resole -i conflicts (async-id, cluster-id) > --- > > Key: SOLR-17511 > URL: https://issues.apache.org/jira/browse/SOLR-17511 > Project: Solr > Issue Type: Sub-task > Components: cli >Affects Versions: 9.7, 9.6.1 >Reporter: Christos Malliaridis >Assignee: Eric Pugh >Priority: Minor > Labels: cli, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The CLI flag {{\-i}} is currently used in two options: > - for {{async-id}} in SnapshotExportTool for specifying an asynchronous > request identifier > - for {{cluster-id}} in SolrExporter for specifying a unique cluster > identifier > Since both short options are not obvious and the letter {{i}} may be used in > another context in the future, we should reserve it and deprecate (9.8) / > remove (10.0) it from the above options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
[jira] [Comment Edited] (SOLR-6122) API to cancel an already submitted/running Collections API call
[ https://issues.apache.org/jira/browse/SOLR-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892240#comment-17892240 ]

Yuntong Qu edited comment on SOLR-6122 at 10/25/24 2:54 PM:
------------------------------------------------------------

Made a POC PR that uses deleteStatus to remove not-started tasks: the delete-status endpoint forcefully deletes the tracking of a not-started task, and after a cancel the task will not be present in the failure/completed map. TBH, I don't particularly love this solution, as it limits our ability to support cancelling in-progress tasks later. But I also want to get some opinions on this.

–

One of the main problems I am having right now is dealing with OverseerTaskProcessor keeping the following in-memory data structures:
- runningZKTasks (set of tasks that have been picked up for processing but not cleaned up from the ZK work-queue)
- blockedTasks (contains tasks that are read from the work queue but could not be executed because they are blocked or the execution queue is full)

With these two data structures, the Overseer does not have a real-time view of what is happening on the ZK queue (they are an optimization to reduce ZK reads).

I am working on another approach: add the cancel task to _*collection-queue-work*_, with a new _OverseerMessageHandler_ that handles cancel tasks specifically (instead of using OverseerCollectionMessageHandler), and let that cancel message handler modify the ZK queue and the Overseer's in-memory tracking.

–

Re [~gerlowskija] on order of cancel:
- If we send the cancel task to _*collection-queue-work*_, there is still a chance that the cancel won't be picked up, since OverseerTaskProcessor limits the number of tasks picked up from the queue, and if we exceed MAX_BLOCKED_TASKS, no new tasks will be picked up. And if the number of running tasks meets or exceeds MAX_PARALLEL_TASKS, no new cancel tasks can be started.
- After a cancel task is picked up in OverseerTaskProcessor, from my reading of the code, each queue item spins up another Runner thread to handle the task, so the processing of queued items should be quite fast since it's non-blocking.
- Also, locking happens at different levels (replica/shard/collection), so if we make the cancel task require no lock, the cancel can be executed earlier.
- As long as we make sure that the cancel task has a real-time view of the ZK queue while it is executing, it should not mistake a started task for a pending task.
- To completely eliminate the concern of a cancel task not being handled ASAP when submitted, in my mind the best approach is to have another queue that takes in cancel task requests. The trade-off here is complexity, but submitting to _*collection-queue-work*_ should mostly work. Maybe an improvement would be to add a new queue if needed.

was (Author: yuntong):
Made a POC PR using deleteStatus to remove not-started tasks. Using delete status endpoint to forcefully delete not-started tracking. And after cancel, the task will not be present in failure/completed map. TBH, I don't particular love this solution as it limit us forward to do cancel in-progress. But also I want to get some opinions on this. – One of the main problem I am having rn is to deal with OverseerTaskProcesser keeing below in-memory data structure - runningZKTasks (Set of tasks that have been picked up for processing but not cleaned up from zk work-queue) - blockedTasks (contain tasks which are read from work queue but could not be executed because they are blocked or the execution queue is full) With above 2 data structure, overseer will not have real time view of what's happening on ZK queue ( which is an optimization to reduce ZK read ).
I am working on another way to add cancel task to _*collection-queue-work*_ and a new _OverseerMessageHandler_ to handle cancel task specific (instead of using OverseerCollectionMessageHandler), and let that cancel message handler modify ZK queue and in-memory tracking for Overseer – Re [~gerlowskija] on order of cancel: - If we send cancel task to _*collection-queue-work,*_ there are still chances that the cancel won't be picked up, since in OverseerTaskProcessor we limit num of task picked up from the queue, and if we exceed MAX_BLOCKED_TASKS, no new tasks will be picked up. And if there many running task exceeding or MAX_PARALLEL_TASKS, no new cancel tasks can be started. - after a cancel task is being picked up in OverseerTaskProcessor, from my reading of the coding, each queue item will spun up another Runner thread to handle each task, so the processing of queued item should be quite fast since it's non-blocking. - Also locking is on different level (replica/shard/collection), thus if we make cancel task require no lock, cancel can be excuated earlier - As long as we make sure that when cancel task is executing. it has real time view of ZK queue, it should not mistaken a started task as pending task - To completely elimiate the concern of cancle task not beeing handle ASAP wh
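The point made in the comment above, that a cancel must not mistake a started task for a pending one, can be illustrated with a toy in-memory model. Everything here is hypothetical (`CancelSketch`, `Outcome`, and the method names are not Overseer APIs), and the model deliberately ignores ZooKeeper: it assumes a single authoritative view of pending and running tasks, which is exactly the property the comment says the real Overseer lacks today.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Toy model of "cancel only what has not started"; not Overseer code. */
final class CancelSketch {
  enum Outcome { CANCELLED, ALREADY_RUNNING, UNKNOWN }

  private final Deque<String> pending = new ArrayDeque<>();
  private final Set<String> running = new HashSet<>();

  /** Queue a task id for later execution. */
  synchronized void submit(String taskId) {
    pending.addLast(taskId);
  }

  /** Move the oldest pending task to the running set; null if none is pending. */
  synchronized String startNext() {
    String taskId = pending.pollFirst();
    if (taskId != null) {
      running.add(taskId);
    }
    return taskId;
  }

  /** Cancel succeeds only for tasks that are still pending. */
  synchronized Outcome cancel(String taskId) {
    if (running.contains(taskId)) {
      return Outcome.ALREADY_RUNNING;
    }
    return pending.remove(taskId) ? Outcome.CANCELLED : Outcome.UNKNOWN;
  }
}
```

Because `startNext` and `cancel` synchronize on the same object, a task is either still pending (and removable) or already running (and reported as such); there is no window in which a cancel could remove a task that a worker has already picked up.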
[jira] [Assigned] (SOLR-17515) Recovery fails in Solr 9.7.0 if basic-auth is enabled
[ https://issues.apache.org/jira/browse/SOLR-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski reassigned SOLR-17515:
--------------------------------------
    Assignee: Jason Gerlowski

> Recovery fails in Solr 9.7.0 if basic-auth is enabled
> -----------------------------------------------------
>
> Key: SOLR-17515
> URL: https://issues.apache.org/jira/browse/SOLR-17515
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 9.7
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
>
> Several reporters on the users@ list recently shared a bug they noticed on upgrading to Solr 9.7. Replicas would try to recover, but fail with a NullPointerException:
> {code}
> 2024-09-18 09:36:31.238 ERROR (recoveryExecutor-12-thread-1-processing-fts06.host.internal:8983_solr dovecot_fts_shard5_replica_n61 dovecot_fts shard5 core_node62) [c:dovecot_fts s:shard5 r:core_node62 x:dovecot_fts_shard5_replica_n61 t:] o.a.s.c.RecoveryStrategy Error while trying to recover. core=dovecot_fts_shard5_replica_n61 => java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318)
> java.lang.NullPointerException: Cannot invoke "org.apache.solr.client.solrj.impl.AuthenticationStoreHolder.updateAuthenticationStore(org.eclipse.jetty.client.api.AuthenticationStore)" because "this.authenticationStore" is null
>     at org.apache.solr.client.solrj.impl.Http2SolrClient.setAuthenticationStore(Http2SolrClient.java:318) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:97) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory.setup(PreemptiveBasicAuthClientBuilderFactory.java:85) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.httpClientBuilderSetup(Http2SolrClient.java:1093) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.client.solrj.impl.Http2SolrClient$Builder.build(Http2SolrClient.java:1062) ~[solr-solrj-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:907) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:633) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) ~[solr-core-9.7.0.jar:9.7.0 675a41516e3f3bacfc975590773e7abdca444ff4 - anshum - 2024-09-03 15:05:20]
>     ...
> {code}
> It turns out that the issue isn't specific to upgrading clusters: *any 9.7.0 cluster (new or existing/upgrading) that uses basic-auth will hit this NPE during replica recovery*. The result is that replicas will fail to recover, and sit marked as "recovering" indefinitely.
> The issue can be reproduced locally in a source-checkout using the following steps:
> {code}
> git checkout branch_9_7
> ./gradlew clean assemble
> cd solr/packaging/build/solr-9.7.0-SNAPSHOT
> # At prompts, I chose: 4 nodes, "gettingstarted", 1 shard, 2 replicas, "_default" configset
> bin/solr start -e cloud
> bin/solr post -c gettingstarted example/exampledocs/books.json
> # Stop the node containing the non-leader replica
> bin/solr stop -p
> bin/solr post -c gettingstarted example/exampledocs/books.csv
> # Enable auth and trigger recovery by turning the node back on
> bin/solr auth enable -type basicAuth -credentials solr:solrRocks -blockUnknown true
> # This line will need to be tweaked based on which Solr node was previously stopped
> "bin/solr" start --cloud -p -s "example/cloud//solr" -z 127.0.0.1:9983
> {code}
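The trace above shows the NPE raised inside `Http2SolrClient.setAuthenticationStore`, reached from `Http2SolrClient$Builder.build` via `PreemptiveBasicAuthClientBuilderFactory.setup`. One general way a trace of this shape can arise is a setup hook that calls a setter before the field it delegates to has been assigned. The sketch below illustrates only that general pattern: `InitOrderSketch`, `Holder`, and `Client` are hypothetical names, nothing is taken from Solr's source, and it makes no claim about the actual root cause of SOLR-17515.

```java
import java.util.function.Consumer;

/** Hypothetical illustration of an initialization-order NPE; not Solr code. */
final class InitOrderSketch {
  private InitOrderSketch() {}

  /** Stand-in for an object that a setter delegates to. */
  static final class Holder {
    private String store;

    void update(String newStore) {
      store = newStore;
    }

    String get() {
      return store;
    }
  }

  static final class Client {
    private Holder holder;

    Client(Consumer<Client> setupHook, boolean initHolderFirst) {
      if (initHolderFirst) {
        holder = new Holder(); // field is ready before the hook runs
      }
      setupHook.accept(this); // the hook may call setStore(...)
      if (holder == null) {
        holder = new Holder(); // too late for a hook that already ran
      }
    }

    void setStore(String store) {
      holder.update(store); // throws NullPointerException while holder is null
    }
  }

  /** Returns true if construction with an auth-style setup hook completes. */
  static boolean buildSucceeds(boolean initHolderFirst) {
    try {
      new Client(client -> client.setStore("basic-auth"), initHolderFirst);
      return true;
    } catch (NullPointerException e) {
      return false;
    }
  }
}
```

In this toy model `buildSucceeds(false)` fails with the same kind of "field is null" NPE, and `buildSucceeds(true)` succeeds, which is why such bugs tend to surface only on code paths that actually install a setup hook, such as a cluster with basic-auth enabled.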
[jira] [Commented] (SOLR-17515) Recovery fails in Solr 9.7.0 if basic-auth is enabled
[ https://issues.apache.org/jira/browse/SOLR-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892962#comment-17892962 ]

Sanjay Dutt commented on SOLR-17515:
------------------------------------

Thank you so much [~gerlowskija] for reproducing it and providing all the details. I am a bit confused by all the different auth mechanisms we have in place, though. Even last time, two auth cases were found for which new test cases were added. Clearly, more test cases are required. I am going to work on this one unless you are already on it.

> Recovery fails in Solr 9.7.0 if basic-auth is enabled
> -----------------------------------------------------
>
> Key: SOLR-17515
> URL: https://issues.apache.org/jira/browse/SOLR-17515
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 9.7
> Reporter: Jason Gerlowski
> Priority: Major
[jira] [Commented] (SOLR-17515) Recovery fails in Solr 9.7.0 if basic-auth is enabled
[ https://issues.apache.org/jira/browse/SOLR-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892967#comment-17892967 ]

Sanjay Dutt commented on SOLR-17515:
------------------------------------

We both updated it at the same time. That's great! Yes, go ahead and take it; meanwhile I will try to see why my old test cases were not able to catch this one, and try to update them.

> Recovery fails in Solr 9.7.0 if basic-auth is enabled
> -----------------------------------------------------
>
> Key: SOLR-17515
> URL: https://issues.apache.org/jira/browse/SOLR-17515
> Project: Solr
> Issue Type: Bug
> Affects Versions: 9.7
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
[PR] Fix release wizard to remove from Solr space before attempting Lucene [solr]
anshumg opened a new pull request, #2796: URL: https://github.com/apache/solr/pull/2796 We should just remove attempting to cleanup the Lucene space once we do the Solr 10.0 release. Right now the wizard fails when it tries to cleanup the Lucene space because the 9x Solr releases are not found there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] EOL Solr 8 [solr-site]
HoustonPutman commented on PR #131: URL: https://github.com/apache/solr-site/pull/131#issuecomment-2438874511 Ok, revised the two sentences. Happy to change it to whatever if you still don't like it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] EOL Solr 8 [solr-site]
anshumg commented on code in PR #131: URL: https://github.com/apache/solr-site/pull/131#discussion_r1817178495 ## content/solr/solr_news/2024-10-25-solr8-eol.md: ## @@ -0,0 +1,8 @@ +Title: Solr 8 reaches End-Of-Life +category: solr/news +save_as: + +After the release of Solr 8.11.4, the Apache Solr community will no longer provide support for Solr 8.11. +With Lucene 10 having been released, and therefore Lucene 8.11 reaching EOL, the Apache Lucene and Solr community are no longer able to provide new releases for Solr 8. Review Comment: This has a lot of overlap with the previous sentence, right? ## content/solr/solr_news/2024-10-25-solr8-eol.md: ## @@ -0,0 +1,8 @@ +Title: Solr 8 reaches End-Of-Life +category: solr/news +save_as: + +After the release of Solr 8.11.4, the Apache Solr community will no longer provide support for Solr 8.11. Review Comment: Let's change that to say 8x instead of 8.11. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-16390: v2 Cluster Property APIs. [solr]
gerlowskija commented on PR #2788: URL: https://github.com/apache/solr/pull/2788#issuecomment-2438568772 Still going through the individual files on this PR, but wanted to respond to some of the high-level comments first: > Wasn't sure how to model the response. Different APIs use different error for "does not exist" responses: /api/collections/collectionName returns a 400, /api/aliases/specificalias returns a 405, /solr/collectionName/schema/fields/fieldName returns a 404 I'm personally a fan of 404 in this case as it seems a little more actionable for users than the more generic '400', so that'd be my preference. But I don't have any strong feelings on that point, and would be open to something else if you do have preferences? When we decide, we should document the decision in `dev-docs/v2-api-conventions.adoc` so there's a "standard" we can align on. (I suspect the 405 returned by `GET /api/aliases/nonexistentAlias` is a bug, FWIW. Will have to file a ticket for that if I can reproduce...) > I wasn't sure if that would be preferred over grouping them all together as was done with AliasPropertyApis/AliasProperty I prefer grouping related APIs into a single file, at least on the 'api' side. IMO it cuts down on boilerplate, and makes reviewing and browsing easier by keeping a bunch of related definitions together. But again, it's a very slight preference on my end if you happen to prefer the alternative. > The new v2 JAX-RS Bulk Update ClusterProp API requires providing a body that looks like {"properties":{"actualPropertyToBeUpdated":...}} because I didn't know how to map an unknown top-level value Hmm - I think you should be able to nuke `SetNestedClusterPropertyRequestBody` altogether, and replace it in the method signature with `Map`? e.g. 
```
@PUT
@Operation(
    summary = "Set nested cluster properties in this Solr cluster",
    tags = {"cluster-properties"})
SolrJerseyResponse createOrUpdateNestedClusterProperty(
    @RequestBody(description = "Property/ies to be set", required = true)
        Map propertyValuesByName)
```

Or does that break something or other that I've forgotten about?
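To make the two points in this comment concrete, here is a minimal plain-Java sketch under stated assumptions. It is not Solr code: the class `ClusterPropsSketch`, its methods, and the in-memory map are hypothetical stand-ins, and the property name `urlScheme` is only an illustrative value. It shows (a) a bulk update that accepts a flat map whose top-level keys are the property names, with no `{"properties": ...}` wrapper, and (b) a lookup that reports 404 when the property does not exist.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Solr codebase.
final class ClusterPropsSketch {
  static final int OK = 200;
  static final int NOT_FOUND = 404;

  // In-memory stand-in for the real cluster property store.
  private final Map<String, Object> props = new LinkedHashMap<>();

  // Bulk update taking a flat body: each top-level key is a property name,
  // so no wrapper object is needed around the values.
  void setAll(Map<String, Object> propertyValuesByName) {
    props.putAll(propertyValuesByName);
  }

  // Status for "GET one property": 404 when the property does not exist,
  // which is the convention preferred in the comment above.
  int statusForGet(String propertyName) {
    return props.containsKey(propertyName) ? OK : NOT_FOUND;
  }

  public static void main(String[] args) {
    ClusterPropsSketch sketch = new ClusterPropsSketch();
    sketch.setAll(Map.of("urlScheme", "https"));
    System.out.println(sketch.statusForGet("urlScheme"));
    System.out.println(sketch.statusForGet("noSuchProperty"));
  }
}
```

Whether a JAX-RS endpoint can actually bind an arbitrary JSON object to a `Map` parameter depends on the JSON provider configured in the server, so that part remains the open question raised in the comment.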
[PR] EOL Solr 8 [solr-site]
HoustonPutman opened a new pull request, #131:
URL: https://github.com/apache/solr-site/pull/131

Made a small news page, and changed the downloads to state 8.11 is EOL
[jira] [Created] (SOLR-17516) LBHttpSolrClient: support HttpJdkSolrClient
James Dyer created SOLR-17516:

Summary: LBHttpSolrClient: support HttpJdkSolrClient
Key: SOLR-17516
URL: https://issues.apache.org/jira/browse/SOLR-17516
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrJ
Reporter: James Dyer

With SOLR-599 we added a new SolrJ client *HttpJdkSolrClient* which uses java.net.http.HttpClient internally. We can also support load balancing. This ticket is to factor out common functionality from the existing *LBHttp2SolrClient*, creating a new sibling class *LBHttpJdkSolrClient*. This is a prerequisite for having a version of *CloudSolrClient* that works with *HttpJdkSolrClient*.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
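The refactoring described in this ticket (shared load-balancing logic in one place, with sibling classes per HTTP transport) could be sketched roughly as follows. This is an illustration only, not the SolrJ design: `LbClientSketch`, `JettyLbSketch`, and `JdkLbSketch` are hypothetical names, plain round-robin stands in for whatever selection and failover logic the real client uses, and `send` returns a string instead of performing any network I/O.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class holding the transport-independent logic.
abstract class LbClientSketch {
  private final List<String> baseUrls;
  private final AtomicInteger next = new AtomicInteger();

  LbClientSketch(List<String> baseUrls) {
    if (baseUrls.isEmpty()) {
      throw new IllegalArgumentException("at least one base URL is required");
    }
    this.baseUrls = List.copyOf(baseUrls);
  }

  // Round-robin endpoint selection, shared by every subclass.
  final String pickNext() {
    int index = Math.floorMod(next.getAndIncrement(), baseUrls.size());
    return baseUrls.get(index);
  }

  // The only transport-specific piece; each sibling supplies its own.
  abstract String send(String baseUrl, String path);

  final String request(String path) {
    return send(pickNext(), path);
  }
}

// Sibling that would delegate to a Jetty-based client (stubbed here).
final class JettyLbSketch extends LbClientSketch {
  JettyLbSketch(List<String> baseUrls) {
    super(baseUrls);
  }

  @Override
  String send(String baseUrl, String path) {
    return "jetty -> " + baseUrl + path;
  }
}

// Sibling that would delegate to java.net.http.HttpClient (stubbed here).
final class JdkLbSketch extends LbClientSketch {
  JdkLbSketch(List<String> baseUrls) {
    super(baseUrls);
  }

  @Override
  String send(String baseUrl, String path) {
    return "jdk -> " + baseUrl + path;
  }
}
```

With this shape, code that needs a load-balanced client can depend on the base type alone, which is presumably what would let a *CloudSolrClient* variant work with either transport.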
Re: [PR] SOLR-16390: v2 Cluster Property APIs. [solr]
gerlowskija commented on code in PR #2788:
URL: https://github.com/apache/solr/pull/2788#discussion_r1817160658

## solr/api/src/java/org/apache/solr/client/api/endpoint/SetClusterPropertyApi.java: ##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.client.api.endpoint;
+
+import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.Parameter;
+import io.swagger.v3.oas.annotations.parameters.RequestBody;
+import jakarta.ws.rs.PUT;
+import jakarta.ws.rs.Path;
+import jakarta.ws.rs.PathParam;
+import org.apache.solr.client.api.model.SetClusterPropertyRequestBody;
+import org.apache.solr.client.api.model.SolrJerseyResponse;
+
+@Path("/cluster/properties/{propertyName}")
+public interface SetClusterPropertyApi {
+
+  @PUT
+  @Operation(
+      summary = "Set a cluster property in this Solr cluster",

Review Comment:
```suggestion
      summary = "Set a single new or existing cluster property in this Solr cluster",
```

## solr/api/src/java/org/apache/solr/client/api/endpoint/SetNestedClusterPropertyApi.java: ##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.client.api.endpoint;
+
+import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.parameters.RequestBody;
+import jakarta.ws.rs.PUT;
+import jakarta.ws.rs.Path;
+import org.apache.solr.client.api.model.SetNestedClusterPropertyRequestBody;
+import org.apache.solr.client.api.model.SolrJerseyResponse;
+
+@Path("/cluster/properties")
+public interface SetNestedClusterPropertyApi {
+
+  @PUT
+  @Operation(
+      summary = "Set nested cluster properties in this Solr cluster",
+      tags = {"cluster-properties"})
+  SolrJerseyResponse createOrUpdateNestedClusterProperty(

Review Comment: [Q] It's interesting that this API is both the only way to set "nested"/complex cluster properties, and the only way to set multiple properties simultaneously. I guess that's fine, since it mirrors what's supported in v1? I don't really have a question or suggestion here, mostly just making a note of it...

## solr/core/src/test/org/apache/solr/handler/admin/api/ClusterPropsAPITest.java: ##
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.handler.admin.api;
+
+import static org.apache.solr.common.util.Utils.getObjectByPath;
+
+import java.net.URL;
+import java.util.List;
+import org.apache.http.HttpResponse;
+import org.apache.http.client.methods.HttpDelete;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.methods.HttpPut;
+