date:20240829

2024-08-29 Thread ASF subversion and git services (Jira)



epugh merged PR #2674:
URL: https://github.com/apache/solr/pull/2674


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

[jira] [Commented] (SOLR-17423) CLI: Resolve -h argument conflict (help/host)



[ 
https://issues.apache.org/jira/browse/SOLR-17423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877664#comment-17877664
 ] 

ASF subversion and git services commented on SOLR-17423:


Commit 5f920b17c59176b275708dfd2d04bdb837628648 in solr's branch 
refs/heads/main from Eric Pugh
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=5f920b17c59 ]

SOLR-17423: Remove -h as short option for --host, and use --host. (#2674)

Co-authored-by: Christos Malliaridis 

> CLI: Resolve -h argument conflict (help/host)
> -
>
> Key: SOLR-17423
> URL: https://issues.apache.org/jira/browse/SOLR-17423
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: cli, examples
>Affects Versions: main (10.0), 9.7
>Reporter: Christos Malliaridis
>Priority: Major
>  Labels: cli, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At the moment RunExampleTool uses -h for providing the hostname. -h is 
> commonly known for printing the help / usage. To avoid any confusion for 
> beginners and improve the learnability of the CLI, we should deprecate -h in 
> RunExampleTool for future 9.X releases and remove it in 10.0 (two PRs 
> expected). -h should only be used for printing the help information.
> Note that -h is used via Option.builder("h") and in a raw string as "-h" 
> inside RunExampleTool, and deprecation would require both to be updated. 
> Please review for any documentation references as well.
> Once the conflict is resolved, the tracking table in [Solr Arguments - 
> Migration 
> Overview|https://docs.google.com/spreadsheets/d/1ws44kN51WnGwQzOXc8KKRQ93TMgHSqIGb02MRWFel_U/edit?usp=sharing]
>  can be updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (SOLR-17423) CLI: Resolve -h argument conflict (help/host)

2024-08-29 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-17423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877670#comment-17877670
 ] 

ASF subversion and git services commented on SOLR-17423:


Commit f0398d56fbaed997640f14da97967e55b7e1d5c6 in solr's branch 
refs/heads/branch_9x from Eric Pugh
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=f0398d56fba ]

SOLR-17423: Remove -h as short option for --host, and use --host. (#2674)

Co-authored-by: Christos Malliaridis 
(cherry picked from commit 5f920b17c59176b275708dfd2d04bdb837628648)






[jira] [Resolved] (SOLR-17423) CLI: Resolve -h argument conflict (help/host)

2024-08-29 Thread Eric Pugh (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-17423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Pugh resolved SOLR-17423.
--
Fix Version/s: 9.8
   Resolution: Fixed

This one was an easy backport with no deprecation issues!  So I could commit to 
main and backport to branch_9x.

> CLI: Resolve -h argument conflict (help/host)
> -
>
> Key: SOLR-17423
> URL: https://issues.apache.org/jira/browse/SOLR-17423
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: cli, examples
>Affects Versions: main (10.0), 9.7
>Reporter: Christos Malliaridis
>Priority: Major
>  Labels: cli, pull-request-available
> Fix For: 9.8
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At the moment RunExampleTool uses -h for providing the hostname. -h is 
> commonly known for printing the help / usage. To avoid any confusion for 
> beginners and improve the learnability of the CLI, we should deprecate -h in 
> RunExampleTool for future 9.X releases and remove it in 10.0 (two PRs 
> expected). -h should only be used for printing the help information.
> Note that -h is used via Option.builder("h") and in a raw string as "-h" 
> inside RunExampleTool, and deprecation would require both to be updated. 
> Please review for any documentation references as well.
> Once the conflict is resolved, the tracking table in [Solr Arguments - 
> Migration 
> Overview|https://docs.google.com/spreadsheets/d/1ws44kN51WnGwQzOXc8KKRQ93TMgHSqIGb02MRWFel_U/edit?usp=sharing]
>  can be updated.




Re: [PR] Review Solr Streaming code for typos [solr]



epugh merged PR #2657:
URL: https://github.com/apache/solr/pull/2657



Re: [PR] SOLR-17158 Terminate distributed processing quickly when query limit is reached [solr]



gus-asf commented on code in PR #2666:
URL: https://github.com/apache/solr/pull/2666#discussion_r1736250661


##
solr/core/src/java/org/apache/solr/response/SolrQueryResponse.java:
##
@@ -139,30 +142,63 @@ public ReturnFields getReturnFields() {
 
   /**
* If {@link #getResponseHeader()} is available, set {@link 
#RESPONSE_HEADER_PARTIAL_RESULTS_KEY}
-   * flag to true.
+   * attribute to true or "omitted" as required.
*/
-  public void setPartialResults() {
+  public void setPartialResults(SolrQueryRequest req) {

Review Comment:
   There's not really a point to making it static, the object already exists in 
every location it's used.




[PR] Impact of Large Stored fields blog post (#121) [solr-site]



alessandrobenedetti opened a new pull request, #122:
URL: https://github.com/apache/solr-site/pull/122

   * Create 
2024-08-08-impact-of-large-stored-fields-on-apache-solr-query-performance.md
   
   New blog post added



Re: [PR] Impact of Large Stored fields blog post [solr-site]



alessandrobenedetti merged PR #121:
URL: https://github.com/apache/solr-site/pull/121



Re: [PR] Impact of Large Stored fields blog post (#121) [solr-site]



alessandrobenedetti merged PR #122:
URL: https://github.com/apache/solr-site/pull/122



[jira] [Created] (SOLR-17427) ApiTool regressions after cli pattern change

Houston Putman created SOLR-17427:
-

 Summary: ApiTool regressions after cli pattern change
 Key: SOLR-17427
 URL: https://issues.apache.org/jira/browse/SOLR-17427
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: cli
Reporter: Houston Putman
Assignee: Houston Putman


After the CLI changes, the ApiTool cannot be used with urls that do not contain 
a port.




[jira] [Created] (SOLR-17428) ApiTool regressions after cli pattern change

Houston Putman created SOLR-17428:
-

 Summary: ApiTool regressions after cli pattern change
 Key: SOLR-17428
 URL: https://issues.apache.org/jira/browse/SOLR-17428
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: cli
Reporter: Houston Putman
Assignee: Houston Putman


After the CLI changes, the ApiTool cannot be used with urls that do not contain 
a port.




[jira] [Resolved] (SOLR-17428) ApiTool regressions after cli pattern change



 [ 
https://issues.apache.org/jira/browse/SOLR-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman resolved SOLR-17428.
---
Resolution: Duplicate

> ApiTool regressions after cli pattern change
> 
>
> Key: SOLR-17428
> URL: https://issues.apache.org/jira/browse/SOLR-17428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: cli
>Reporter: Houston Putman
>Assignee: Houston Putman
>Priority: Major
>
> After the CLI changes, the ApiTool cannot be used with urls that do not 
> contain a port.




[jira] [Updated] (SOLR-17427) ApiTool regressions after cli pattern change



 [ 
https://issues.apache.org/jira/browse/SOLR-17427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-17427:
--
Description: After the CLI changes, the ApiTool cannot be used with urls 
that do not contain a port. This is because {{url.getAuthority()}} is used to 
build the host+port section of the URL, but it will return a "-1" if a port is 
not included in the URL. We should be very careful about using 
{{url.getAuthority()}} in our code.  (was: After the CLI changes, the ApiTool 
cannot be used with urls that do not contain a port.)

> ApiTool regressions after cli pattern change
> 
>
> Key: SOLR-17427
> URL: https://issues.apache.org/jira/browse/SOLR-17427
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: cli
>Reporter: Houston Putman
>Assignee: Houston Putman
>Priority: Major
>
> After the CLI changes, the ApiTool cannot be used with urls that do not 
> contain a port. This is because {{url.getAuthority()}} is used to build the 
> host+port section of the URL, but it will return a "-1" if a port is not 
> included in the URL. We should be very careful about using 
> {{url.getAuthority()}} in our code.
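
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, assuming the "-1" originates from the JDK's port accessor (`URI.getPort()` returns -1 when the URL carries no port). `HostPortExample` and `hostAndPort` are hypothetical names for illustration, not the code from the Solr fix.

```java
import java.net.URI;

public class HostPortExample {
  /**
   * Builds the "host[:port]" part of a URL, omitting the port when none is given.
   * Hypothetical helper for illustration; not taken from the Solr code base.
   */
  public static String hostAndPort(URI uri) {
    int port = uri.getPort(); // -1 when the URL does not specify a port
    return port == -1 ? uri.getHost() : uri.getHost() + ":" + port;
  }

  public static void main(String[] args) {
    URI withPort = URI.create("http://localhost:8983/solr");
    URI withoutPort = URI.create("https://solr.example.com/solr");

    // Naive concatenation leaks the -1 sentinel into the rebuilt URL.
    System.out.println(withoutPort.getHost() + ":" + withoutPort.getPort());

    // The guarded helper keeps the port only when one was actually supplied.
    System.out.println(hostAndPort(withPort));
    System.out.println(hostAndPort(withoutPort));
  }
}
```

Checking for the sentinel in one shared helper, rather than at each call site, keeps every tool that rebuilds a URL consistent.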




Re: [PR] SOLR-17423: Remove -h as short option for --host, and use --host. [solr]



malliaridis commented on PR #2674:
URL: https://github.com/apache/solr/pull/2674#issuecomment-2318149468

   I just noticed, there is a line in `solr.cmd` in both `branch_9x` and `main` 
(different places), that does:
   
   ```cmd
   set SOLR_HOST=%~2
   set "PASS_TO_RUN_EXAMPLE=-h %~2 !PASS_TO_RUN_EXAMPLE!"
   ```
   
   These hidden arguments are very hard to find. We should definitely move more 
code to Java and have a single place to update everything in the future. #



Re: [PR] SOLR-17423: Remove -h as short option for --host, and use --host. [solr]



epugh commented on PR #2674:
URL: https://github.com/apache/solr/pull/2674#issuecomment-2318154338

   you found it!   You know, earlier today, I took a spin through `solr.cmd` 
looking for this and did NOT find it...



[jira] [Updated] (SOLR-17427) ApiTool regressions after cli pattern change

2024-08-29 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-17427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17427:
--
Labels: pull-request-available  (was: )

> ApiTool regressions after cli pattern change
> 
>
> Key: SOLR-17427
> URL: https://issues.apache.org/jira/browse/SOLR-17427
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: cli
>Reporter: Houston Putman
>Assignee: Houston Putman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After the CLI changes, the ApiTool cannot be used with urls that do not 
> contain a port. This is because {{url.getAuthority()}} is used to build the 
> host+port section of the URL, but it will return a "-1" if a port is not 
> included in the URL. We should be very careful about using 
> {{url.getAuthority()}} in our code.




Re: [PR] SOLR-14414: Introduce new UI (SIP-7) [solr]



gerlowskija commented on PR #2605:
URL: https://github.com/apache/solr/pull/2605#issuecomment-2318288713

   Curious where we stand on this POC?
   
   If I remember right, the main question that needs to be answered at this point is: 
"Do we think UI code in Kotlin is a better fit for our community of backend 
developers than JS or another 'traditional' frontend language?"
   
   I think I'm personally reassured on that front - even with this PR being 
massive I found it easier to understand than most JS I've spent time with.  But 
it's a pretty big decision to make without more voices chiming in. @epugh  
@janhoy or others - do you guys have any thoughts/opinions on the language 
choice aspect of this?



Re: [PR] SOLR-17427: Fix url building in cli [solr]



epugh commented on code in PR #2678:
URL: https://github.com/apache/solr/pull/2678#discussion_r1736686728


##
solr/core/src/java/org/apache/solr/cli/PostTool.java:
##
@@ -379,37 +379,32 @@ private void doArgsMode(String[] args) {
   private void doWebMode() {
 reset();
 int numPagesPosted;

Review Comment:
   I love it when we can eliminate a try and the nesting of code!  




Re: [PR] SOLR-17427: Fix url building in cli [solr]



epugh commented on code in PR #2678:
URL: https://github.com/apache/solr/pull/2678#discussion_r1736689373


##
solr/core/src/test/org/apache/solr/cli/PostToolTest.java:
##
@@ -211,30 +211,25 @@ public void testNormalizeUrlEnding() {
   }
 
   @Test
-  public void testComputeFullUrl() throws IOException {
-
-PostTool webPostTool = new PostTool();

Review Comment:
   Nice!




[jira] [Created] (SOLR-17429) Find more permanent solution to Deprecated SolrCLI logging

Houston Putman created SOLR-17429:
-

 Summary: Find more permanent solution to Deprecated SolrCLI logging
 Key: SOLR-17429
 URL: https://issues.apache.org/jira/browse/SOLR-17429
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: cli
Affects Versions: 9.7
Reporter: Houston Putman
Assignee: Eric Pugh


Commons CLI now supports deprecated options, and when those options are used, 
they print a log to the user warning them. Unfortunately, by default, the 
logging is done to stdout. Tools that use the SolrCLI and parse the output will 
be broken by this change since they generally always read from stdout. If the 
logging was done to stderr, there would be no problem since the tools would not 
read the deprecation logs with the CLI output.

Either
 * Commons CLI should log deprecations to stderr by default
 * Commons CLI should make {{Option.toDeprecatedString()}} public, so that we 
can more easily make our own DeprecationHandler that merely prints the same 
thing to stderr. As it is, we have to copy the whole method ourselves since it 
is package private.
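
The stdout/stderr separation argued for above can be sketched as follows. This is an illustration only: `CliOption`, `deprecationHandler`, and `run` are hypothetical stand-ins and deliberately do not use the Commons CLI classes, whose exact handler API is not shown in this thread. The point is simply that warnings go to one stream and parseable output to another.

```java
import java.io.PrintStream;
import java.util.function.Consumer;

public class DeprecationToStderr {
  /** Hypothetical stand-in for a CLI option; not the Commons CLI Option class. */
  public static final class CliOption {
    final String name;
    final boolean deprecated;
    final String replacement;

    public CliOption(String name, boolean deprecated, String replacement) {
      this.name = name;
      this.deprecated = deprecated;
      this.replacement = replacement;
    }
  }

  /** Returns a handler that writes deprecation warnings to the given stream. */
  public static Consumer<CliOption> deprecationHandler(PrintStream warnings) {
    return opt ->
        warnings.println(
            "Option '" + opt.name + "' is deprecated; use '" + opt.replacement + "' instead");
  }

  /** Simulates a tool run: the result goes to out, warnings go through the handler. */
  public static void run(CliOption used, PrintStream out, Consumer<CliOption> onDeprecated) {
    if (used.deprecated) {
      onDeprecated.accept(used);
    }
    out.println("{\"status\":0}"); // the only thing a parsing tool should see
  }

  public static void main(String[] args) {
    // Option names here are placeholders chosen for the example.
    CliOption opt = new CliOption("-old", true, "--new");
    run(opt, System.out, deprecationHandler(System.err));
  }
}
```

With the handler bound to `System.err`, a caller that pipes stdout into a JSON parser never sees the warning text.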




[PR] Use separate nodeProjectDir for each subproject [solr]

tylerbertrand opened a new pull request, #2680:
URL: https://github.com/apache/solr/pull/2680

# Description

Sometimes `NpmTask`s fail because they cannot find files/commands
under `npmProjectDir`. [Here's an example of such a
failure](https://ge.solutions-team.gradle.com/s/edxfitmmecv7i/failure#1). The
failing tasks always execute after the `nodeSetup` task for that project, so in
theory, everything should be in place.

However, in this case, multiple subprojects share the same `npmProjectDir`.
The way the `gradle-node-plugin` works, having multiple projects use the same
`npmProjectDir` doesn't save work, and can actually cause timing issues like
the failures we are seeing. This is because [each `nodeSetup` task clears out
the old directory before it unpacks it
again](https://github.com/node-gradle/gradle-node-plugin/blob/main/src/main/kotlin/com/github/gradle/node/task/NodeSetupTask.kt#L50-L85).
[In the Timeline view of the same Build Scan linked
above](https://ge.solutions-team.gradle.com/s/edxfitmmecv7i/timeline?details=wa5cqs7lpiyvq&end=1724359674712&outcome=failed&start=1724359674660),
we can see that the `:solr:solr-ref-guide:downloadAntoraCli` task (an
`NpmTask`), is executing at the same time as `:solr:packaging:nodeSetup`, and
`downloadAntoraCli` can't find `node` because it's been cleared out by another
subproject's `nodeSetup` task.

# Solution

This PR updates each subproject to use its own individual `nodeProjectDir`
to avoid these timing issues. As mentioned above, separating these out doesn't
result in any extra work being done, since sharing the same `nodeProjectDir`
doesn't save work in the first place.

# Tests

No new tests were added, and no existing tests were modified, but the
`check` and `integrationTests` tasks consistently pass when run locally with
these changes.

# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to
Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my
code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request
title.
- [x] I have given Solr maintainers
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
to contribute to my PR branch. (optional but recommended, not available for
branches on forks living under an organisation)
- [x] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Reference
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)


[PR] SOLR-17149: Introduce ParallelHttpShardHandler [solr]

gerlowskija opened a new pull request, #2681:
URL: https://github.com/apache/solr/pull/2681

https://issues.apache.org/jira/browse/SOLR-17149

# Description

The default ShardHandler implementation, HttpShardHandler, sends all
shard-requests serially, only parallelizing the waiting and parsing of
responses. This works great for collections with few shards, but as the number
of shards increases the serialized sending of shard-requests adds a larger and
larger overhead. This is especially stark when auth is enabled, and PKI
header-generation happens at request-sending time.

# Solution

This commit fixes this by introducing an alternate ShardHandler
implementation, geared towards collections with many shards. This ShardHandler
uses an executor to parallelize both request sending and response
waiting/parsing. This consumes more CPU, but greatly reduces the
latency/QTime observed by users querying many-shard collections.
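
The difference between the two sending strategies can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the PR's implementation: `ParallelSendSketch`, `sendSerially`, and `sendInParallel` are hypothetical names, shard requests are modeled as a `Function<String, String>`, and the serial variant here also waits serially, which the real HttpShardHandler does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class ParallelSendSketch {
  /** Serial: every request is prepared and sent on the calling thread, one by one. */
  public static List<String> sendSerially(List<String> shards, Function<String, String> send) {
    List<String> responses = new ArrayList<>();
    for (String shard : shards) {
      responses.add(send.apply(shard));
    }
    return responses;
  }

  /** Parallel: each send is handed to the executor; responses are collected in shard order. */
  public static List<String> sendInParallel(
      List<String> shards, Function<String, String> send, ExecutorService executor) {
    List<CompletableFuture<String>> futures = new ArrayList<>();
    for (String shard : shards) {
      futures.add(CompletableFuture.supplyAsync(() -> send.apply(shard), executor));
    }
    List<String> responses = new ArrayList<>();
    for (CompletableFuture<String> future : futures) {
      responses.add(future.join()); // blocks until that shard's response is ready
    }
    return responses;
  }

  public static void main(String[] args) {
    List<String> shards = List.of("shard1", "shard2", "shard3");
    Function<String, String> fakeSend = shard -> "response from " + shard;
    ExecutorService executor = Executors.newFixedThreadPool(3);
    try {
      System.out.println(sendSerially(shards, fakeSend));
      System.out.println(sendInParallel(shards, fakeSend, executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

Any per-request cost paid at send time (such as the header generation mentioned above) is spread across executor threads in the parallel variant, at the price of extra CPU and thread usage.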

(I have some really promising perf test results I'll share soon - see
SOLR-17149 for more discussion on that front.)

Remaining TODOs:
- tests for ParallelHttpShardHandler
- precommit/check
- Javadocs
- ref-guide docs for shard handler abstraction
- test randomization for http vs parallel SH

# Tests

Still TBD

# Checklist

Please review the following and check all that apply:

- [ ] I have reviewed the guidelines for [How to
Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my
code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request
title.
- [ ] I have given Solr maintainers
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
to contribute to my PR branch. (optional but recommended, not available for
branches on forks living under an organisation)
- [ ] I have developed this patch against the `main` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Reference
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)


[jira] [Updated] (SOLR-17149) Cannot backup/restore large collection.

2024-08-29 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17149:
--
Labels: pull-request-available  (was: )

> Cannot backup/restore large collection.
> ---
>
> Key: SOLR-17149
> URL: https://issues.apache.org/jira/browse/SOLR-17149
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 9.4
>Reporter: Pierre Salagnac
>Assignee: Jason Gerlowski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.5
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
>  
> There is a regression introduced in version 9.4 with SOLR-16879.
>  
> We cannot backup collections with more than 10 shards per node. The thread 
> pool is rejecting new tasks with following error:
> {noformat}
> SolrException: Could not backup all shards
> Task 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda...
>  rejected from 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor[Running, 
> pool size = 5, active threads = 5, queued tasks = 5, completed tasks = 30]"
> {noformat}
> {{expensiveExecutor}} thread pool executor is 
> [created|https://github.com/apache/solr/blob/releases/solr/9.4.1/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L170-L174]
>  with 5 max threads and bounded queue of the same size (5), so the total 
> number of tasks is limited to 10 and all the other tasks are immediately 
> rejected.
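
The arithmetic above (5 threads + 5 queue slots = 10 tasks) can be checked with a plain `ThreadPoolExecutor`. This is a minimal sketch, assuming the core pool size equals the maximum and the default rejection policy is in effect; `BoundedPoolRejection` is a hypothetical name and the pool is not Solr's `MDCAwareThreadPoolExecutor`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolRejection {
  /** Submits blocking tasks until one is rejected; returns how many were accepted. */
  public static int acceptedBeforeRejection(int maxThreads, int queueSize, int toSubmit) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(
            maxThreads, maxThreads, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(queueSize));
    CountDownLatch release = new CountDownLatch(1);
    int accepted = 0;
    try {
      for (int i = 0; i < toSubmit; i++) {
        pool.execute(
            () -> {
              try {
                release.await(); // keep the task running so the pool stays saturated
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              }
            });
        accepted++;
      }
    } catch (RejectedExecutionException e) {
      // Default AbortPolicy: no idle thread and no free queue slot remained.
    } finally {
      release.countDown(); // let the blocked tasks finish
      pool.shutdown();
    }
    return accepted;
  }

  public static void main(String[] args) {
    // 5 running + 5 queued are accepted; the 11th submission is rejected.
    System.out.println(acceptedBeforeRejection(5, 5, 12)); // prints 10
  }
}
```

A backup that fans out one task per shard therefore starts failing as soon as more than ten tasks are outstanding at once, which matches the error quoted above.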




Re: [PR] SOLR-17158 Terminate distributed processing quickly when query limit is reached [solr]



dsmiley commented on code in PR #2666:
URL: https://github.com/apache/solr/pull/2666#discussion_r1736962583


##
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java:
##
@@ -42,13 +45,17 @@
 import org.apache.solr.common.params.ShardParams;
 import org.apache.solr.common.params.SolrParams;
 import org.apache.solr.common.util.NamedList;
+import org.apache.solr.common.util.StrUtils;
 import org.apache.solr.core.CoreDescriptor;
 import org.apache.solr.request.SolrQueryRequest;
 import org.apache.solr.request.SolrRequestInfo;
 import org.apache.solr.security.AllowListUrlChecker;
 
 @NotThreadSafe
 public class HttpShardHandler extends ShardHandler {
+  /** */

Review Comment:
   either say something or don't :-)



##
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java:
##
@@ -80,6 +94,39 @@ public HttpShardHandler(HttpShardHandlerFactory 
httpShardHandlerFactory) {
 shardToURLs = new HashMap<>();
   }
 
+  /**
+   * Parse the {@value ShardParams#SHARDS_TOLERANT} param from 
params as a boolean;
+   * accepts {@value ShardParams#REQUIRE_ZK_CONNECTED} as a valid value 
indicating false
+   * .
+   *
+   * By default, returns false when {@value 
ShardParams#SHARDS_TOLERANT} is not set
+   * in 
+   * params.
+   */
+  public static boolean getShardsTolerantAsBool(SolrQueryRequest req) {
+String shardsTolerantValue = 
req.getParams().get(ShardParams.SHARDS_TOLERANT);
+if (null == shardsTolerantValue
+|| 
shardsTolerantValue.trim().equals(ShardParams.REQUIRE_ZK_CONNECTED)) {

Review Comment:
   we don't normally trim our params.  I think it's very haphazard to do it 
on-read (like once we do it here and there, then everywhere we wonder, should 
we do here too?  a mess IMO)



##
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java:
##
@@ -62,7 +69,14 @@ public class HttpShardHandler extends ShardHandler {
   private HttpShardHandlerFactory httpShardHandlerFactory;
   private Map> 
responseFutureMap;
   private BlockingQueue responses;
+
+  /**
+   * The number of pending requests. This must be incremented before a {@link 
ShardResponse} is
+   * added to {@link #responses}, and decremented after a ShardResponse is 
removed from {@code
+   * responses}. We cannot rely on responses.size() bec

Review Comment:
   missing explanation



##
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java:
##
@@ -80,6 +94,39 @@ public HttpShardHandler(HttpShardHandlerFactory 
httpShardHandlerFactory) {
 shardToURLs = new HashMap<>();
   }
 
+  /**
+   * Parse the {@value ShardParams#SHARDS_TOLERANT} param from 
params as a boolean;
+   * accepts {@value ShardParams#REQUIRE_ZK_CONNECTED} as a valid value 
indicating false
+   * .
+   *
+   * By default, returns false when {@value 
ShardParams#SHARDS_TOLERANT} is not set
+   * in 
+   * params.
+   */
+  public static boolean getShardsTolerantAsBool(SolrQueryRequest req) {
+String shardsTolerantValue = 
req.getParams().get(ShardParams.SHARDS_TOLERANT);
+if (null == shardsTolerantValue
+|| 
shardsTolerantValue.trim().equals(ShardParams.REQUIRE_ZK_CONNECTED)) {
+  return false;
+} else {
+  boolean tolerant = StrUtils.parseBool(shardsTolerantValue.trim());
+  if (tolerant && shouldDiscardPartials(req.getParams())) {

Review Comment:
   this word "discard" is new here (I think) and I find it confusing.  Maybe 
`shouldErrorInsteadOfPartialResults`
   
   Or flip the boolean -- allowPartialResults



##
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java:
##
@@ -191,7 +244,7 @@ protected QueryRequest makeQueryRequest(
   }
 
   /** Subclasses could modify the Response based on the shard */
-  protected ShardResponse transfomResponse(
+  protected ShardResponse transformResponse(

Review Comment:
   I merged a PR that corrects that this wasn't being called above -- 
391cdd04277b83a010f1d33288f9bc344a7ba2d0



##
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java:
##
@@ -139,48 +186,54 @@ public void submit(
 srsp.setShard(shard);
 SimpleSolrResponse ssr = new SimpleSolrResponse();
 srsp.setSolrResponse(ssr);
+synchronized (RESPONSE_CANCELABLE_LOCK) {
+  pending.incrementAndGet();
+  // if there are no shards available for a slice, urls.size()==0
+  if (urls.isEmpty()) {
+// TODO: what's the right error code here? We should use the same 
thing when
+// all of the servers for a shard are down.
+SolrException exception =
+new SolrException(
+SolrException.ErrorCode.SERVICE_UNAVAILABLE, "no servers 
hosting shard: " + shard);
+srsp.setException(exception);
+srsp.setResponseCode(exception.code());
+responses.add(srsp);
+return;
+  }
 
-pending.incrementAndGet();
-// if there are no shards

Re: [PR] SOLR-17149: Introduce ParallelHttpShardHandler [solr]



HoustonPutman commented on code in PR #2681:
URL: https://github.com/apache/solr/pull/2681#discussion_r1737099868


##
solr/server/solr/configsets/_default/conf/solrconfig.xml:
##
@@ -656,6 +656,10 @@
 
   
   
+

Review Comment:
   I didn't even know this was an option in the solrconfig.xml! Should we make 
this an option in the solr.xml instead, since that is where it is configured 
currently?




Re: [PR] SOLR-17158 Terminate distributed processing quickly when query limit is reached [solr]



fsparv commented on code in PR #2666:
URL: https://github.com/apache/solr/pull/2666#discussion_r1737216013


##
solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java:
##
@@ -80,6 +94,39 @@ public HttpShardHandler(HttpShardHandlerFactory 
httpShardHandlerFactory) {
 shardToURLs = new HashMap<>();
   }
 
+  /**
+   * Parse the {@value ShardParams#SHARDS_TOLERANT} param from 
params as a boolean;
+   * accepts {@value ShardParams#REQUIRE_ZK_CONNECTED} as a valid value 
indicating false
+   * .
+   *
+   * By default, returns false when {@value 
ShardParams#SHARDS_TOLERANT} is not set
+   * in 
+   * params.
+   */
+  public static boolean getShardsTolerantAsBool(SolrQueryRequest req) {
+String shardsTolerantValue = 
req.getParams().get(ShardParams.SHARDS_TOLERANT);
+if (null == shardsTolerantValue
+|| 
shardsTolerantValue.trim().equals(ShardParams.REQUIRE_ZK_CONNECTED)) {

Review Comment:
   https://issues.apache.org/jira/browse/SOLR-6572




[jira] [Updated] (SOLR-17416) Streaming Expressions: Exception swallowed and not propagated back to the client leading to inconsistent results



 [ 
https://issues.apache.org/jira/browse/SOLR-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-17416:
--
Attachment: SOLR-17416.patch
Status: Open  (was: Open)

> Streaming Expressions:  Exception swallowed and not propagated back to the 
> client leading to inconsistent results
> -
>
> Key: SOLR-17416
> URL: https://issues.apache.org/jira/browse/SOLR-17416
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Lamine
>Priority: Major
> Attachments: SOLR-17416.patch
>
>
> There appears to be a bug in the _ExportWriter/ExportBuffers_ implementation 
> within the Streaming Expressions plugin. Specifically, when an 
> InterruptedException occurs due to an ExportBuffers timeout, the exception is 
> swallowed and not propagated back to the client (still logged on the server 
> side though).
> As a result, the client receives an EOF marker, thinking that it has received 
> the full set of results, when in fact it has only received partial results. 
> This leads to inconsistent search results, as the client is unaware that the 
> export process was interrupted and terminated prematurely.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (SOLR-17416) Streaming Expressions: Exception swallowed and not propagated back to the client leading to inconsistent results



[ 
https://issues.apache.org/jira/browse/SOLR-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877915#comment-17877915
 ] 

Chris M. Hostetter commented on SOLR-17416:
---

I've seen this in the wild and after digging into it a bit can elaborate on the 
circumstances...
 * {{ExportBuffers.run(Callable)}} is designed around the use of two threads:
 ** a background thread that acts as a "filler" to read data from disk and 
populate a {{fillBuffer}}
 ** the caller thread, which serially invokes the "writer" logic passed in as 
the {{Callable}} argument, to serialize an {{outputBuffer}}
 * These two threads use {{exchangeBuffers()}} to swap with each other, which 
under the covers uses a {{CyclicBarrier}}
 ** The {{CyclicBarrier.await(time)}} calls use a hard-coded timeout of 600 
seconds
 *** This seems to have been chosen as an arbitrary "big value" to ensure that 
neither thread sat around waiting forever (consuming RAM) if the other 
thread died.

The problem however is that this 600 second timeout may not be enough to 
account for really slow downstream consumption of the data. With really large 
collections, and really complicated streaming expressions, this can happen even 
with well-behaved clients that are actively trying to consume data.

One particular example I've seen is a stream where a {{leftOuterJoin}} is 
wrapped around two very large collections, spread over a large number of 
shards
{noformat}
leftOuterJoin(search(biggest_left_collection,q="*:*",sort="join_field 
asc",fl="...",qt="/export"),
  search(big_right_collection,q="*:*",sort="join_field 
asc",fl="...",qt="/export"),
  on="join_field")
{noformat}
...After an initial burst of fetching an initial {{batchSize}} of Tuples 
from both streams, and slurping up all the results from both streams that have 
the same {{join_field}} value as the head of the left Tuple, each {{read()}} 
call from the downstream client causes Tuples from the merged set to be 
consumed – but neither of the upstream "left" or "right" streams reads any new 
data for a bit. And even once additional values are read from the "left" 
stream, the "right" stream may sit stagnant for even longer as the node running 
the join code keeps consuming (and passing downstream) Tuples from the left 
stream until it encounters the same {{join_field}} value from the head of the 
right stream and _then_ starts fetching more data from the individual replicas 
of that right stream collection.

Which can take a while – and in the meantime the "filler" thread on one (or 
more) of the replicas of {{big_right_collection}} may have encountered a 
{{TimeoutException}} from the {{barrier.await(time)}} call.
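
To make the "stagnant right stream" behaviour concrete, here is a minimal sketch of a left outer merge join over two key-sorted inputs. It is not Solr's join stream implementation: the class name {{MergeJoinSketch}}, the {{rightReads}} bookkeeping, and the assumption that join keys are unique on both sides are all hypothetical simplifications for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Solr leftOuterJoin implementation.
class MergeJoinSketch {
  /**
   * Left outer merge join over two lists sorted by join key (keys assumed unique).
   * rightReads gets one entry per left tuple: how many right tuples were pulled
   * while that left tuple was being handled.
   */
  static List<String> join(List<Integer> left, List<Integer> right, List<Integer> rightReads) {
    List<String> out = new ArrayList<>();
    int r = 0; // index of the current head of the right stream
    for (int key : left) {
      int reads = 0;
      // Pull from the right side only while its head is behind the left key.
      while (r < right.size() && right.get(r) < key) {
        r++;
        reads++;
      }
      boolean matched = r < right.size() && right.get(r) == key;
      if (matched) {
        // The head was consumed by the match, so the next right tuple is fetched.
        r++;
        reads++;
      }
      out.add(key + (matched ? "=matched" : "=null"));
      rightReads.add(reads);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> rightReads = new ArrayList<>();
    List<String> out = join(List.of(1, 2, 3, 4, 50), List.of(50, 60), rightReads);
    System.out.println(out);        // [1=null, 2=null, 3=null, 4=null, 50=matched]
    System.out.println(rightReads); // [0, 0, 0, 0, 1]
  }
}
```

In this toy run the right side is not read at all while the first four left tuples are emitted. In the real system that idle period is time during which a "filler" thread upstream is waiting on its barrier.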

In _theory_ the "filler" code in {{ExportBuffers}} reports the 
{{TimeoutException}} in this situation by passing the exception to the 
following method before the filler thread ends...
{code:java}
  public void error(Throwable t) {
    error = t;
    // break the lock on the other thread too
    barrier.reset();
  }
{code}
...expecting the "writer" code (from the caller thread) to check the value of 
{{getError()}} and report it.

But there are two main problems with this approach to passing the exception 
along:

1. The common case "writer" logic in {{ExportWriter}} (which calls 
{{ExportBuffers.run()}}) *_NEVER_* checks {{ExportBuffers.getError()}} – it is 
only ever checked by (the special case logic in) {{ExportWriterStream}} and 
even there it's only checked in the event of a {{BrokenBarrierException}}
2. The logic inside {{ExportBuffers.run()}} itself only logs (and ignores) any 
type of Throwable that propagates up to it.

A very tightly related third problem is the call to {{barrier.reset()}} :
 * AFAICT the usage of this method call seems to be a complete mistake / 
misunderstanding
 ** Its comment & usage here in the {{error()}} method (ie: after one of the 
two parties gets either a {{TimeoutException}} or {{BrokenBarrierException}} ) 
seems to be with the intent of ensuring that _if_ the other thread is currently 
{{await(time)}} -ing on the barrier, that thread should get a 
{{BrokenBarrierException}} – but that's not what this method is designed for.
 ** The way we are using the {{CyclicBarrier}} in this code, the situation 
described in the comment should never even be possible.
 * In general, anytime any call to {{CyclicBarrier.await()}} throws an 
exception the barrier is left in a "broken" state, and any other calls to 
{{CyclicBarrier.await()}} ({_}either currently waiting, or in the future{_}) 
will get a {{BrokenBarrierException}}
 ** The call to {{barrier.reset()}} here causes that "broken" state to be 
reset, *_preventing_* any future calls to {{CyclicBarrier.await()}} (from the 
thread that didn't already encounter an error) from throwing 
{{BrokenBarrierException}} – instead that thread waits the full timeout amount 
(for a thread that has already err
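
The barrier semantics described in this comment can be checked in isolation with the JDK alone. The following is a small sketch, with the class name {{BarrierResetSketch}} and the 100 ms timeout chosen purely for illustration (the Solr code uses 600 seconds): a lone party times out, the barrier is left broken, and {{reset()}} then clears that broken state.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; only java.util.concurrent APIs are used.
class BarrierResetSketch {
  /** Returns {timedOut, brokenAfterTimeout, brokenAfterReset}. */
  static boolean[] demo() throws InterruptedException, BrokenBarrierException {
    CyclicBarrier barrier = new CyclicBarrier(2);
    boolean timedOut = false;
    try {
      // Only one of the two parties ever arrives, so this await must time out.
      barrier.await(100, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      timedOut = true;
    }
    // A timed-out await leaves the barrier in the "broken" state.
    boolean brokenAfterTimeout = barrier.isBroken();
    // reset() starts a fresh generation, which clears the broken state.
    barrier.reset();
    boolean brokenAfterReset = barrier.isBroken();
    return new boolean[] {timedOut, brokenAfterTimeout, brokenAfterReset};
  }

  public static void main(String[] args) throws Exception {
    System.out.println(Arrays.toString(demo())); // [true, true, false]
  }
}
```

Once the broken flag has been cleared this way, a later {{await(time)}} from the surviving thread no longer fails fast with {{BrokenBarrierException}}; it has to wait out its own timeout.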

[jira] [Updated] (SOLR-17416) Streaming Expressions: Exception swallowed and not propagated back to the client leading to inconsistent results



 [ 
https://issues.apache.org/jira/browse/SOLR-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-17416:
--
Component/s: Export Writer

> Streaming Expressions:  Exception swallowed and not propagated back to the 
> client leading to inconsistent results
> -
>
> Key: SOLR-17416
> URL: https://issues.apache.org/jira/browse/SOLR-17416
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Export Writer, streaming expressions
>Reporter: Lamine
>Priority: Major
> Attachments: SOLR-17416.patch
>
>




[jira] [Created] (SOLR-17430) Redesign ExportWriter / ExportBuffers to work better with large batchSizes and slow consumption

Chris M. Hostetter created SOLR-17430:
-

 Summary: Redesign ExportWriter / ExportBuffers to work better with 
large batchSizes and slow consumption
 Key: SOLR-17430
 URL: https://issues.apache.org/jira/browse/SOLR-17430
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter


As mentioned in SOLR-17416, the design of the {{ExportBuffers}} class used by 
the {{ExportHandler}} is brittle and the absolute time limit on how long the 
buffer swapping threads will wait for each other isn't suitable for very long 
running streaming expressions...
{quote}The problem however is that this 600 second timeout may not be enough to 
account for really slow downstream consumption of the data.  With really large 
collections, and really complicated streaming expressions, this can happen even 
with well-behaved clients that are actively trying to consume data.
{quote}
...but another sub-optimal aspect of this buffer swapping design is that the 
"writer" thread is initially completely blocked, and can't write out a single 
document, until the "filler" thread has read the full {{batchSize}} of 
documents into its buffer and opted to swap.  Likewise, after buffer swapping 
has occurred at least once, any document in the {{outputBuffer}} that the writer 
has already processed hangs around, taking up RAM, until the next swap, while 
one of the threads is idle.  If {{batchSize=3}}, and the "filler" 
thread is ready to go with a full {{fillBuffer}} while the "writer" has only 
been able to emit 2 of the documents in its {{outputBuffer}} 
before being blocked and forced to wait (due to the downstream consumer of the 
output bytes) before it can emit the last document in its batch – that means 
both the "writer" thread and the "filler" thread are stalled, taking up 2x the 
batchSize of RAM, even though half of that is data that is no longer needed.

The bigger the {{batchSize}} the worse the initial delay (and steady state 
wasted RAM) is.




[jira] [Commented] (SOLR-17430) Redesign ExportWriter / ExportBuffers to work better with large batchSizes and slow consumption



[ 
https://issues.apache.org/jira/browse/SOLR-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877918#comment-17877918
 ] 

Chris M. Hostetter commented on SOLR-17430:
---

 

Without getting too bogged down into the details, I'd like to propose a high 
level strawman replacement for the current {{ExportBuffers}} logic:
 * Eliminate the double buffers and use of CyclicBarrier for swapping
 ** Replace them with a simple producer->BlockingQueue->consumer model
 * The (filler) producer should:
 ** Be implemented as a {{Callable}} (that can throw exceptions) 
 ** "put" items into the queue – ie: block forever, or until interrupted, if 
the queue is full
 *** NOTE: It may still make sense from an "index reading efficiency" 
standpoint for it to read large blocks of documents at a time into its own 
buffer
 ** On any type of error (including any InterruptedException from trying to 
"put" to the queue) it should throw its exception
 * The "writer" (request thread) consumer should:
 ** Hold a {{Future}} object backed by the "producer"
 ** Repeatedly "poll" from the queue in a loop (w/a short time limit)
 *** If "poll" returns null: break out of the loop if {{true == 
Future.isDone()}} 
 ** regardless of how we exit our loop, a {{finally}} block(s) should ensure:
 *** {{Future.get()}} is called (so any Exceptions from the producer can be 
propagated up)
 *** {{Future.cancel(true)}} is called (to interrupt the producer if the 
consumer is failing for its own reasons before the producer is done)
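
The strawman above could look roughly like the following sketch. Everything here is an assumption made for illustration and is not taken from any patch: the class name {{QueueExportSketch}}, the use of integers as stand-in "documents", the queue capacity of 4, the 50 ms poll interval, and the {{failAt}} parameter used to simulate a filler error. It also picks one reading of the {{finally}} bullets: call {{get()}} when the consumer loop finished normally, and {{cancel(true)}} when it did not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed producer/queue/consumer model.
class QueueExportSketch {
  /** Produces docCount "documents"; the filler fails at index failAt (use -1 for no failure). */
  static List<Integer> export(int docCount, int failAt) throws Exception {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Callable<Void> filler = () -> {
      for (int i = 0; i < docCount; i++) {
        if (i == failAt) {
          throw new IllegalStateException("filler failed at doc " + i);
        }
        queue.put(i); // blocks while the queue is full; an interrupt propagates as an exception
      }
      return null;
    };
    Future<Void> future = pool.submit(filler);
    List<Integer> written = new ArrayList<>();
    boolean completed = false;
    try {
      while (true) {
        Integer doc = queue.poll(50, TimeUnit.MILLISECONDS);
        if (doc != null) {
          written.add(doc); // stands in for the "writer" serializing a document
        } else if (future.isDone()) {
          // The producer can add items between the poll timing out and isDone()
          // turning true, so drain whatever is left before leaving the loop.
          queue.drainTo(written);
          break;
        }
      }
      completed = true;
    } finally {
      try {
        if (completed) {
          future.get(); // surfaces any producer failure as an ExecutionException
        } else {
          future.cancel(true); // consumer failed first: interrupt the producer
        }
      } finally {
        pool.shutdownNow();
      }
    }
    return written;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(export(10, -1).size()); // 10
  }
}
```

With {{failAt}} set to a valid index, the call ends in an {{ExecutionException}} whose cause is the filler's exception, rather than returning a silently truncated result.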
 

> Redesign ExportWriter / ExportBuffers to work better with large batchSizes 
> and slow consumption
> ---
>
> Key: SOLR-17430
> URL: https://issues.apache.org/jira/browse/SOLR-17430
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>




[jira] [Commented] (SOLR-17416) Streaming Expressions: Exception swallowed and not propagated back to the client leading to inconsistent results



[ 
https://issues.apache.org/jira/browse/SOLR-17416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877919#comment-17877919
 ] 

Chris M. Hostetter commented on SOLR-17416:
---

 
{quote}In general, I think the design of ExportBuffers (and use of a 
CyclicBarrier.await(time) to exchange buffers between two threads) is more 
complicated and error prone than it needs to be, and should be reconsidered 
in order to ensure that these kinds of "slow consumption" situations can't 
result in the "filler" thread giving up after an arbitrary time limit even though 
the "writer" thread is still around and the client is still consuming results. 
I will open a separate jira to discuss options for redesigning this.
{quote}
https://issues.apache.org/jira/browse/SOLR-17430

> Streaming Expressions:  Exception swallowed and not propagated back to the 
> client leading to inconsistent results
> -
>
> Key: SOLR-17416
> URL: https://issues.apache.org/jira/browse/SOLR-17416
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Export Writer, streaming expressions
>Reporter: Lamine
>Priority: Major
> Attachments: SOLR-17416.patch
>
>




Re: [PR] SOLR-17149: Introduce ParallelHttpShardHandler [solr]



gerlowskija commented on code in PR #2681:
URL: https://github.com/apache/solr/pull/2681#discussion_r1737662158


##
solr/server/solr/configsets/_default/conf/solrconfig.xml:
##
@@ -656,6 +656,10 @@
 
   
   
+

Review Comment:
   Yep - kind of handy for overriding the solr.xml definition on a 
per-collection basis.
   
   But I agree - no need to have this in our default configset (even as a 
commented out example).  Will move it over to solr.xml.




Re: [PR] SOLR-17149: Introduce ParallelHttpShardHandler [solr]



gerlowskija commented on code in PR #2681:
URL: https://github.com/apache/solr/pull/2681#discussion_r1737662158


##
solr/server/solr/configsets/_default/conf/solrconfig.xml:
##
@@ -656,6 +656,10 @@
 
   
   
+

Review Comment:
   Yep - kind of handy for overriding the solr.xml definition on a 
per-collection basis.  The docs on ShardHandler overall could really use some 
beefing up - gonna try to squeeze that into this PR if I can.
   
   Anyways, I agree - no need to have this in our default configset (even as a 
commented out example).  Will move it over to solr.xml.




Re: [PR] Bump up Java version to 21 [solr]