[jira] [Commented] (SOLR-17219) Exceptions occur while Solr reads some core's configset (java.io.IOException: Error opening /configs/)

2024-06-25 Thread Guillaume Jactat (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859867#comment-17859867
 ] 

Guillaume Jactat commented on SOLR-17219:
-

Also experiencing this issue with Solr 9.6.0.
{*}Solr 9.6.1 seems to be fine though{*}. I didn't manage to reproduce the same 
error with this version.

> Exceptions occur while Solr reads some core's configset (java.io.IOException: 
> Error opening /configs/)
> 
>
> Key: SOLR-17219
> URL: https://issues.apache.org/jira/browse/SOLR-17219
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 9.5
> Environment: My first attempts were on Windows via services hosted 
> through procrun (both ZooKeeper and Solr nodes). I also tried with a Docker 
> Desktop ensemble.
> It seems that this error occurs less frequently via Docker. But it happens 
> anyway.
>Reporter: Guillaume Jactat
>Priority: Major
> Attachments: stack.txt
>
>
> Hello,
> I'm currently testing SolrCloud to get a better idea of how to recover from 
> node failures.
> I have a simple configuration: one ZooKeeper server and 3 *Solr 9.5* nodes.
> I upload a configset in Zookeeper via Solr's Configsets API. I create 200 
> collections, all bound to the same configset.
> I leave the collections empty for the moment.
> When I stop/start one node, the recovery process happens. And almost 
> every time, I get the following error (full stack is attached to this issue):
> java.io.IOException: Error opening 
> /configs/CoreModel–CB38FE6CFE/lang/stopwords_fi.txt
> It's not always the same configset file. Sometimes everything goes fine. 
> But when this error occurs, the whole recovery process seems compromised, 
> leaving a lot of cores/collections "down". No retry happens, maybe because 
> Solr assumes that the configset is wrong and no retry could fix it?
> I've tried the same setup on Windows Service (procrun) and Docker Desktop 
> containers. It seems that this error occurs less frequently with docker but 
> it happens anyway.
> I didn't find anything close to this error on the web... I have no clue why 
> this error happens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Commented] (SOLR-17269) Don't publish synthetic collection to ZK in solr coordinator nodes

2024-06-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859899#comment-17859899
 ] 

David Smiley commented on SOLR-17269:
-

Pushed to 9x.  
https://github.com/apache/solr/commit/ea860477922c2fe7d25274ae7d66de11a0dc19cd
Dunno why there wasn't a bot comment when I pushed.
FYI [~epugh], you had erroneously added the CHANGES.txt item on branch_9x under 
the backport of an unrelated issue.  An example of how our CHANGES.txt 
management is annoying and error-prone.

> Don't publish synthetic collection to ZK in solr coordinator nodes
> --
>
> Key: SOLR-17269
> URL: https://issues.apache.org/jira/browse/SOLR-17269
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 9.6
>Reporter: Patson Luk
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The coordinator node uses a "synthetic collection/core" (prefixed with 
> `.sys.COORDINATOR-COLL-`) to proxy queries to other data nodes.
> However, such collections are actually registered in ZK as well (like 
> normal collections). This can be problematic when other tools read the 
> cluster status and mistake them for real collections.
> The proposal here is to introduce a new `SyntheticSolrCore` which 
> subclasses `SolrCore` and bypasses the ZK publish steps of a regular core 
> (and avoids creating core.properties, etc.), thereby preventing other tools 
> from detecting the synthetic collection.






[jira] [Resolved] (SOLR-17269) Don't publish synthetic collection to ZK in solr coordinator nodes

2024-06-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-17269.
-
Fix Version/s: 9.7
   Resolution: Fixed

> Don't publish synthetic collection to ZK in solr coordinator nodes
> --
>
> Key: SOLR-17269
> URL: https://issues.apache.org/jira/browse/SOLR-17269
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 9.6
>Reporter: Patson Luk
>Priority: Major
> Fix For: 9.7
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The coordinator node uses a "synthetic collection/core" (prefixed with 
> `.sys.COORDINATOR-COLL-`) to proxy queries to other data nodes.
> However, such collections are actually registered in ZK as well (like 
> normal collections). This can be problematic when other tools read the 
> cluster status and mistake them for real collections.
> The proposal here is to introduce a new `SyntheticSolrCore` which 
> subclasses `SolrCore` and bypasses the ZK publish steps of a regular core 
> (and avoids creating core.properties, etc.), thereby preventing other tools 
> from detecting the synthetic collection.






[jira] [Created] (SOLR-17346) Synchronise default configset stopwords to the same list as lucene

2024-06-25 Thread Alastair Porter (Jira)
Alastair Porter created SOLR-17346:
--

 Summary: Synchronise default configset stopwords to the same list 
as lucene
 Key: SOLR-17346
 URL: https://issues.apache.org/jira/browse/SOLR-17346
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Alastair Porter


Solr's default configset comes with a collection of sample stopwords from the 
snowball project in solr/server/solr/configsets/_default/conf/lang 
(https://github.com/apache/solr/tree/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang)

There is a similar list of stopwords in the Lucene repository; however, these 
have been updated to a more recent snowball list 
([https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball])

Specifically, the most recent list of stopwords for the French language has 
removed a number of words which are homonyms of other useful words that 
shouldn't be skipped.

In a discussion on the solr-users mailing list it was agreed that it would be a 
good idea to sync the list of files in Solr with the ones in Lucene.






[jira] [Commented] (SOLR-17346) Synchronise default configset stopwords to the same list as lucene

2024-06-25 Thread Alastair Porter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859906#comment-17859906
 ] 

Alastair Porter commented on SOLR-17346:


It appears that I cannot assign this task to myself, but I have already opened 
a PR at https://github.com/apache/solr/pull/2533

> Synchronise default configset stopwords to the same list as lucene
> --
>
> Key: SOLR-17346
> URL: https://issues.apache.org/jira/browse/SOLR-17346
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Alastair Porter
>Priority: Trivial
>
> Solr's default configset comes with a collection of sample stopwords from the 
> snowball project in solr/server/solr/configsets/_default/conf/lang 
> (https://github.com/apache/solr/tree/a42c605fb916439222a086356f368f02cf80304a/solr/server/solr/configsets/_default/conf/lang)
> There is a similar list of stopwords in the Lucene repository; however, these 
> have been updated to a more recent snowball list 
> ([https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball])
> Specifically, the most recent list of stopwords for the French language has 
> removed a number of words which are homonyms of other useful words that 
> shouldn't be skipped.
> In a discussion on the solr-users mailing list it was agreed that it would be 
> a good idea to sync the list of files in Solr with the ones in Lucene.






[jira] [Created] (SOLR-17347) Remove the env from EnvUtils

2024-06-25 Thread David Smiley (Jira)
David Smiley created SOLR-17347:
---

 Summary: Remove the env from EnvUtils
 Key: SOLR-17347
 URL: https://issues.apache.org/jira/browse/SOLR-17347
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: David Smiley
Assignee: David Smiley


It's been problematic for EnvUtils to have "env" data and env accessors.  The 
mapping of env vars to sys props is fine.






[jira] [Commented] (SOLR-15960) Unified use of system properties and environment variables

2024-06-25 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859922#comment-17859922
 ] 

David Smiley commented on SOLR-15960:
-

Follow-up in SOLR-17347.

> Unified use of system properties and environment variables
> --
>
> Key: SOLR-15960
> URL: https://issues.apache.org/jira/browse/SOLR-15960
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 9.5
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> We have a lot of boiler-plate code in Solr related to resolving configuration 
> from config-files, system properties and environment variables.
> The main pattern so far has been to load a config from an xml file, which 
> uses system property variables like {{${myVar}}}. All the environment 
> variables that we expose in {{solr.in.sh}} are converted to system properties 
> in {{bin/solr}}, and inside of Solr we only care about sys.props. This has 
> served us quite well, but it also has certain disadvantages:
>  * Naming mismatches. You have one config name in the xml file, one as a 
> system property and yet another as an environment variable.
>  * Duplicate code to deal with type conversion, and with converting 
> comma-separated lists from env vars into Java lists
>  * Every new config option needs to touch {{{}bin/solr{}}}, {{bin/solr.cmd}} 
> and often more.
> In the world of containers and k8s, we want to configure almost every aspect 
> of an app using environment variables. It is sometimes also more secure than 
> passing sys.props on the cmdline since they won't show up in a "ps".
> So this is a proposal to unify all Solr's configs in a more structured way
>  * Make naming a convention. All env variables should be uppercase with the 
> format {{SOLR_X_Y}} and all sys.props should be lowercase with the format 
> {{solr.x.y}}. Perhaps {{solr.camelCase}} should map to {{SOLR_CAMEL_CASE}}, 
> or we discourage camel case in favour of dots.
>  * Add a central {{ConfigResolver}} class to Solr that can answer e.g. 
> {{getInt("solr.my.number")}} and it would return either prop 
> {{solr.my.number}} or {{SOLR_MY_NUMBER}}. Similar for String, bool etc, and 
> with fallback-values
>  * List support, e.g. {{getListOfStrings("solr.modules")}} would 
> return a {{List}} from either {{solr.modules}} or {{SOLR_MODULES}}, 
> supporting comma-separated values, a custom separator, and why not also a 
> JSON list format ["foo","bar"]?
> A pitfall of using environment variables directly is testing, since env.vars 
> are immutable. I suggest we solve this by reading all {{SOLR_*}} 
> env.variables on startup and inserting them into a static, mutable map 
> somewhere which is the single source of truth for env.vars. Then we can ban 
> the use of {{System.getenv()}}.
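
The fallback lookup and list support proposed above could be sketched roughly as 
follows (the class and method names here are illustrative assumptions, not the 
actual EnvUtils/ConfigResolver API, and camelCase name mapping is left out):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ConfigResolverSketch {
    // Map "solr.my.number" -> "SOLR_MY_NUMBER" per the proposed naming convention.
    static String toEnvName(String sysProp) {
        return sysProp.replace('.', '_').toUpperCase(Locale.ROOT);
    }

    // Resolve the sys prop first, then the matching env var; null if neither is set.
    static String raw(String sysProp) {
        String v = System.getProperty(sysProp);
        return v != null ? v : System.getenv(toEnvName(sysProp));
    }

    public static int getInt(String sysProp, int defaultValue) {
        String v = raw(sysProp);
        return v != null ? Integer.parseInt(v.trim()) : defaultValue;
    }

    public static List<String> getListOfStrings(String sysProp, List<String> defaultValue) {
        String v = raw(sysProp);
        if (v == null) return defaultValue;
        // Comma-separated, tolerant of surrounding whitespace.
        return Arrays.asList(v.split("\\s*,\\s*"));
    }
}
```

A call like {{getInt("solr.my.number", 0)}} would then transparently pick up 
either {{-Dsolr.my.number=42}} or {{SOLR_MY_NUMBER=42}} from the environment.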






[jira] [Updated] (SOLR-17347) Remove the env from EnvUtils

2024-06-25 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-17347:

Description: It's been problematic for EnvUtils to have "env" data and env 
accessors. The mapping of env vars to sys props is fine. See the follow-up 
comments 
[here|https://issues.apache.org/jira/browse/SOLR-15960?focusedCommentId=17856840&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17856840].
  (was: It's been problematic for EnvUtils to have "env" data and env 
accessors.  The mapping of env vars to sys props is fine.)

> Remove the env from EnvUtils
> 
>
> Key: SOLR-17347
> URL: https://issues.apache.org/jira/browse/SOLR-17347
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> It's been problematic for EnvUtils to have "env" data and env accessors. The 
> mapping of env vars to sys props is fine. See the follow-up comments 
> [here|https://issues.apache.org/jira/browse/SOLR-15960?focusedCommentId=17856840&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17856840].






[jira] [Commented] (SOLR-17345) Should bin/solr allow short/long opts consistently?

2024-06-25 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859933#comment-17859933
 ] 

Gus Heck commented on SOLR-17345:
-

+1 for consistency; short opts that conflict can just be assigned an alternate 
letter. I'm a fan of the docopt systems, where the documentation is the 
specification. Looks like they do have a bash version: 
https://github.com/docopt/docopts

> Should bin/solr allow short/long opts consistently?
> ---
>
> Key: SOLR-17345
> URL: https://issues.apache.org/jira/browse/SOLR-17345
> Project: Solr
>  Issue Type: Sub-task
>  Components: scripts and tools
>Reporter: Jason Gerlowski
>Priority: Trivial
>
> Some "bin/solr" parameters accept only single-character short-opts (e.g. 
> "-c").  Others only accept long-opts (e.g. "--collection").  While a third 
> set of parameters accepts both short and long options.
> Sometimes this may be intentional, for instance when multiple parameters 
> start with the same letter and would conflict if both had a short-opt.  But 
> in other cases it appears coincidental.
> It's very low priority, but we may want to resolve this inconsistency at some 
> point as much as possible and offer both short and long opts (at least where 
> possible).






[jira] [Created] (SOLR-17348) Mitigate extreme parallelism of zkCallback executor

2024-06-25 Thread Michael Gibney (Jira)
Michael Gibney created SOLR-17348:
-

 Summary: Mitigate extreme parallelism of zkCallback executor
 Key: SOLR-17348
 URL: https://issues.apache.org/jira/browse/SOLR-17348
 Project: Solr
  Issue Type: Improvement
Reporter: Michael Gibney


zkCallback executor is [currently an unbounded thread pool of core size 
0|https://github.com/apache/solr/blob/709a1ee27df23b419d09fe8f67c3276409131a4a/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L91-L92],
 using a SynchronousQueue. Thus, a flood of zkCallback events (as might be 
triggered by a cluster restart, e.g.) can result in spinning up a very large 
number of threads. In practice we have encountered as many as 35k threads 
created in some such cases, even after the impact of this situation was reduced 
by the fix for SOLR-11535.

Inspired by [~cpoerschke]'s recent [closer look at thread pool 
behavior|https://issues.apache.org/jira/browse/SOLR-13350?focusedCommentId=17853178#comment-17853178],
 I wondered if we might be able to employ a bounded queue to alleviate some of 
the pressure from bursty zk callbacks.

The new config might look something like: {{corePoolSize=1024, 
maximumPoolSize=Integer.MAX_VALUE, allowCoreThreadTimeout=true, workQueue=new 
LinkedBlockingQueue<>(1024)}}. This would allow the pool to grow up to (and 
shrink from) corePoolSize in the same manner it currently does, but once 
exceeding corePoolSize (e.g. during a cluster restart or other callback flood 
event), tasks would be queued (up to some fixed limit). If the queue limit is 
exceeded, new threads would still be created, but we would have avoided the 
current “always create a thread” behavior, and by so doing hopefully reduce 
task execution time and improve overall throughput.
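
A minimal sketch of what that executor construction might look like (the class 
name and the 60-second keep-alive are illustrative assumptions, not Solr's 
actual SolrZkClient code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ZkCallbackPoolSketch {
    // Proposed shape: grow to 1024 core threads as the pool does now, then
    // queue up to 1024 tasks, and only past that spin up extra (unbounded)
    // threads -- instead of always creating a thread per task.
    public static ThreadPoolExecutor newPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1024,                               // corePoolSize
            Integer.MAX_VALUE,                  // maximumPoolSize
            60L, TimeUnit.SECONDS,              // idle keep-alive (illustrative)
            new LinkedBlockingQueue<>(1024));   // bounded queue replaces SynchronousQueue
        pool.allowCoreThreadTimeOut(true);      // lets the pool shrink back down
        return pool;
    }
}
```

Note that ThreadPoolExecutor only creates threads beyond corePoolSize once the 
bounded queue rejects an offer, which is exactly the buffering behavior 
described above.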

From the ThreadPoolExecutor javadocs:

{quote}Direct handoffs. A good default choice for a work queue is a 
SynchronousQueue that hands off tasks to threads without otherwise holding 
them. Here, an attempt to queue a task will fail if no threads are immediately 
available to run it, so a new thread will be constructed. This policy avoids 
lockups when handling sets of requests that might have internal dependencies. 
Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection 
of new submitted tasks. This in turn admits the possibility of unbounded thread 
growth when commands continue to arrive on average faster than they can be 
processed.{quote}

So afaict SynchronousQueue mainly makes sense if there exists the possibility 
of deadlock due to dependencies among tasks, and I think this should ideally 
_not_ be the case with zk callbacks (though in practice I'm not sure this is 
the case).






[jira] [Commented] (SOLR-17348) Mitigate extreme parallelism of zkCallback executor

2024-06-25 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860019#comment-17860019
 ] 

Michael Gibney commented on SOLR-17348:
---

In order to naively check for possible deadlocks if the parallelism of 
zkCallback executor were modified, I changed it to 
{{ExecutorUtil.newMDCAwareSingleThreadExecutor()}}. Most of the tests were fine 
(I did not run nightlies); there were only two tests that failed, one of which 
({{ZkSolrClientTest.testMultipleWatchesAsync}}) is not really worth mentioning 
because iiuc it explicitly tests for parallel callback execution, so it should 
be expected to fail.

The concerning failure can be reproduced via:
{code:sh}
./gradlew :solr:core:test --tests 
"org.apache.solr.cloud.TestCloudConsistency.testOutOfSyncReplicasCannotBecomeLeader"
 -Ptests.jvms=5 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC 
-XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" 
-Ptests.seed=56FF65ADA5A59077 -Ptests.file.encoding=ISO-8859-1
{code}

I'm not sure what the deadlock here is, but I'd like to know whether it means:
# full parallelism of zk callbacks is really required, and this just won't work
# with a few changes, zk callbacks would _not_ require parallelism

Note also that I wouldn't assume that a passing test suite would mean 
everything is fine. I don't plan to pursue this further atm, but I wanted to 
drop my thoughts and experiments into this issue in case someone feels inclined 
to follow up on it. It's possible that a deep dive here might even improve 
tangential things (e.g., perhaps leader elections?) in unanticipated ways ...

> Mitigate extreme parallelism of zkCallback executor
> ---
>
> Key: SOLR-17348
> URL: https://issues.apache.org/jira/browse/SOLR-17348
> Project: Solr
>  Issue Type: Improvement
>Reporter: Michael Gibney
>Priority: Minor
>
> zkCallback executor is [currently an unbounded thread pool of core size 
> 0|https://github.com/apache/solr/blob/709a1ee27df23b419d09fe8f67c3276409131a4a/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L91-L92],
>  using a SynchronousQueue. Thus, a flood of zkCallback events (as might be 
> triggered by a cluster restart, e.g.) can result in spinning up a very large 
> number of threads. In practice we have encountered as many as 35k threads 
> created in some such cases, even after the impact of this situation was 
> reduced by the fix for SOLR-11535.
> Inspired by [~cpoerschke]'s recent [closer look at thread pool 
> behavior|https://issues.apache.org/jira/browse/SOLR-13350?focusedCommentId=17853178#comment-17853178],
>  I wondered if we might be able to employ a bounded queue to alleviate some 
> of the pressure from bursty zk callbacks.
> The new config might look something like: {{corePoolSize=1024, 
> maximumPoolSize=Integer.MAX_VALUE, allowCoreThreadTimeout=true, workQueue=new 
> LinkedBlockingQueue<>(1024)}}. This would allow the pool to grow up to (and 
> shrink from) corePoolSize in the same manner it currently does, but once 
> exceeding corePoolSize (e.g. during a cluster restart or other callback flood 
> event), tasks would be queued (up to some fixed limit). If the queue limit is 
> exceeded, new threads would still be created, but we would have avoided the 
> current “always create a thread” behavior, and by so doing hopefully reduce 
> task execution time and improve overall throughput.
> From the ThreadPoolExecutor javadocs:
> {quote}Direct handoffs. A good default choice for a work queue is a 
> SynchronousQueue that hands off tasks to threads without otherwise holding 
> them. Here, an attempt to queue a task will fail if no threads are 
> immediately available to run it, so a new thread will be constructed. This 
> policy avoids lockups when handling sets of requests that might have internal 
> dependencies. Direct handoffs generally require unbounded maximumPoolSizes to 
> avoid rejection of new submitted tasks. This in turn admits the possibility 
> of unbounded thread growth when commands continue to arrive on average faster 
> than they can be processed.{quote}
> So afaict SynchronousQueue mainly makes sense if there exists the possibility 
> of deadlock due to dependencies among tasks, and I think this should ideally 
> _not_ be the case with zk callbacks (though in practice I'm not sure this is 
> the case).






[jira] [Comment Edited] (SOLR-17348) Mitigate extreme parallelism of zkCallback executor

2024-06-25 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860019#comment-17860019
 ] 

Michael Gibney edited comment on SOLR-17348 at 6/25/24 7:58 PM:


In order to naively check for possible deadlocks if the parallelism of 
zkCallback executor were modified, I changed it to 
{{ExecutorUtil.newMDCAwareSingleThreadExecutor()}}. Most of the tests were fine 
(I did not run nightlies); there were only two tests that failed, one of which 
({{ZkSolrClientTest.testMultipleWatchesAsync}}) is not really worth mentioning 
because iiuc it explicitly tests for parallel callback execution, so it should 
be expected to fail.

The only other failure (which is more concerning) can be reproduced via:
{code:sh}
./gradlew :solr:core:test --tests 
"org.apache.solr.cloud.TestCloudConsistency.testOutOfSyncReplicasCannotBecomeLeader"
 -Ptests.jvms=5 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC 
-XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" 
-Ptests.seed=56FF65ADA5A59077 -Ptests.file.encoding=ISO-8859-1
{code}

I'm not sure what the deadlock here is, but I'd like to know whether it means:
# full parallelism of zk callbacks is really required, and this just won't work
# with a few changes, zk callbacks would _not_ require parallelism

Note also that I wouldn't assume that a passing test suite would mean 
everything is fine. I don't plan to pursue this further atm, but I wanted to 
drop my thoughts and experiments into this issue in case someone feels inclined 
to follow up on it. It's possible that a deep dive here might even improve 
tangential things (e.g., perhaps leader elections?) in unanticipated ways ...


was (Author: mgibney):
In order to naively check for possible deadlocks if the parallelism of 
zkCallback executor were modified, I changed it to 
{{ExecutorUtil.newMDCAwareSingleThreadExecutor()}}. Most of the tests were fine 
(I did not run nightlies); there were only two tests that failed, one of which 
({{ZkSolrClientTest.testMultipleWatchesAsync}}) is not really worth mentioning 
because iiuc it explicitly tests for parallel callback execution, so it should 
be expected to fail.

The concerning failure can reproduced via:
{code:sh}
./gradlew :solr:core:test --tests 
"org.apache.solr.cloud.TestCloudConsistency.testOutOfSyncReplicasCannotBecomeLeader"
 -Ptests.jvms=5 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC 
-XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" 
-Ptests.seed=56FF65ADA5A59077 -Ptests.file.encoding=ISO-8859-1
{code}

I'm not sure what the deadlock here is, but I'd like to know whether it means:
# full parallelism of zk callbacks is really required, and this just won't work
# with a few changes, zk callbacks would _not_ require parallelism

Note also that I wouldn't assume that a passing test suite would mean 
everything is fine. I don't plan to pursue this further atm, but I wanted to 
drop my thoughts and experiments into this issue in case someone feels inclined 
to follow up on it. It's possible that a deep dive here might even improve 
tangential things (e.g., perhaps leader elections?) in unanticipated ways ...

> Mitigate extreme parallelism of zkCallback executor
> ---
>
> Key: SOLR-17348
> URL: https://issues.apache.org/jira/browse/SOLR-17348
> Project: Solr
>  Issue Type: Improvement
>Reporter: Michael Gibney
>Priority: Minor
>
> zkCallback executor is [currently an unbounded thread pool of core size 
> 0|https://github.com/apache/solr/blob/709a1ee27df23b419d09fe8f67c3276409131a4a/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L91-L92],
>  using a SynchronousQueue. Thus, a flood of zkCallback events (as might be 
> triggered by a cluster restart, e.g.) can result in spinning up a very large 
> number of threads. In practice we have encountered as many as 35k threads 
> created in some such cases, even after the impact of this situation was 
> reduced by the fix for SOLR-11535.
> Inspired by [~cpoerschke]'s recent [closer look at thread pool 
> behavior|https://issues.apache.org/jira/browse/SOLR-13350?focusedCommentId=17853178#comment-17853178],
>  I wondered if we might be able to employ a bounded queue to alleviate some 
> of the pressure from bursty zk callbacks.
> The new config might look something like: {{corePoolSize=1024, 
> maximumPoolSize=Integer.MAX_VALUE, allowCoreThreadTimeout=true, workQueue=new 
> LinkedBlockingQueue<>(1024)}}. This would allow the pool to grow up to (and 
> shrink from) corePoolSize in the same manner it currently does, but once 
> exceeding corePoolSize (e.g. during a cluster restart or other callback flood 
> event), tasks would be queued (up to some fixed limit). If the queue limit is 
> exceeded, new threads would still be created, but we would have avoided the 
> current “always create a thread” behavior, and by so doing hopefully reduce 
> task execution time and improve overall throughput.

[jira] [Updated] (SOLR-13350) Explore collector managers for multi-threaded search

2024-06-25 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13350:

Priority: Blocker  (was: Major)

> Explore collector managers for multi-threaded search
> 
>
> Key: SOLR-13350
> URL: https://issues.apache.org/jira/browse/SOLR-13350
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Attachments: SOLR-13350-pre-PR-2508.patch, SOLR-13350.patch, 
> SOLR-13350.patch, SOLR-13350.patch
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an 
> index in series. However, using CollectorManagers, segments can be searched 
> concurrently and result in reduced latency. Opening this issue to explore the 
> effectiveness of using CollectorManagers in SolrIndexSearcher from latency 
> and throughput perspective.
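
The concurrent per-segment search the description envisions can be illustrated 
with a toy map/reduce in plain Java (this is only a sketch of the pattern that 
Lucene's CollectorManager formalizes, not Lucene's actual API; the class name 
and predicate are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class SegmentSearchSketch {
    // Count documents matching a predicate across "segments" concurrently,
    // then merge the per-segment results: one collector per segment (map),
    // followed by a reduce step, rather than searching segments in series.
    public static long countEven(List<int[]> segments, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<Long>> perSegment = new ArrayList<>();
        for (int[] seg : segments) { // one "collector" task per segment
            perSegment.add(pool.submit(
                () -> Arrays.stream(seg).filter(v -> v % 2 == 0).count()));
        }
        long total = 0;
        for (Future<Long> f : perSegment) total += f.get(); // reduce
        return total;
    }
}
```

With a fixed pool the per-segment tasks run concurrently, so latency is bounded 
by the slowest segment rather than the sum of all segments.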






[jira] [Commented] (SOLR-13350) Explore collector managers for multi-threaded search

2024-06-25 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860043#comment-17860043
 ] 

Ishan Chattopadhyaya commented on SOLR-13350:
-

> https://github.com/apache/solr/pull/2508 opened to explore replacing the 
> cached with a fixed threadpool.

Thanks [~cpoerschke]! This definitely seems faster than using cached 
threadpools. I'm benchmarking more thoroughly at the moment.
I've marked this issue as a release blocker, as we shouldn't make a release in 
this state (without your patch, and potentially other fixes on top of that).

> Explore collector managers for multi-threaded search
> 
>
> Key: SOLR-13350
> URL: https://issues.apache.org/jira/browse/SOLR-13350
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Attachments: SOLR-13350-pre-PR-2508.patch, SOLR-13350.patch, 
> SOLR-13350.patch, SOLR-13350.patch
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an 
> index in series. However, using CollectorManagers, segments can be searched 
> concurrently and result in reduced latency. Opening this issue to explore the 
> effectiveness of using CollectorManagers in SolrIndexSearcher from latency 
> and throughput perspective.






[jira] [Updated] (SOLR-13350) Explore collector managers for multi-threaded search

2024-06-25 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13350:

Fix Version/s: 9.7

> Explore collector managers for multi-threaded search
> 
>
> Key: SOLR-13350
> URL: https://issues.apache.org/jira/browse/SOLR-13350
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 9.7
>
> Attachments: SOLR-13350-pre-PR-2508.patch, SOLR-13350.patch, 
> SOLR-13350.patch, SOLR-13350.patch
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> AFAICT, SolrIndexSearcher can be used only to search all the segments of an 
> index in series. However, using CollectorManagers, segments can be searched 
> concurrently and result in reduced latency. Opening this issue to explore the 
> effectiveness of using CollectorManagers in SolrIndexSearcher from latency 
> and throughput perspective.






[jira] [Commented] (SOLR-10255) Large psuedo-stored fields via BinaryDocValuesField

2024-06-25 Thread Alexey Serba (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860074#comment-17860074
 ] 

Alexey Serba commented on SOLR-10255:
-

[~dsmiley], please review https://github.com/apache/solr/pull/2536 when you 
have a chance.

> Large psuedo-stored fields via BinaryDocValuesField
> ---
>
> Key: SOLR-10255
> URL: https://issues.apache.org/jira/browse/SOLR-10255
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: SOLR-10255.patch, SOLR-10255.patch
>
>
> (sub-issue of SOLR-10117)  This is a proposal for a better way for Solr to 
> handle "large" text fields.  Large docs that are in Lucene StoredFields slow 
> requests that don't involve access to such fields.  This is fundamental to 
> the fact that StoredFields are row-stored.  Worse, the Solr documentCache 
> will wind up holding onto massive Strings.  While the latter could be tackled 
> on its own somehow, as it's the most serious issue, it nevertheless seems 
> wrong that such large fields are in row-stored storage to begin with.  After 
> all, relational DBs seem to have figured this out and put CLOBs/BLOBs in a 
> separate place.  Here, we do similarly by using Lucene 
> {{BinaryDocValuesField}}.  BDVF isn't well known in the DocValues family as 
> it's not for typical DocValues purposes like sorting/faceting etc.  The 
> default DocValuesFormat doesn't compress these, but we could write one that 
> does.


