[jira] [Commented] (SOLR-17623) SimpleOrderedMap should implement Map

2025-02-01 Thread Renato Haeberli (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922921#comment-17922921
 ] 

Renato Haeberli commented on SOLR-17623:


You're welcome, I couldn't have done it without your help!

Just to make sure I understand the challenges around moving away from 
NamedList.asShallowMap correctly:
NamedList.asShallowMap calls NamedList.this.asMap(1), hence NamedLists within a 
NamedList are also converted to a Map. By simply using SOM wherever 
asShallowMap is called, we would 'lose' this conversion of the nested NamedList.
Could we overcome that issue by ensuring that, wherever asShallowMap is called, 
we change the code so that the SOM only holds other SOMs, and no NamedList? 
With that, the nested data structure is already a Map by definition and should 
not cause any issues.
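
A minimal sketch of the nested conversion described above. It uses a hypothetical PairList stand-in rather than Solr's NamedList; the class, its add method, and the depth handling are assumptions for illustration, not Solr's implementation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a NamedList: an ordered list of name/value pairs. */
final class PairList {
  private final List<Map.Entry<String, Object>> pairs = new ArrayList<>();

  PairList add(String name, Object value) {
    pairs.add(new AbstractMap.SimpleImmutableEntry<>(name, value));
    return this;
  }

  /** Copies into an ordered Map; nested PairLists are converted while maxDepth is above 0. */
  Map<String, Object> asMap(int maxDepth) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : pairs) {
      Object value = e.getValue();
      if (value instanceof PairList && maxDepth > 0) {
        value = ((PairList) value).asMap(maxDepth - 1);
      }
      out.put(e.getKey(), value);
    }
    return out;
  }
}
```

With asMap(1) the nested PairList comes back as a Map, while asMap(0) leaves it untouched. That second case is the conversion that would be lost if a container holding a nested list were handed out as-is.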


Is the goal also to move away from NamedList.asMap, or only from asShallowMap?

 

 

> SimpleOrderedMap should implement Map
> -
>
> Key: SOLR-17623
> URL: https://issues.apache.org/jira/browse/SOLR-17623
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Major
>  Labels: newdev, pull-request-available
> Fix For: 9.9
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> SimpleOrderedMap is semantically a Map; it should implement Map.  
> Why: This will help us transition away from NamedList (its superclass) in a 
> number of places, since many (most?) places that are defined to return a 
> NamedList could actually be declared to be a SimpleOrderedMap and eventually 
> simply Map.  
> There's some risk that code somewhere gets this Map, a large one, and then 
> assumes it has better than O(N) lookup, which it doesn't provide.  Perhaps a 
> Javadoc warning will do.
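
The O(N) lookup concern in the description can be illustrated with a small list-backed Map. This is only a sketch: ListBackedMap is a hypothetical class, not Solr's SimpleOrderedMap, and it assumes callers never add the same key twice.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical list-backed Map; lookups scan the list, so get() is O(N). */
final class ListBackedMap<V> extends AbstractMap<String, V> {
  private final List<Map.Entry<String, V>> entries = new ArrayList<>();

  /** Appends a pair, preserving insertion order. Assumes the key is not already present. */
  void add(String key, V value) {
    entries.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
  }

  @Override
  public Set<Map.Entry<String, V>> entrySet() {
    // AbstractMap implements get/containsKey by iterating this set linearly.
    return new AbstractSet<Map.Entry<String, V>>() {
      @Override
      public Iterator<Map.Entry<String, V>> iterator() {
        return entries.iterator();
      }

      @Override
      public int size() {
        return entries.size();
      }
    };
  }
}
```

Code that receives such an object typed as a plain Map cannot tell that get() walks the whole list, which is why a Javadoc warning is suggested.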



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17648) multiThreaded: Remove obsolete RejectedExecutionException avoidance

2025-02-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17648:
--
Labels: pull-request-available  (was: )

> multiThreaded: Remove obsolete RejectedExecutionException avoidance
> ---
>
> Key: SOLR-17648
> URL: https://issues.apache.org/jira/browse/SOLR-17648
> Project: Solr
>  Issue Type: Task
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene 9.12 fixed a RejectedExecutionException risk 
> https://github.com/apache/lucene/pull/13622 in which 
> RejectedExecutionException is caught and the task is run. This is done with a 
> simple Executor wrapper in IndexSearcher's constructor. I propose we remove 
> the hack/work-around in SolrIndexSearcher. This brings back a 
> LinkedBlockingQueue, and queue size to determine.






[PR] SOLR-17648: multiThreaded=true: changed queue implementation [solr]

2025-02-01 Thread via GitHub


dsmiley opened a new pull request, #3155:
URL: https://github.com/apache/solr/pull/3155

Changed the queue from unlimited to 1000 max, after which the caller thread 
will execute.
The RejectedExecutionException avoidance hack is no longer needed; Lucene 
9.12 has it.
   
   https://issues.apache.org/jira/browse/SOLR-17648
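
For readers unfamiliar with the pattern, here is a JDK-only sketch of a pool whose queue is bounded and whose overflow work runs on the submitting thread. It uses ThreadPoolExecutor.CallerRunsPolicy as a stand-in; the PR itself relies on Lucene's wrapper for the fallback, and BoundedSearchPool and its parameters are hypothetical names, not code from the PR.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical factory for a fixed-size pool with a bounded work queue. */
final class BoundedSearchPool {
  private BoundedSearchPool() {}

  static ThreadPoolExecutor create(int threads, int queueCapacity) {
    return new ThreadPoolExecutor(
        threads,
        threads,
        0L,
        TimeUnit.MILLISECONDS,
        // Bounded queue: offers fail once queueCapacity tasks are waiting.
        new LinkedBlockingQueue<>(queueCapacity),
        // When the queue is full, the submitting thread runs the task itself.
        new ThreadPoolExecutor.CallerRunsPolicy());
  }
}
```

A call such as BoundedSearchPool.create(4, 1000) gives a queue of at most 1000 pending tasks; the thread count of 4 is arbitrary here.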


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Commented] (SOLR-17623) SimpleOrderedMap should implement Map

2025-02-01 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922943#comment-17922943
 ] 

David Smiley commented on SOLR-17623:
-

Let's take that topic to SOLR-17647, which I just created...







[jira] [Comment Edited] (SOLR-13360) StringIndexOutOfBoundsException: String index out of range: -3

2025-02-01 Thread Alex Deparvu (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916790#comment-17916790
 ] 

Alex Deparvu edited comment on SOLR-13360 at 2/1/25 2:01 PM:
-

Just trying to make some progress here, I was able to add a PR with some 
minimal tests that reproduce the issue 
[https://github.com/apache/solr/pull/3112]
There is no proposed solution yet because I wanted to discuss options here 
first, but I did leave some TODO notes where I think changes could be made.

Just a disclaimer: I don't have a lot of knowledge in this area, so I spent 
some time trying to understand what the issue is, and some of this might not 
be 100% correct. Also, thanks to the wealth of detail on this ticket, I was 
able to come up with a relatively simple test (see 
SpellCheckCollatorWithSynonymTest) and also a lower-level unit test that 
unpacks all the complexity of setting up a Solr instance with correct field 
types and all (see SpellCheckCollatorCollationOnlyTest).

To me this looks like an overlapping-interval problem. Given a sufficiently 
complicated analyzer, the tokens can have a lot of overlapping start/end 
indexes, and this can cause chaos in the collation code, which seems to assume 
tokens come in with strictly increasing start indexes. The twist to the PR that 
was posted above is that not only can a start index repeat, but you can also 
have tokens inside other tokens, which is a gap that also needs to be fixed. 
See the unit test here 
[https://github.com/apache/solr/pull/3112/files#diff-a85c16ca4fe0d8747a0c76e21460c7ec5ede698a4a83eed47e39cdc197af0c48R126-R130]

I am not really sure what the solution is. I am leaning towards a cleanup of 
the correction tokens (basically remove any overlapping intervals): sort by 
start index first, then remove anything that is inside the previous token's 
interval (this will favor the first token over the others, and I am not sure 
it is a good approach).

Example:
 - search `panthera pardus`
 - synonym definition `panthera pardus, leopard|0.6`
 - corrections are: leopard(0, 15), 0(0, 15), 6(0, 15), panthera(0, 8), 
pardu(9, 15)
 - note: the `0` and `6` tokens were generated from the synonym definition, so 
even if this is a possible bug, I am leaving the example in because it shows 
what can happen
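
The cleanup idea above can be sketched as follows. This is not the code in the PR: Correction, CorrectionCleanup, and dropOverlapping are hypothetical names. The sketch also assumes end offsets are exclusive and that any token starting before the end of the last kept token counts as overlapping.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical correction token with a [start, end) offset range in the query. */
final class Correction {
  final String text;
  final int start;
  final int end;

  Correction(String text, int start, int end) {
    this.text = text;
    this.start = start;
    this.end = end;
  }
}

final class CorrectionCleanup {
  private CorrectionCleanup() {}

  /** Sorts by start offset, then drops every token that overlaps the last kept one. */
  static List<Correction> dropOverlapping(List<Correction> input) {
    List<Correction> sorted = new ArrayList<>(input);
    // The sort is stable, so tokens with equal start keep their input order
    // and the first one wins.
    sorted.sort(Comparator.comparingInt((Correction c) -> c.start));

    List<Correction> kept = new ArrayList<>();
    int lastEnd = Integer.MIN_VALUE;
    for (Correction c : sorted) {
      if (c.start >= lastEnd) {
        kept.add(c);
        lastEnd = c.end;
      }
    }
    return kept;
  }
}
```

For the corrections listed in the example, only leopard(0, 15) survives, because every other token starts before offset 15. That is the "favor the first token" behaviour questioned above.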

More open questions:

1. I added a log capturing the data in case of a future 
StringIndexOutOfBoundsException. I am curious what people think: is this 
useful or not? I can remove it if it is too intrusive or could leak sensitive 
data in the logs.

2. I can put this behind a system property (on by default) so that, if anything 
happens, this can be reverted to the previous behavior.

3. There is some issue with boosts in synonyms. I tried documenting it here 
[https://github.com/apache/solr/pull/3112/files#diff-8a4f8ed7cdb05bd73fcaaa4b688a166db1c28ab32eca3179ec19ec43138384ccR39]

4. I think there might be an issue with the whitespace correction. I moved this 
code into a dedicated method but did not have time to add any tests.



[jira] [Commented] (SOLR-13360) StringIndexOutOfBoundsException: String index out of range: -3

2025-02-01 Thread Alex Deparvu (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922941#comment-17922941
 ] 

Alex Deparvu commented on SOLR-13360:
-

The PR was updated with the cleanup solution proposed above. The new method 
[mergeSpellCheckCorrections|https://github.com/apache/solr/pull/3112/files#diff-2373648e97e9823fac15ad2dabc6da48c98a870b39580776fd2d4bfa46fe532fR299]
 does all of this.

> StringIndexOutOfBoundsException: String index out of range: -3
> --
>
> Key: SOLR-13360
> URL: https://issues.apache.org/jira/browse/SOLR-13360
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.2.1
> Environment: Solr 7.2.1 - SAP Hybris 6.7.0.8
>Reporter: Ahmed Ghoneim
>Priority: Critical
>  Labels: pull-request-available
> Attachments: managed-schema, managed-schema, resources.json, 
> solr-config.zip
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> *{color:#ff}I cannot execute the following query:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=true{noformat}
> 4/1/2019, 1:16:07 PM ERROR true RequestHandlerBase 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>   at 
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
>   at java.lang.StringBuilder.replace(StringBuilder.java:262)
>   at 
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
>   at 
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
>   at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:

Re: [PR] SOLR-17310: Configurable LeafSorter to customize segment search order [solr]

2025-02-01 Thread via GitHub


github-actions[bot] closed pull request #2477: SOLR-17310:  Configurable 
LeafSorter to customize segment search order
URL: https://github.com/apache/solr/pull/2477





Re: [PR] SOLR-17310: Configurable LeafSorter to customize segment search order [solr]

2025-02-01 Thread via GitHub


github-actions[bot] commented on PR #2477:
URL: https://github.com/apache/solr/pull/2477#issuecomment-2629168132

   This PR is now closed due to 60 days of inactivity after being marked as 
stale.  Re-opening this PR is still possible, in which case it will be marked 
as active again.





Re: [PR] SOLR-17187: make replica poll interval configurable [solr]

2025-02-01 Thread via GitHub


dsmiley commented on code in PR #3144:
URL: https://github.com/apache/solr/pull/3144#discussion_r1938341688


##
solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java:
##
@@ -134,18 +127,25 @@ public static String getCommitVersion(SolrCore solrCore) {
   }
 
   /**
-   * Determine the poll interval for replicas based on the auto soft/hard 
commit schedule
+   * Determine the poll interval for replicas based on the auto soft/hard 
commit schedule or
+   * configured commit poll interval
*
* @param uinfo the update handler info containing soft/hard commit 
configuration
* @return a poll interval string representing a cadence of polling 
frequency in the form of
-   * hh:mm:ss
+   * hh:mm:ss, never null
*/
   public static String determinePollInterval(SolrConfig.UpdateHandlerInfo 
uinfo) {
 int hardCommitMaxTime = uinfo.autoCommmitMaxTime;
 int softCommitMaxTime = uinfo.autoSoftCommmitMaxTime;
 boolean hardCommitNewSearcher = uinfo.openSearcher;
-String pollIntervalStr = null;
-if (hardCommitMaxTime != -1) {
+String customCommitPollInterval = uinfo.commitPollInterval;
+String pollIntervalStr = "00:00:03";
+
+if (System.getProperty("jetty.testMode") != null) {

Review Comment:
   You are thinking of EnvUtils but that's for real/operational settings a user 
might set.  That doesn't apply here.






Re: [PR] SOLR-17187: make replica poll interval configurable [solr]

2025-02-01 Thread via GitHub


dsmiley commented on code in PR #3144:
URL: https://github.com/apache/solr/pull/3144#discussion_r1938342390


##
solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java:
##
@@ -1485,7 +1486,8 @@ private Long readIntervalMs(String interval) {
 return TimeUnit.MILLISECONDS.convert(readIntervalNs(interval), 
TimeUnit.NANOSECONDS);
   }
 
-  private Long readIntervalNs(String interval) {
+  @VisibleForTesting
+  public static Long readIntervalNs(String interval) {

Review Comment:
   indeed; it explains why it's more open than it otherwise needs to be
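
For context, a string in the hh:mm:ss form mentioned in the Javadoc of determinePollInterval could be converted to nanoseconds along these lines. This is a sketch, not the body of ReplicationHandler.readIntervalNs: PollIntervals and toNanos are hypothetical names, and the range checks are assumptions.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for intervals written as hh:mm:ss. */
final class PollIntervals {
  private PollIntervals() {}

  /** Parses "hh:mm:ss" and returns the duration in nanoseconds. */
  static long toNanos(String interval) {
    String[] parts = interval.trim().split(":");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected hh:mm:ss but got: " + interval);
    }
    // Long.parseLong throws NumberFormatException (an IllegalArgumentException)
    // for non-numeric parts.
    long hours = Long.parseLong(parts[0]);
    long minutes = Long.parseLong(parts[1]);
    long seconds = Long.parseLong(parts[2]);
    if (hours < 0 || minutes < 0 || minutes > 59 || seconds < 0 || seconds > 59) {
      throw new IllegalArgumentException("Out-of-range field in: " + interval);
    }
    return TimeUnit.HOURS.toNanos(hours)
        + TimeUnit.MINUTES.toNanos(minutes)
        + TimeUnit.SECONDS.toNanos(seconds);
  }
}
```

The default "00:00:03" from the diff above would map to three seconds' worth of nanoseconds under this sketch.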



##
solr/core/src/test/org/apache/solr/cloud/ReplicateFromLeaderTest.java:
##
@@ -18,54 +18,105 @@
 package org.apache.solr.cloud;
 
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertThrows;
 
+import org.apache.solr.common.SolrException;
 import org.apache.solr.core.SolrConfig;
+import org.junit.After;
+import org.junit.Before;
 import org.junit.Test;
 
 public class ReplicateFromLeaderTest {
 
+  private String jettyTestMode;
+
+  @Before
+  public void setUp() {

Review Comment:
   Indeed we do.  Firstly, all Solr tests ought to extend SolrTestCase, at 
least indirectly.  This test doesn't; it should be modified.  Secondly, STCJ4 
includes:
   
   ```java
   @Rule public TestRule solrTestRules = RuleChain.outerRule(new SystemPropertiesRestoreRule());
   ```
   That obsoletes the setUp & tearDown here.
   
   So for now, I suggest removing the setUp & tearDown here and subclassing 
STCJ4.  It's a TODO for me in my work on making SolrTestCase a better base 
class to transition that line from STCJ4 to STC.



##
solr/solr-ref-guide/modules/configuration-guide/pages/commits-transaction-logs.adoc:
##
@@ -62,6 +62,8 @@ It is recommended that this be set for as long as is 
reasonable given the applic
 A hard commit means that, if a server crashes, Solr will know exactly where 
your data was stored; a soft commit means that the data is stored, but the 
location information isn't yet stored.
 The tradeoff is that a soft commit gives you faster visibility because it's 
not waiting for background merges to finish.
 
+In a TLOG/PULL replica setup, the commit configuration also influences the 
interval at which the replica is polling the shard leader. You may optionally 
configure a custom `commitPollInterval`.

Review Comment:
   Please start new sentences on new lines.  Not sure if this style choice is 
stated somewhere.  Goal is reducing diffs and reducing need for line wrap.






[jira] [Comment Edited] (SOLR-13360) StringIndexOutOfBoundsException: String index out of range: -3

2025-02-01 Thread Alex Deparvu (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916790#comment-17916790
 ] 

Alex Deparvu edited comment on SOLR-13360 at 2/1/25 2:15 PM:
-

Just trying to make some progress here, I was able to add a PR with some 
minimal tests that reproduce the issue 
[https://github.com/apache/solr/pull/3112]
There is no proposed solution yet because I wanted to discuss options here 
first, but I did leave some TODO notes where I think changes could be made.

Just a disclaimer: I don't have a lot of knowledge in this area, so I spent 
some time trying to understand what the issue is, and some of this might not 
be 100% correct. Also, thanks to the wealth of detail on this ticket, I was 
able to come up with a relatively simple test (see 
SpellCheckCollatorWithSynonymTest) and also a lower-level unit test that 
unpacks all the complexity of setting up a Solr instance with correct field 
types and all (see SpellCheckCollatorCollationOnlyTest).

To me this looks like an overlapping-interval problem. Given a sufficiently 
complicated analyzer, the tokens can have a lot of overlapping start/end 
indexes, and this can cause chaos in the collation code, which seems to assume 
tokens come in with strictly increasing start indexes. The twist to the PR that 
was posted above is that not only can a start index repeat, but you can also 
have tokens inside other tokens, which is a gap that also needs to be fixed. 
See the unit test here 
[https://github.com/apache/solr/pull/3112/files#diff-a85c16ca4fe0d8747a0c76e21460c7ec5ede698a4a83eed47e39cdc197af0c48R126-R130]

I am not really sure what the solution is. I am leaning towards a cleanup of 
the correction tokens (basically remove any overlapping intervals): sort by 
start index first, then remove anything that is inside the previous token's 
interval (this will favor the first token over the others, and I am not sure 
it is a good approach).

Example:
 - search `panthera pardus`
 - synonym definition `panthera pardus, leopard|0.6`
 - corrections are: leopard(0, 15), 0(0, 15), 6(0, 15), panthera(0, 8), 
pardu(9, 15)
 - note: the `0` and `6` tokens were generated from the synonym definition, so 
even if this is a possible bug, I am leaving the example in because it shows 
what can happen

More open questions:

1. I added a log capturing the data in case of a future 
StringIndexOutOfBoundsException. I am curious what people think: is this 
useful or not? I can remove it if it is too intrusive or could leak sensitive 
data in the logs.

2. I can put this behind a system property (on by default) so that, if anything 
happens, this can be reverted to the previous behavior.

3. There is some issue with boosts in synonyms. I tried documenting it here 
[https://github.com/apache/solr/pull/3112/files#diff-8a4f8ed7cdb05bd73fcaaa4b688a166db1c28ab32eca3179ec19ec43138384ccR39]

4. I think there might be an issue with the whitespace correction parts, but I 
did not have time to add any tests.



Re: [PR] SOLR-17641: Disable the Security Manager for Java 24+ [solr]

2025-02-01 Thread via GitHub


epugh commented on PR #3153:
URL: https://github.com/apache/solr/pull/3153#issuecomment-2628951104

   It's great that you made the disabling conditional on Java 24+.   I wonder 
though if we are better served just removing it completely from Solr 10, 
regardless of version of Java?   





Re: [PR] SOLR-17609: Remove HDFS module [solr]

2025-02-01 Thread via GitHub


epugh commented on PR #2923:
URL: https://github.com/apache/solr/pull/2923#issuecomment-2628949967

   It's February 1st, so I'm going to look to merge this on Monday barring any 
other concerns being brought up.





Re: [PR] SOLR-17609: mark deprecation of HDFS module in 9x and removal in 10 [solr]

2025-02-01 Thread via GitHub


epugh commented on code in PR #3041:
URL: https://github.com/apache/solr/pull/3041#discussion_r1938274548


##
solr/solr-ref-guide/modules/deployment-guide/pages/solr-on-hdfs.adoc:
##
@@ -20,6 +20,8 @@
 The Solr HDFS Module has support for writing and reading Solr's index and 
transaction log files to the HDFS distributed filesystem.
 It does not use Hadoop MapReduce to process Solr data.
 
+IMPORTANT: The HDFS Module has been deprecated and will be removed in Solr 10.

Review Comment:
   I believe `IMPORTANT` is an AsciiDoc admonition; it adds some extra 
formatting in the HTML to make it pop.   `NOTE` is another one.






[jira] [Commented] (SOLR-17414) multiThreaded=true can result in RejectedExecutionException

2025-02-01 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923016#comment-17923016
 ] 

David Smiley commented on SOLR-17414:
-

Lucene 9.12 has a fix for this -- https://github.com/apache/lucene/pull/13622 
in which RejectedExecutionException is caught and the task is run.  This is 
done with a simple Executor wrapper in IndexSearcher's constructor.  I propose 
to remove the hack/work-around here in favor of leaning on Lucene's solution -- 
less for us to maintain.  This brings back a queue size to determine.
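
The kind of Executor wrapper described above can be sketched in a few lines. This is an illustration of the general technique, not Lucene's actual class; the name CallerRunsOnRejection is hypothetical.

```java
import java.util.Objects;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical wrapper: if the delegate rejects a task, run it on the calling thread. */
final class CallerRunsOnRejection implements Executor {
  private final Executor delegate;

  CallerRunsOnRejection(Executor delegate) {
    this.delegate = Objects.requireNonNull(delegate, "delegate");
  }

  @Override
  public void execute(Runnable task) {
    try {
      delegate.execute(task);
    } catch (RejectedExecutionException e) {
      // The pool or its queue is saturated (or shut down); do the work inline
      // instead of failing the request.
      task.run();
    }
  }
}
```

With a wrapper like this in place, a bounded queue no longer surfaces as a failed search; the remaining cost is that the submitting thread does some of the work itself.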

> multiThreaded=true can result in RejectedExecutionException
> ---
>
> Key: SOLR-17414
> URL: https://issues.apache.org/jira/browse/SOLR-17414
> Project: Solr
>  Issue Type: Bug
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.8
>
> Attachments: build-out2.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since the new multiThreaded search feature landed, I see a new test
> failure involving "RejectedExecutionException" being thrown 
> [link|https://ge.apache.org/s/5ack462ji4mlu/tests/task/:solr:core:test/details/org.apache.solr.search.TestRealTimeGet/testStressGetRealtime?top-execution=1].
> It is thrown at a low level in Lucene building TermStates
> concurrently.  I doubt the problem is specific to that test
> (TestRealTimeGet) but that test might induce more activity than most
> tests, thus crossing some thresholds like the queue size -- apparently
> 1000.
> *I don't think we should be throwing a RejectedExecutionException
> when running a Search query*






Re: [PR] SOLR-17310: Configurable LeafSorter to customize segment search order [solr]

2025-02-01 Thread via GitHub


weiwang19 commented on PR #2477:
URL: https://github.com/apache/solr/pull/2477#issuecomment-2629240025

   keep it open





[jira] [Assigned] (SOLR-17648) multiThreaded: Remove obsolete RejectedExecutionException avoidance

2025-02-01 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned SOLR-17648:
---

Assignee: David Smiley







[jira] [Created] (SOLR-17648) multiThreaded: Remove obsolete RejectedExecutionException avoidance

2025-02-01 Thread David Smiley (Jira)
David Smiley created SOLR-17648:
---

 Summary: multiThreaded: Remove obsolete RejectedExecutionException 
avoidance
 Key: SOLR-17648
 URL: https://issues.apache.org/jira/browse/SOLR-17648
 Project: Solr
  Issue Type: Task
Reporter: David Smiley


Lucene 9.12 fixed a RejectedExecutionException risk 
https://github.com/apache/lucene/pull/13622 in which RejectedExecutionException 
is caught and the task is run. This is done with a simple Executor wrapper in 
IndexSearcher's constructor. I propose we remove the hack/work-around in 
SolrIndexSearcher. This brings back a LinkedBlockingQueue, and queue size to 
determine.


