[ 
https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916790#comment-17916790
 ] 

Alex Deparvu commented on SOLR-13360:
-------------------------------------

Just trying to make some progress here, I was able to add a PR with some 
minimal tests that reproduce the issue https://github.com/apache/solr/pull/3112
There is no proposed solution yet because I wanted to discuss options here 
first. but I did leave some TODO notes where I think changes could do.

just a disclaimer I don't really have a lot of knowledge in this area so I did 
spend some time trying to understand what the issue is so some of this might 
not be 100% correct. also thanks to the wealth of details on this ticket I was 
able to come up with a relatively simple test (see 
SpellCheckCollatorWithSynonymTest) but also a more lower level unit test that 
unpacks all the complexity of setting up a Solr instance with correct field 
types and all (see SpellCheckCollatorCollationOnlyTest).

To me this looks like an overlapping interval problem. given a sufficiently 
complicated analyzer, the tokens can have a lot of overlapping start/end 
indexes and this can cause chaos on the collation code which seems to assume 
tokens come in with strictly increasing start indexes. The twist to the PR that 
was posted above is that not only can start index repeat BUT you can have 
tokens inside other tokens which is a gap that also needs to be fixed. see the 
unit test here 
https://github.com/apache/solr/pull/3112/files#diff-a85c16ca4fe0d8747a0c76e21460c7ec5ede698a4a83eed47e39cdc197af0c48R110-R114

Not really sure what the solution is, I am leaning towards a cleanup of the 
correction tokens (basically remove any overlapping intervals): sort by start 
index first, remove anything that is inside the previous token's interval (this 
will favor the first token over the others and I am not sure it is a good 
approach).

example
 - search `panthera pardus`
 - synonim definition `panthera pardus, leopard|0.6`
 - corrections are: leopard (0, 15), 0(0,15), 6(0,15), panthera(0, 8), pardu(9, 
15)
 - note the `0` and `6` tokens were generated from the syonym definition so 
even if a possible bug - I am leaving the example in because it shows what can 
happen


More open questions

1. I added a log capturing the data in case of a future 
StringIndexOutOfBoundsException. curious what people think, is this useful or 
not. I can remove it if it is too intrusive or can leak sensitive data in the 
logs.

2. I can put this behind a system property (on by default) so if anything 
happens this can be reverted to previous behavior.

3. there is some issue with boosts in synonyms. I tried documenting here 
https://github.com/apache/solr/pull/3112/files#diff-8a4f8ed7cdb05bd73fcaaa4b688a166db1c28ab32eca3179ec19ec43138384ccR41

4. I think there might be an issue with the whitespace correction. I move this 
code into a dedicated metod but did not have time to add any tests.




> StringIndexOutOfBoundsException: String index out of range: -3
> --------------------------------------------------------------
>
>                 Key: SOLR-13360
>                 URL: https://issues.apache.org/jira/browse/SOLR-13360
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 7.2.1
>         Environment: Solr 7.2.1 - SAP Hybris 6.7.0.8
>            Reporter: Ahmed Ghoneim
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: managed-schema, managed-schema, resources.json, 
> solr-config.zip
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> *{color:#ff0000}I cannot execute the following query:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=true{noformat}
> 4/1/2019, 1:16:07 PM ERROR true RequestHandlerBase 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>       at 
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
>       at java.lang.StringBuilder.replace(StringBuilder.java:262)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
>       at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at org.eclipse.jetty.server.Server.handle(Server.java:534)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>       at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> 4/1/2019, 1:16:07 PM ERROR true HttpSolrCall 
> null:java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> null:java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>       at 
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
>       at java.lang.StringBuilder.replace(StringBuilder.java:262)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
>       at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at org.eclipse.jetty.server.Server.handle(Server.java:534)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>       at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>       at java.lang.Thread.run(Thread.java:748){code}
> *{color:#14892c}However the following query works:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=false{noformat}
> Note: there's a synonym
> {noformat}
> duotop -> Duo Top
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to