[ 
https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379107#comment-17379107
 ] 

Rainer Jung commented on SOLR-13360:
------------------------------------

The existing code tries to replace tokens in the original query with their 
spell-checked alternatives. It assumes that the replacements occur in 
increasing position order in the original query.

The part of the query string to replace is identified by the startOffset and 
endOffset recorded in the replacement token. These refer to the original 
character indexes in the query string.

If you want to replace multiple tokens left to right in a string, and the 
substrings to replace are given by indexes into that string, you have to be 
careful when the length of a replacement differs from the length of the 
original substring. If the replacement is longer, you have to increase the 
start and end indexes of all replacements to its right by the difference; if 
it is shorter, you have to decrease them. The code does exactly this 
calculation and accumulates the increases and decreases in the variable offset.
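
To make the bookkeeping concrete, here is a minimal sketch of that 
left-to-right replacement scheme. It is not the SpellCheckCollator source; the 
class OffsetReplaceSketch, the Replacement holder and applyLeftToRight are 
hypothetical names used only for illustration.

```java
import java.util.List;

/** Illustration only: hypothetical names, not the actual Solr code. */
class OffsetReplaceSketch {

    /** A replacement addressed by character offsets in the ORIGINAL query. */
    static final class Replacement {
        final int startOffset;
        final int endOffset;
        final String text;

        Replacement(int startOffset, int endOffset, String text) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.text = text;
        }
    }

    /** Applies the replacements in list order, assuming they are sorted left to right. */
    static String applyLeftToRight(String query, List<Replacement> replacements) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0; // accumulated length change caused by earlier replacements
        for (Replacement r : replacements) {
            // Shift the original indexes by the accumulated length change.
            sb.replace(r.startOffset + offset, r.endOffset + offset, r.text);
            // Longer replacement: offset grows. Shorter replacement: offset shrinks.
            offset += r.text.length() - (r.endOffset - r.startOffset);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "helo" is at [0,4) and "wrld" at [5,9) in the original query.
        System.out.println(applyLeftToRight("helo wrld", List.of(
                new Replacement(0, 4, "hello"),
                new Replacement(5, 9, "world"))));
    }
}
```

The sketch only works as long as every replacement starts at or after the end 
of the previous one in the original string.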

So it is absolutely necessary that the assumption (replacements are done left 
to right) holds.

 

The exception happens when multiple replacements (corrections) point to the 
same character position in the original query. How can that happen? By 
configuring the spellchecker with a queryAnalyzerFieldType whose query 
analyzer uses a synonym list that replaces a single token with multiple 
tokens. That might not be the common case, but it is possible.

In that case, a single token in the original query can be expanded into 
multiple tokens by the synonym filter. All of these refer to the same original 
query token, so they share the same startOffset and endOffset. If the spell 
checker finds corrections for more than one of these tokens, the collation 
code will try to replace the same original query token multiple times, always 
using the same startOffset and endOffset plus the calculated offset. If the 
length changes, that is no longer correct.  Example:

 

original query: "myname"

synonym list: myname, some1 some2 myname, some3 some4 myname

Token list going into spell checker collation: myname, some1, some2, some3, 
some4

Assumed spelling corrections: some1 => some, some2 => some, some3 => some, 
some4 => some

Replacement happening on original query text "myname":
 * myname => some (startOffset 0, endOffset 6, offset 0; new offset -2 because 
the replacement string is 2 chars shorter, so everything after the replacement 
would move 2 positions left)
 * some => exception (startOffset 0-2 = -2, endOffset 6-2 = 4,...)
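
The arithmetic above can be replayed outside Solr with a plain StringBuilder. 
The following is only a sketch of the failure mode under the assumptions of 
this example (two corrections sharing startOffset 0 and endOffset 6); the 
class and method names are made up for illustration.

```java
/** Illustration only: replays the offset arithmetic of the example, not Solr code. */
class CollationOffsetFailure {

    /**
     * Applies every correction to the SAME original span, the way the collation
     * code ends up doing when several tokens share startOffset and endOffset.
     */
    static String collateNaively(String query, int startOffset, int endOffset,
                                 String[] corrections) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0;
        for (String correction : corrections) {
            sb.replace(startOffset + offset, endOffset + offset, correction);
            offset += correction.length() - (endOffset - startOffset);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        try {
            // First pass: replace(0, 6, "some"), offset becomes -2.
            // Second pass: replace(-2, 4, "some"), which is out of range.
            collateNaively("myname", 0, 6, new String[] {"some", "some"});
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("second replacement failed: " + e.getMessage());
        }
    }
}
```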

It is a bit harder to reproduce than that, because corrections with 
positionIncrement == 0 are completely ignored. I couldn't find out exactly 
when this happens, but simple synonym lists often result in that value 0.

IMHO the code is wrong in assuming that every replacement refers to a 
different token in the original query string. This is no longer true when 
synonym replacement by a query analyzer comes into play.

As a workaround, I think it is correct to harden the code, although that does 
not fix the root cause. If the replacement tokens overlap or are not in 
left-to-right order, the code should skip those out-of-order or overlapping 
replacements.

I will attach a suggested workaround patch.
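
As a rough illustration of the kind of guard meant here (this is not the 
attached patch, just a sketch with hypothetical names such as 
GuardedCollationSketch, Correction and collate), the loop could remember where 
the last applied replacement ended in the original query and skip anything 
that starts before that point:

```java
import java.util.List;

/** Illustration only: hypothetical names, not the attached patch or the Solr code. */
class GuardedCollationSketch {

    /** A correction addressed by character offsets in the ORIGINAL query. */
    static final class Correction {
        final int startOffset;
        final int endOffset;
        final String text;

        Correction(int startOffset, int endOffset, String text) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.text = text;
        }
    }

    static String collate(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0;  // accumulated length change of applied replacements
        int lastEnd = 0; // end of the last applied replacement, in original indexes
        for (Correction c : corrections) {
            boolean overlapsOrOutOfOrder = c.startOffset < lastEnd;
            boolean invalidSpan = c.endOffset < c.startOffset
                    || c.endOffset > query.length();
            if (overlapsOrOutOfOrder || invalidSpan) {
                continue; // skip instead of corrupting the indexes
            }
            sb.replace(c.startOffset + offset, c.endOffset + offset, c.text);
            offset += c.text.length() - (c.endOffset - c.startOffset);
            lastEnd = c.endOffset;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both corrections point at the same original token "myname" [0,6).
        System.out.println(collate("myname", List.of(
                new Correction(0, 6, "some"),
                new Correction(0, 6, "some"))));
    }
}
```

With such a guard only the first correction for a given original span is 
applied; the later ones are silently dropped, which avoids the exception but 
does not make the collation any more meaningful.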

I think that one combination is simply not supported: a spellchecker config 
with collation whose queryAnalyzerFieldType uses a query analyzer that can 
produce several tokens referring to the same original string position in the 
query, as synonyms can. That should be documented.

> StringIndexOutOfBoundsException: String index out of range: -3
> --------------------------------------------------------------
>
>                 Key: SOLR-13360
>                 URL: https://issues.apache.org/jira/browse/SOLR-13360
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 7.2.1
>         Environment: Solr 7.2.1 - SAP Hybris 6.7.0.8
>            Reporter: Ahmed Ghoneim
>            Priority: Critical
>         Attachments: managed-schema, managed-schema, resources.json, 
> solr-config.zip
>
>
> *{color:#ff0000}I cannot execute the following query:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=true{noformat}
> 4/1/2019, 1:16:07 PM ERROR true RequestHandlerBase 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>       at 
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
>       at java.lang.StringBuilder.replace(StringBuilder.java:262)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
>       at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at org.eclipse.jetty.server.Server.handle(Server.java:534)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>       at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> 4/1/2019, 1:16:07 PM ERROR true HttpSolrCall 
> null:java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> null:java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>       at 
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
>       at java.lang.StringBuilder.replace(StringBuilder.java:262)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
>       at 
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
>       at 
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
>       at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>       at org.eclipse.jetty.server.Server.handle(Server.java:534)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>       at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
>       at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>       at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>       at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>       at java.lang.Thread.run(Thread.java:748){code}
> *{color:#14892c}However the following query works:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=false{noformat}
> Note: there's a synonym
> {noformat}
> duotop -> Duo Top
> {noformat}


