[ https://issues.apache.org/jira/browse/SOLR-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470899#comment-17470899 ]
Michael Gibney commented on SOLR-15777:
---------------------------------------

Encountered another instance of this causing confusion "in the wild" today. I think in most cases what's happening is that, since {{useDocValuesAsStored}} (udvas) began defaulting to true (Solr 5.5.0, schema version 1.6), people have been getting {{ICUCollationField}} values returned incidentally when specifying {{fl=*}}. I suspect that people neither expect nor want this. The values can be invalid UTF-8, and even when they are _valid_ UTF-8 they can wreak havoc with client-side response parsing, etc.

[PR #506|https://github.com/apache/solr/pull/506] has a few commits:
# The first causes udvas to default to false (uncontroversial imo?), but tries to support explicit udvas=true (returning the raw collation key serialized as base64-encoded binary).
# The second adds the option to strictly disallow udvas=true (throwing an exception if it is explicitly set).
# The third strips all the other nonsense out and just disallows udvas=true (throwing an exception if it's attempted).

I'm convinced that "strict" is the right way to go here. It's a similar case to {{SortableTextField}}, except that where the "dv" and "stored" manifestations in {{SortableTextField}} stand a good chance of being identical, the "dv" and "stored" manifestations for {{ICUCollationField}} are significantly different, so it wouldn't even be remotely possible to support the usual semantics of udvas. (Note, my proposal for analyzed docValues at SOLR-8362 would support the mixed use case cleanly, I think :)).

If this approach sounds ok, I'll probably flesh out some sanity-check tests and try to get this committed. The tests should be straightforward enough that I'd like to give anyone the opportunity to object to the proposed approach before I write tests that enshrine it.
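To illustrate why returning a raw collation key as if it were stored text is fragile, here is a minimal sketch. It rests on assumptions: it uses the JDK's {{java.text.Collator}} as a stand-in for the ICU collator behind {{ICUCollationField}}, so the bytes will differ from what Solr actually writes to docValues, and the names {{CollationKeyDemo}}, {{sortKey}} and {{isValidUtf8}} are hypothetical. It only shows the general shape: a sort key is opaque binary, strict UTF-8 decoding may or may not accept it, and base64 is one lossless way to serialize it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Base64;
import java.util.Locale;

public class CollationKeyDemo {

    /** Raw sort-key bytes for a value. Assumption: JDK Collator stands in for ICU. */
    static byte[] sortKey(String value) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("bg"));
        collator.setStrength(Collator.PRIMARY);
        return collator.getCollationKey(value).toByteArray();
    }

    /** Returns true only if the bytes decode cleanly under strict UTF-8 rules. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] key = sortKey("я");
        // The key is binary data for comparison, not text; it may fail strict decoding.
        System.out.println("valid UTF-8: " + isValidUtf8(key));
        // Base64 represents the same bytes losslessly as ASCII.
        System.out.println("base64 key:  " + Base64.getEncoder().encodeToString(key));
    }
}
```

Base64 round-trips arbitrary bytes, which is presumably what made it a candidate serialization for explicit udvas=true. It does nothing to make the key meaningful to a client, though, which is consistent with the argument for simply disallowing udvas=true.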
> UTF8toUTF16 failing for Unicode Character “ᴙ” (U+1D19)
> ------------------------------------------------------
>
> Key: SOLR-15777
> URL: https://issues.apache.org/jira/browse/SOLR-15777
> Project: Solr
> Issue Type: Bug
> Components: query
> Affects Versions: 7.7.3
> Reporter: Parag Ninawe
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This issue was seen for bulgarian language and specifically on the inverse R
> Unicode Character “ᴙ” (U+1D19)
>
> # Indexing documents was fine
> # On querying following error was seen under following conditions
> Following is the Solr Config(field type & dynamic field for which the error is thrown on querying)
> {code:java}
> <fieldType name="collated_bg" class="solr.ICUCollationField" locale="bg"
> strength="primary" caseLevel="false"/>{code}
> {code:java}
> <dynamicField name="sort_X3b_bg_*" type="collated_bg" stored="false"
> indexed="false" docValues="true" />{code}
> Following is the sample indexed doc content
> {code:java}
> { "id": "testdoc" "sort_X3b_bg_title": "я" }{code}
>
> On querying/Select query with id this doc gives the following error on Solr
>
> {code:java}
> { "error":{ "msg":"121", "trace":"java.lang.ArrayIndexOutOfBoundsException: 121\n\tat
> org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:602)\n\tat
> org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:137)\n\tat
> org.apache.solr.search.SolrDocumentFetcher.decodeDVField(SolrDocumentFetcher.java:550)\n\tat
> org.apache.solr.search.SolrDocumentFetcher.decorateDocValueFields(SolrDocumentFetcher.java:506)\n\tat
> org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.getSolrDoc(SolrDocumentFetcher.java:800)\n\tat
> org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.access$000(SolrDocumentFetcher.java:672)\n\tat
> org.apache.solr.search.SolrDocumentFetcher.solrDoc(SolrDocumentFetcher.java:278)\n\tat
> org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:95)\n\tat
> org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:59)\n\tat
> org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184)\n\tat
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136)\n\tat
> org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)\n\tat
> org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)\n\tat
> org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)\n\tat
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)\n\tat
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\n\tat
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:811)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:540)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
> java.lang.Thread.run(Thread.java:748)\n", "code":500}}{code}