[ 
https://issues.apache.org/jira/browse/SOLR-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470899#comment-17470899
 ] 

Michael Gibney commented on SOLR-15777:
---------------------------------------

Encountered another instance of this causing confusion "in the wild" today. I 
think in most cases what's happening is that, once {{useDocValuesAsStored}} 
began defaulting to true (Solr 5.5.0, schema version 1.6), people started 
getting {{ICUCollationField}} values returned incidentally when specifying 
{{fl=*}}.

I suspect that people neither expect nor want this. The values can be invalid 
UTF-8, and even when they are _valid_ UTF-8 they can wreak havoc with 
client-side response parsing, etc.

[PR #506|https://github.com/apache/solr/pull/506] has a few commits. The first 
causes {{useDocValuesAsStored}} (udvas) to default to false for this field 
type (uncontroversial imo?), but tries to support explicit udvas=true 
(returning the raw collation key serialized as base64-encoded binary). The 
second commit adds an option to strictly disallow udvas=true (throwing an 
exception on any attempt to set it explicitly). The third strips out all the 
other nonsense and simply disallows udvas=true (throwing an exception if it's 
attempted).
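
To illustrate why a binary key can't safely be returned as a string, and why 
base64 is a lossless alternative, here is a minimal standalone sketch. It does 
not use Solr or ICU: the class name {{CollationKeyDemo}}, its helper methods, 
and the bytes in {{SAMPLE_KEY}} are all hypothetical stand-ins (the bytes are 
_not_ a real ICU collation key, just a short sequence that is not valid UTF-8).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class CollationKeyDemo {
    // Hypothetical stand-in for a binary sort key; NOT a real ICU collation key.
    // 0xC3 opens a two-byte UTF-8 sequence, but 0x28 is not a continuation byte.
    static final byte[] SAMPLE_KEY = {0x2A, (byte) 0xC3, 0x28, 0x01};

    // Strict check: report malformed input instead of silently replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Base64 maps arbitrary bytes to ASCII text and back without loss.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        System.out.println("valid UTF-8: " + isValidUtf8(SAMPLE_KEY));
        String encoded = toBase64(SAMPLE_KEY);
        System.out.println("base64: " + encoded);
        System.out.println("round trip ok: "
            + Arrays.equals(SAMPLE_KEY, fromBase64(encoded)));
    }
}
```

The point is only that arbitrary binary keys need a binary-safe serialization 
if they are returned at all; it says nothing about how the PR implements this.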

I'm convinced that "strict" is the right way to go here. It's a similar case to 
{{SortableTextField}}, except that where the "dv" and "stored" manifestations 
in {{SortableTextField}} stand a good chance of being identical, the "dv" and 
"stored" manifestations for {{ICUCollationField}} are significantly different, 
so it wouldn't be even remotely possible to support the usual semantics of 
udvas. (Note, my proposal for analyzed docValues at SOLR-8362 would support the 
mixed use case cleanly, I think :)).

If this approach sounds ok, I'll probably flesh out some sanity-check tests and 
try to get this committed. The tests should be straightforward, but I'd like 
to give anyone the opportunity to object to the proposed approach before I 
write tests that enshrine it.

> UTF8toUTF16 failing for Unicode Character “ᴙ” (U+1D19)
> ------------------------------------------------------
>
>                 Key: SOLR-15777
>                 URL: https://issues.apache.org/jira/browse/SOLR-15777
>             Project: Solr
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 7.7.3
>            Reporter: Parag Ninawe
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue was seen for the Bulgarian language, specifically with the inverse 
> R Unicode character “ᴙ” (U+1D19).
>  
>  # Indexing documents was fine.
>  # On querying, the error below was seen under the following conditions.
> The following is the Solr config (field type and dynamic field for which the 
> error is thrown on querying):
> {code:java}
> <fieldType name="collated_bg" class="solr.ICUCollationField" locale="bg" 
> strength="primary" caseLevel="false"/>{code}
> {code:java}
> <dynamicField name="sort_X3b_bg_*" type="collated_bg" stored="false" 
> indexed="false" docValues="true" />{code}
> The following is the sample indexed doc content:
> {code:java}
> { "id": "testdoc", "sort_X3b_bg_title": "я" }{code}
>  
> Querying for this doc by id (a select query) produces the following error in Solr:
>  
> {code:java}
> { "error":{ "msg":"121", "trace":"java.lang.ArrayIndexOutOfBoundsException: 
> 121\n\tat 
> org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:602)\n\tat 
> org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:137)\n\tat 
> org.apache.solr.search.SolrDocumentFetcher.decodeDVField(SolrDocumentFetcher.java:550)\n\tat
>  
> org.apache.solr.search.SolrDocumentFetcher.decorateDocValueFields(SolrDocumentFetcher.java:506)\n\tat
>  
> org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.getSolrDoc(SolrDocumentFetcher.java:800)\n\tat
>  
> org.apache.solr.search.SolrDocumentFetcher$RetrieveFieldsOptimizer.access$000(SolrDocumentFetcher.java:672)\n\tat
>  
> org.apache.solr.search.SolrDocumentFetcher.solrDoc(SolrDocumentFetcher.java:278)\n\tat
>  org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:95)\n\tat 
> org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:59)\n\tat 
> org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184)\n\tat
>  
> org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136)\n\tat
>  
> org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386)\n\tat
>  
> org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292)\n\tat
>  org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73)\n\tat 
> org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:66)\n\tat
>  
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\n\tat
>  
> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:811)\n\tat
>  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:540)\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat
>  
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
>  
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
>  
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
>  
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
>  org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat 
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
>  
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
>  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat 
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
>  
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
>  
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
>  
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
>  java.lang.Thread.run(Thread.java:748)\n", "code":500}}{code}
>  


