[ https://issues.apache.org/jira/browse/SOLR-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648287#comment-17648287 ]
ASF subversion and git services commented on SOLR-16589:
--------------------------------------------------------

Commit d558cec6582a6084d0bb163d55b960e00d340a44 in solr's branch refs/heads/branch_9_1 from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=d558cec6582 ]

SOLR-16589: Large fields with large=true can be truncated when using unicode values (#1241)

> Large fields with large="true" can be truncated when using unicode values
> -------------------------------------------------------------------------
>
>                 Key: SOLR-16589
>                 URL: https://issues.apache.org/jira/browse/SOLR-16589
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: 9.0, 9.1
>            Reporter: Nikolas Osvalds
>            Assignee: Kevin Risden
>            Priority: Major
>             Fix For: main (10.0), 9.2
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. Summary
> Fields declared with large="true" can have large values (the case the option
> is intended for) truncated in Solr 9.0 and later.
> Example fieldtype definition:
> {code:java}
> <fieldtype name="string_large" class="solr.TextField" multiValued="false"
> indexed="false" stored="true" omitNorms="true" large="true" />{code}
> h3. Cause
> This looks like a bug introduced along with
> https://issues.apache.org/jira/browse/LUCENE-8805 /
> https://github.com/apache/lucene/issues/9849
> The current code is here:
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L511
>
> {code:java}
> public void stringField(FieldInfo fieldInfo, String value) throws IOException {
>   Objects.requireNonNull(value, "String value should not be null");
>   bytesRef.bytes = value.getBytes(StandardCharsets.UTF_8);
>   bytesRef.length = value.length();
> {code}
>
> The problem is specific to the handling of "large" fields.
> The length in UTF-8 bytes is greater than the string length
> `value.length()` whenever the value contains non-ASCII characters, hence the
> truncation.
> h3. Fix
> {code:java}
> bytesRef.length = bytesRef.bytes.length;
> {code}
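
The length mismatch described under "Cause" can be reproduced without Solr or Lucene. The following is a minimal sketch, not the actual Solr code: the class `Utf8LengthDemo` and its methods are hypothetical names introduced here for illustration, and a plain (byte array, length) pair stands in for `bytesRef`. It decodes the same UTF-8 bytes twice, once with the length taken from `value.length()` and once with the length taken from the encoded array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone demo; it does not use Solr or Lucene classes.
public class Utf8LengthDemo {

    // Decodes only the first `length` bytes, as a reader of a (bytes, length) pair would.
    public static String decode(byte[] bytes, int length) {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    // Mirrors the reported bug: the length is the UTF-16 char count of the string.
    public static String withCharLength(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return decode(bytes, value.length());
    }

    // Mirrors the proposed fix: the length is the size of the encoded byte array.
    public static String withByteLength(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return decode(bytes, bytes.length);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" has 5 chars but 6 UTF-8 bytes, because U+00E9 encodes to 2 bytes.
        String value = "h\u00e9llo";
        System.out.println("chars: " + value.length());
        System.out.println("bytes: " + value.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("char length kept all text: " + withCharLength(value).equals(value));
        System.out.println("byte length kept all text: " + withByteLength(value).equals(value));
    }
}
```

For pure ASCII input both variants return the full string, which may explain why the problem only shows up with unicode values.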