[jira] [Commented] (SOLR-7867) implicit sharded, facet grouping problem with multivalued string field starting with digits

JIRA Wed, 12 Aug 2015 05:09:07 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693384#comment-14693384
 ]


Gürkan Vural commented on SOLR-7867:
------------------------------------

I can confirm that this bug exists. It is triggered by certain documents 
depending on where they are positioned in the index: even if the group/facet 
query is filtered to return only one such document, the error still occurs. For 
my document, the readTerm function computes start and suffix as 32 and 9 
respectively, so the term it rebuilds needs 32 + 9 = 41 bytes. However, the 
term.bytes array has a length of only 37. If you update the document with the 
same values, the problem disappears. I assume this is because the document's 
position in the index changes.
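The overflow can be illustrated with a small, hypothetical Java sketch. This is not Lucene's actual Lucene50DocValuesProducer code. It only assumes, as the comment suggests, that readTerm rebuilds a term by keeping `start` shared prefix bytes and copying `suffix` new bytes after them into term.bytes. The class and method names below are made up for illustration. With start = 32 and suffix = 9, the copy ends at byte 41, which is past the end of a 37-byte array.

```java
/**
 * Hypothetical illustration (not Lucene's code) of rebuilding a
 * prefix-compressed term: `start` bytes are shared with the previous term
 * and `suffix` new bytes are written right after them.
 */
class PrefixTermReadSketch {

    /** Returns true if start + suffix bytes fit into a buffer of bufferLength bytes. */
    static boolean fits(int start, int suffix, int bufferLength) {
        return start + suffix <= bufferLength;
    }

    /**
     * Copies `suffix` bytes from suffixBytes into term at offset `start`.
     * System.arraycopy throws an IndexOutOfBoundsException (in practice an
     * ArrayIndexOutOfBoundsException) when start + suffix exceeds term.length.
     */
    static void readTerm(byte[] term, int start, byte[] suffixBytes, int suffix) {
        System.arraycopy(suffixBytes, 0, term, start, suffix);
    }

    public static void main(String[] args) {
        int start = 32, suffix = 9, bufferLength = 37; // values reported in the comment
        System.out.println("needed=" + (start + suffix) + " available=" + bufferLength
                + " fits=" + fits(start, suffix, bufferLength));
        try {
            readTerm(new byte[bufferLength], start, new byte[suffix], suffix);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("copy failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Running main prints `needed=41 available=37 fits=false` before the failed copy. This matches the arithmetic in the comment, not any claim about why Lucene computed those values.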

> implicit sharded, facet grouping problem with multivalued string field 
> starting with digits
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7867
>                 URL: https://issues.apache.org/jira/browse/SOLR-7867
>             Project: Solr
>          Issue Type: Bug
>          Components: faceting, SolrCloud
>    Affects Versions: 5.2
>         Environment: 3.13.0-48-generic #80-Ubuntu SMP x86_64 GNU/Linux
> java version "1.7.0_80"
> Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
>            Reporter: Umut Erogul
>              Labels: docValues, facet, group, sharding
>         Attachments: DocValuesException.PNG, ErrorReadingDocValues.PNG
>
>
> related parts @ schema.xml:
> {code}<field name="keyword_ss" type="string" indexed="true" stored="true" 
> docValues="true" multiValued="true"/>
> <field name="author_s" type="string" indexed="true" stored="true" 
> docValues="true"/>{code}
> Every document has valid author_s and keyword_ss fields.
> We can run grouped facet queries successfully on a single-node, 
> single-collection Solr 4.9.0 server:
> {code}
> q: *:* fq: keyword_ss:3m
> facet=true&facet.field=keyword_ss&group=true&group.field=author_s&group.facet=true
> {code}
> When querying a Solr 5.2.0 server in an implicitly sharded environment with:
> {code}<!-- router.field -->
> <field name="shard_name" type="string" indexed="true" stored="true" 
> required="true"/>{code}
> and example shard names affinity1, affinity2, affinity3, affinity4,
> the same query on the same documents fails with:
> {code}
> ERROR - 2015-08-04 08:15:15.222; [document affinity3 core_node32 
> document_affinity3_replica2] org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: Exception during facet.field: keyword_ss
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:632)
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:617)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at 
> org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:571)
>         at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:642)
> ...
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>         at 
> org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(Lucene50DocValuesProducer.java:1008)
>         at 
> org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.next(Lucene50DocValuesProducer.java:1026)
>         at 
> org.apache.lucene.search.grouping.term.TermGroupFacetCollector$MV$SegmentResult.nextTerm(TermGroupFacetCollector.java:373)
>         at 
> org.apache.lucene.search.grouping.AbstractGroupFacetCollector.mergeSegmentResults(AbstractGroupFacetCollector.java:91)
>         at 
> org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:541)
>         at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:463)
>         at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:386)
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:626)
>         ... 33 more
> {code}
> All the problematic queries involve strings starting with digits
> ("3m", "8 saniye", "2 broke girls", "1v1y"),
> although there are some such strings for which the query works ("24", "90+", 
> "45 dakika").
> We do not observe the problem when querying with
> -keyword_ss:(0-9)*
> Updating the problematic documents (a small subset of keyword_ss:(0-9)*)
> fixes the query,
> but we cannot find an easy way to identify the problematic documents.
> There are around 400 million docs, separated into 28 shards;
> -keyword_ss:(0-9)* matches 97% of documents.
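To re-run the failing request against individual keywords, the query from the report can be assembled as a URL-encoded parameter string. The sketch below uses only the JDK, not SolrJ. The parameter names and values (q, fq, facet, facet.field, group, group.field, group.facet) are taken from the report. The host and the collection name `document` in main() are hypothetical placeholders. The class and method names are also made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds the grouped facet request described in the report. */
class GroupFacetQuerySketch {

    /** Parameters for faceting on keyword_ss while grouping by author_s. */
    static Map<String, String> groupFacetParams(String filterQuery) {
        Map<String, String> params = new LinkedHashMap<String, String>(); // keeps insertion order
        params.put("q", "*:*");
        params.put("fq", filterQuery);
        params.put("facet", "true");
        params.put("facet.field", "keyword_ss");
        params.put("group", "true");
        params.put("group.field", "author_s");
        params.put("group.facet", "true");
        return params;
    }

    /** URL-encodes each key and value (form encoding) and joins the pairs with '&'. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical host and collection name; adjust to the actual cluster.
        String base = "http://localhost:8983/solr/document/select?";
        System.out.println(base + toQueryString(groupFacetParams("keyword_ss:3m")));
    }
}
```

Swapping the filter for the other failing values from the report, such as `keyword_ss:"8 saniye"` or `keyword_ss:1v1y`, produces one request per keyword. Those requests can then be checked one at a time.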



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
