[
https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882958#comment-13882958
]
Steve Rowe commented on SOLR-5652:
----------------------------------
bq. It looks to me like there are two problems here: 1) the same doc is showing
up on different pages when deep paging; and 2) missing docvalue docs are sorted
incorrectly.
I think I understand problem #2: non-multi-valued numeric and string fields are
created (by TrieField's and StrField's createFields() methods) as
NumericDocValuesField-s and SortedDocValuesField-s, respectively, and these
require each doc to have a value, which apparently defaults to zero for
NumericDocValuesField-s and the empty string for SortedDocValuesField-s.
Here are the declarations for the field types that have this problem in
DistribCursorPagingTest (from schema-sorts.xml):
{code:xml}
<fieldtype name="str_dv_last" class="solr.StrField" stored="true"
indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="str_dv_first" class="solr.StrField" stored="true"
indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="int_dv_last" class="solr.TrieIntField" stored="true"
indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="int_dv_first" class="solr.TrieIntField" stored="true"
indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="long_dv_last" class="solr.TrieLongField" stored="true"
indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="long_dv_first" class="solr.TrieLongField" stored="true"
indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="float_dv_last" class="solr.TrieFloatField" stored="true"
indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="float_dv_first" class="solr.TrieFloatField" stored="true"
indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="double_dv_last" class="solr.TrieDoubleField" stored="true"
indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="double_dv_first" class="solr.TrieDoubleField" stored="true"
indexed="false" docValues="true" sortMissingFirst="true"/>
{code}
I think that the above declarations should be disallowed by Solr, because they
contain docValues="true" + sortMissing<Last|First>="true"; the user is asking
for a particular sorting behavior for missing values, when there never will be
missing values.
Also, the Solr Ref Guide
[says|https://cwiki.apache.org/confluence/display/solr/DocValues] about
docvalue fields "If this type is used, the field must be either required or
have a default value, meaning every document must have a value for this field."
However, neither the above field types nor the fields using them are required
or have a default specified. Maybe this should be enforced by schema parsing?
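For illustration, here is a minimal sketch of declarations that would satisfy
the Ref Guide requirement quoted above. The type name "int_dv" and both field
names are hypothetical and do not appear in schema-sorts.xml; the point is only
that each docvalue field is either required or has a default, and that the type
carries no sortMissing<Last|First> attribute:
{code:xml}
<!-- hypothetical type: docValues enabled, no sortMissingLast/sortMissingFirst -->
<fieldtype name="int_dv" class="solr.TrieIntField" stored="true"
indexed="false" docValues="true"/>
<!-- hypothetical field: every document must supply a value -->
<field name="example_required_int_dv" type="int_dv" required="true"/>
<!-- hypothetical field: documents without a value get an explicit default of 0 -->
<field name="example_default_int_dv" type="int_dv" default="0"/>
{code}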
> Heisenbug in DistribCursorPagingTest: "walk already seen ..."
> -------------------------------------------------------------
>
> Key: SOLR-5652
> URL: https://issues.apache.org/jira/browse/SOLR-5652
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Assignee: Hoss Man
> Attachments: 129.log, 372.log,
> jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt,
> jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt
>
>
> Several times now, Uwe's jenkins has encountered a "walk already seen ..."
> assertion failure from DistribCursorPagingTest that I've been unable to
> fathom, let alone reproduce (although sarowe was able to trigger a similar,
> non-reproducible seed, failure on his machine)
> Using this as a tracking issue to try and make sense of it.
> Summary of things noticed so far:
> * So far only seen on http://jenkins.thetaphi.de & sarowe's mac
> * So far seen on MacOSX and Linux
> * So far seen on branch 4x and trunk
> * So far seen on Java6, Java7, and Java8
> * fails occurred in first block of randomized testing:
> ** we've indexed a small number of randomized docs
> ** we're explicitly looping over every field and sorting in both directions
> * fails were sorting on one of the "\*_dv_last" or "\*_dv_first" fields
> (docValues=true, either sortMissingLast=true OR sortMissingFirst=true)
> ** for desc sorts, sort on same field asc has worked fine just before this
> (fields are in arbitrary order, but "asc" always tried before "desc")
> ** sorting on some other random fields has sometimes been tried before this
> and worked
> (specifics of each failure seen in the wild recorded in comments)