[
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824295#comment-13824295
]
Jessica Cheng commented on SOLR-4260:
-------------------------------------
{quote}
This shouldn't be the case, because those updates will only have been ack'd if
each replica received them.
{quote}
That's what I thought too, but that doesn't seem to be the case in the code.
If you take a look at DistributedUpdateProcessor.doFinish(),
{quote}
// if its a forward, any fail is a problem -
// otherwise we assume things are fine if we got it locally
// until we start allowing min replication param
if (errors.size() > 0) {
  // if one node is a RetryNode, this was a forward request
  if (errors.get(0).req.node instanceof RetryNode) {
    rsp.setException(errors.get(0).e);
  }
  // else
  // for now we don't error - we assume if it was added locally, we
  // succeeded
}
{quote}
doFinish() then starts a thread to urge the replica to recover, but if that
request fails, it simply gives up.
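To make the consequence concrete, here is a minimal, self-contained sketch of
the decision quoted above. It is not Solr code: DoFinishSketch, SendError,
isForward and clientVisibleError are hypothetical names, and isForward simply
stands in for the "req.node instanceof RetryNode" check. Under those
assumptions it shows that a failed send to a replica never reaches the client,
so the update is still ack'd.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the quoted doFinish() logic; not Solr's classes.
class DoFinishSketch {

    /** One failed sub-request. isForward stands in for "req.node instanceof RetryNode". */
    static final class SendError {
        final boolean isForward;
        final Exception cause;

        SendError(boolean isForward, Exception cause) {
            this.isForward = isForward;
            this.cause = cause;
        }
    }

    /**
     * Returns the exception the client would see, or null if the update is
     * acknowledged as successful. As in the quoted code, only the first error
     * is inspected, and only a forward failure is reported.
     */
    static Exception clientVisibleError(List<SendError> errors) {
        if (!errors.isEmpty() && errors.get(0).isForward) {
            return errors.get(0).cause;
        }
        // Replica failures fall through: the update was applied locally,
        // so it is treated as a success.
        return null;
    }

    public static void main(String[] args) {
        List<SendError> replicaFailure = new ArrayList<>();
        replicaFailure.add(new SendError(false, new RuntimeException("replica unreachable")));

        List<SendError> forwardFailure = new ArrayList<>();
        forwardFailure.add(new SendError(true, new RuntimeException("leader unreachable")));

        System.out.println("replica failure reported to client: "
                + (clientVisibleError(replicaFailure) != null));
        System.out.println("forward failure reported to client: "
                + (clientVisibleError(forwardFailure) != null));
    }
}
```

In this model the only thing standing between a swallowed replica error and a
permanently diverged replica is the recovery request, which matches the
behaviour described above.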
> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
> Key: SOLR-4260
> URL: https://issues.apache.org/jira/browse/SOLR-4260
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.0
> Environment: 5.0.0.2013.01.04.15.31.51
> Reporter: Markus Jelsma
> Priority: Critical
> Fix For: 5.0
>
> Attachments: 192.168.20.102-replica1.png,
> 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using
> CloudSolrServer we see inconsistencies between the leader and replica for
> some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have
> a small deviation in the number of documents. The leader and slave deviate
> by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my
> attention: there were small IDF differences for exactly the same record,
> causing a record to shift positions in the result set. During those tests no
> records were indexed. Consecutive catch-all queries also return different
> numDocs values.
> We're running a 10 node test cluster with 10 shards and a replication factor
> of two and frequently reindex using a fresh build from trunk. I've not seen
> this issue for quite some time until a few days ago.