[
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873951#comment-13873951
]
Timothy Potter commented on SOLR-4260:
--------------------------------------
Added some more logging on the leader ... as a bit of context, the replica
received doc with ID 41029 and then 41041 and didn't receive 41033 and 41038 in
between ... here's the log on the leader of activity between 41029 and then
41041.
2014-01-16 16:03:02,523 [updateExecutor-1-thread-1] INFO
solrj.impl.ConcurrentUpdateSolrServer - sent docs to
[http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1]
, 41003, 41005, 41007, 41010, 41014, 41015, 41026, 41029
2014-01-16 16:03:02,527 [qtp417447538-16] INFO handler.loader.JavabinLoader -
test3_shard3_replica2 add: 41033
2014-01-16 16:03:02,527 [qtp417447538-16] INFO
update.processor.DistributedUpdateProcessor - doLocalAdd 41033
2014-01-16 16:03:02,527 [qtp417447538-16] INFO
solrj.impl.ConcurrentUpdateSolrServer - test3_shard3_replica2 queued (to:
http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1):
41033
2014-01-16 16:03:02,528 [qtp417447538-16] INFO handler.loader.JavabinLoader -
test3_shard3_replica2 add: 41038
2014-01-16 16:03:02,528 [qtp417447538-16] INFO
update.processor.DistributedUpdateProcessor - doLocalAdd 41038
2014-01-16 16:03:02,528 [qtp417447538-16] INFO
solrj.impl.ConcurrentUpdateSolrServer - test3_shard3_replica2 queued (to:
http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1):
41038
2014-01-16 16:03:02,559 [qtp417447538-16] INFO
solrj.impl.ConcurrentUpdateSolrServer - blockUntilFinished starting
http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
2014-01-16 16:03:02,559 [qtp417447538-16] INFO
solrj.impl.ConcurrentUpdateSolrServer - blockUntilFinished is done for
http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
2014-01-16 16:03:02,559 [qtp417447538-16] INFO
solrj.impl.ConcurrentUpdateSolrServer - shutting down CUSS for
http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
2014-01-16 16:03:02,559 [qtp417447538-16] INFO
solrj.impl.ConcurrentUpdateSolrServer - shut down CUSS for
http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
Not quite sure what this means, but I think your hunch about
blockUntilFinished being involved is getting warmer.
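
For anyone who wants to repeat this check on a larger log, here is a minimal sketch that lists doc IDs the leader queued for a replica but never reported as sent. It assumes the "queued (to: ...)" and "sent docs to [...]" messages look like the ones quoted above; those come from the extra logging added for this investigation, not from stock Solr. The helper name unsent_ids is hypothetical, and each log entry is assumed to sit on one unwrapped line.

```python
import re

# Patterns modeled on the debug log lines quoted in this comment.
# Assumption: the wording matches the custom logging shown above.
QUEUED = re.compile(r"queued \(to:\s*(\S+?)\):\s*(\d+)")
SENT = re.compile(r"sent docs to\s*\[(\S+?)\]\s*,\s*(\d+(?:\s*,\s*\d+)*)")


def unsent_ids(log_text):
    """Return {replica_url: sorted doc IDs queued but never logged as sent}."""
    queued, sent = {}, {}
    for url, doc_id in QUEUED.findall(log_text):
        queued.setdefault(url, set()).add(int(doc_id))
    for url, ids in SENT.findall(log_text):
        sent.setdefault(url, set()).update(int(i) for i in re.findall(r"\d+", ids))

    missing = {}
    for url, ids in queued.items():
        diff = ids - sent.get(url, set())
        if diff:
            missing[url] = sorted(diff)
    return missing
```

Run against the excerpt above, this should flag 41033 and 41038 for test3_shard3_replica1, since both are queued but no later "sent docs" line mentions them before the CUSS is shut down.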
> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
> Key: SOLR-4260
> URL: https://issues.apache.org/jira/browse/SOLR-4260
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Environment: 5.0.0.2013.01.04.15.31.51
> Reporter: Markus Jelsma
> Assignee: Mark Miller
> Priority: Critical
> Fix For: 5.0, 4.7
>
> Attachments: 192.168.20.102-replica1.png,
> 192.168.20.104-replica2.png, clusterstate.png,
> demo_shard1_replicas_out_of_sync.tgz
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using
> CloudSolrServer we see inconsistencies between the leader and replica for
> some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have
> a small deviation in the number of documents. The leader and slave deviate
> by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my
> attention: there were small IDF differences for exactly the same record,
> causing a record to shift positions in the result set. During those tests no
> records were indexed. Consecutive catch-all queries also return different
> numDocs values.
> We're running a 10 node test cluster with 10 shards and a replication factor
> of two and frequently reindex using a fresh build from trunk. I've not seen
> this issue for quite some time until a few days ago.
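
One way to confirm the deviation described in the report is to query each core directly with distrib=false and compare the counts per shard. The following is only a sketch under assumptions: the helper names (count_url, num_found, out_of_sync) and the example core URLs are hypothetical, and the counts are only comparable when no indexing is in progress and all cores have committed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url):
    """Build a match-all query that is answered by this core only."""
    params = {"q": "*:*", "rows": 0, "wt": "json", "distrib": "false"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(core_url):
    """Fetch the document count of one core (needs a reachable Solr node)."""
    with urlopen(count_url(core_url)) as resp:
        return json.load(resp)["response"]["numFound"]


def out_of_sync(counts_by_shard):
    """Return the shards whose cores report differing counts.

    counts_by_shard maps a shard name to {core_url: count}.
    """
    return {
        shard: counts
        for shard, counts in counts_by_shard.items()
        if len(set(counts.values())) > 1
    }
```

A caller would fill counts_by_shard by calling num_found for the leader and replica core of each shard, then pass the result to out_of_sync to see which shards disagree.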