[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

Timothy Potter (JIRA) Thu, 16 Jan 2014 13:16:10 -0800

    [ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873951#comment-13873951 ]


Timothy Potter commented on SOLR-4260:
--------------------------------------

I added some more logging on the leader. As a bit of context, the replica 
received the doc with ID 41029 and then 41041, and didn't receive 41033 and 
41038 in between. Here's the leader's log of the activity between 41029 and 
41041:

2014-01-16 16:03:02,523 [updateExecutor-1-thread-1] INFO  solrj.impl.ConcurrentUpdateSolrServer  - sent docs to [http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1] , 41003, 41005, 41007, 41010, 41014, 41015, 41026, 41029
2014-01-16 16:03:02,527 [qtp417447538-16] INFO  handler.loader.JavabinLoader  - test3_shard3_replica2 add: 41033
2014-01-16 16:03:02,527 [qtp417447538-16] INFO  update.processor.DistributedUpdateProcessor  - doLocalAdd 41033
2014-01-16 16:03:02,527 [qtp417447538-16] INFO  solrj.impl.ConcurrentUpdateSolrServer  - test3_shard3_replica2 queued (to: http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1): 41033
2014-01-16 16:03:02,528 [qtp417447538-16] INFO  handler.loader.JavabinLoader  - test3_shard3_replica2 add: 41038
2014-01-16 16:03:02,528 [qtp417447538-16] INFO  update.processor.DistributedUpdateProcessor  - doLocalAdd 41038
2014-01-16 16:03:02,528 [qtp417447538-16] INFO  solrj.impl.ConcurrentUpdateSolrServer  - test3_shard3_replica2 queued (to: http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1): 41038
2014-01-16 16:03:02,559 [qtp417447538-16] INFO  solrj.impl.ConcurrentUpdateSolrServer  - blockUntilFinished starting http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
2014-01-16 16:03:02,559 [qtp417447538-16] INFO  solrj.impl.ConcurrentUpdateSolrServer  - blockUntilFinished is done for http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
2014-01-16 16:03:02,559 [qtp417447538-16] INFO  solrj.impl.ConcurrentUpdateSolrServer  - shutting down CUSS for http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
2014-01-16 16:03:02,559 [qtp417447538-16] INFO  solrj.impl.ConcurrentUpdateSolrServer  - shut down CUSS for http://ec2-107-21-55-0.compute-1.amazonaws.com:8985/solr/test3_shard3_replica1
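
The gap can also be found mechanically by diffing the IDs in "queued" lines against the IDs in "sent docs" lines. This is a minimal Python sketch that assumes the log is unwrapped (one entry per line) and uses exactly the message formats shown in the excerpt above. Some of these messages likely come from the extra logging added for this investigation, so stock Solr builds may not emit them in this form. The function name queued_but_not_sent is hypothetical.

```python
import re

# Message formats copied from the excerpt above (assumed, not a stable Solr API).
QUEUED_RE = re.compile(r"queued \(to: \S+\):\s*(\d+)")
SENT_RE = re.compile(r"sent docs to \[\S+\]\s*((?:,\s*\d+\s*)+)")


def queued_but_not_sent(log_text):
    """Return doc IDs that appear in a 'queued' line but in no 'sent docs' line."""
    queued = [int(m.group(1)) for m in QUEUED_RE.finditer(log_text)]
    sent = set()
    for m in SENT_RE.finditer(log_text):
        # The captured group is a run like ", 41003, 41005, 41029".
        sent.update(int(x) for x in re.findall(r"\d+", m.group(1)))
    # Preserve queue order so the output reads like the log.
    return [doc_id for doc_id in queued if doc_id not in sent]
```

Run over the leader's log for the whole indexing job, a non-empty result would list candidate documents that never left the leader's queue for that replica.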

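In this excerpt, 41033 and 41038 are queued for replica1 but never appear in a 
"sent docs" line before blockUntilFinished starts and finishes within the same 
millisecond (16:03:02,559) and the CUSS is shut down. Not quite sure what this 
means, but I think your hunch about blockUntilFinished being involved is 
getting warmer.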
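
One way the pattern in the log could arise is sketched below as a toy model: a sender whose add() only enqueues, whose runner exits once the queue is momentarily empty, and whose blockUntilFinished-style check waits only for runners already in flight. This is a hypothetical, single-threaded illustration and NOT ConcurrentUpdateSolrServer's actual code; the class HypotheticalBufferedSender, the function simulate, and the wait_for_empty_queue switch are all invented for illustration. The doc IDs mirror the log excerpt above.

```python
from collections import deque


class HypotheticalBufferedSender:
    """Toy queue-plus-runner sender. Hypothetical: NOT ConcurrentUpdateSolrServer."""

    def __init__(self, wait_for_empty_queue):
        self.queue = deque()
        self.sent = []
        # False models the suspected bug; True models a safer drain.
        self.wait_for_empty_queue = wait_for_empty_queue

    def add(self, doc_id):
        # add() only enqueues; a runner pass does the actual sending.
        self.queue.append(doc_id)

    def run_once(self):
        # One runner pass: send everything queued right now, then exit.
        while self.queue:
            self.sent.append(self.queue.popleft())

    def block_until_finished(self):
        if self.wait_for_empty_queue:
            # Safer variant: keep draining until nothing is left queued.
            while self.queue:
                self.run_once()
        # Suspected-bug variant: waits only for runners already in flight.
        # None is running in this single-threaded model, so it returns at
        # once and leaves docs queued after the last runner exited.

    def shutdown(self):
        # Shutting down discards (and here reports) anything still queued.
        dropped = list(self.queue)
        self.queue.clear()
        return dropped


def simulate(wait_for_empty_queue):
    s = HypotheticalBufferedSender(wait_for_empty_queue)
    for doc_id in (41003, 41029):
        s.add(doc_id)
    s.run_once()  # runner sends 41003 and 41029, then exits
    s.add(41033)  # queued after the runner exited
    s.add(41038)
    s.block_until_finished()
    return s.sent, s.shutdown()
```

With wait_for_empty_queue=False the model sends 41003 and 41029 and drops 41033 and 41038 at shutdown; with True all four are sent. Whether the real client behaves like the first variant is exactly what the extra logging is meant to establish.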

> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.7
>
>         Attachments: 192.168.20.102-replica1.png, 
> 192.168.20.104-replica2.png, clusterstate.png, 
> demo_shard1_replicas_out_of_sync.tgz
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using 
> CloudSolrServer, we see inconsistencies between the leader and replica for 
> some shards.
> Each core holds about 3.3k documents. For some reason, 5 out of 10 shards have 
> a small deviation in the number of documents. The leader and slave deviate 
> by roughly 10-20 documents, not more.
> Documents hopping between ranks in the result set for identical queries got my 
> attention: there were small IDF differences for exactly the same record, 
> causing it to shift positions in the result set. No records were indexed 
> during those tests. Consecutive catch-all queries also return different 
> numDocs values.
> We're running a 10-node test cluster with 10 shards and a replication factor 
> of two, and frequently reindex using a fresh build from trunk. I hadn't seen 
> this issue for quite some time until a few days ago.
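
The per-core numDocs comparison described in the report can be checked by querying each core directly with distrib=false, so a q=*:* request with rows=0 returns that core's own numFound instead of being fanned out across the collection. This is a minimal sketch, assuming Python 3 and reachable core URLs such as the .../solr/test3_shard3_replica1 URL from the log above; the helper names core_num_docs and diverging_cores are hypothetical. numFound reflects each core's currently open searcher, so the counts are only comparable after a commit.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_num_docs(core_url, timeout=10.0):
    """numFound for q=*:* on one core only (distrib=false stops fan-out)."""
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    with urlopen(f"{core_url}/select?{params}", timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]["numFound"]


def diverging_cores(counts):
    """Given {core_url: numFound}, return how far each lagging core trails the max."""
    if not counts:
        return {}
    top = max(counts.values())
    return {core: top - n for core, n in counts.items() if n < top}
```

For example, build a dict of core_num_docs results for a shard's leader and replica core URLs and pass it to diverging_cores; an empty result means the counts agree.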



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

