We are running a large sharded Lucene-based application.
Our configuration supports near-real-time updates by incrementally
updating documents (delete, then add) on the shards.
Every shard is replicated to several machines in order to improve performance.
We replicate a shard by sending the same deletion and addition commands to
all of its replicas, where they may be performed in a different order.
(We delete a set of documents, say 1000 at a time, then add them back
one by one, semi-asynchronously.)
Lately we have noticed a subtle difference in query scores across different 
replicas of the same shard.
Further investigation showed that the only noticeable difference between the 
replicas was the index directory structure:
1. Different replicas have different sets of segments: most segment files
   are the same, but some are different.
2. The number of deleted documents differs between replicas of the same
   shard.
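
To make the second observation concrete, here is a minimal sketch of how we
think leftover deleted documents could shift scores. It is not Lucene code.
It assumes scoring uses the classic TF-IDF idf, 1 + ln(maxDoc / (docFreq + 1)),
and that documents marked as deleted keep counting toward maxDoc and docFreq
until their segments are merged. The class name and all numbers are
hypothetical.

```java
// Hypothetical illustration only: plain Java, no Lucene classes involved.
final class IdfDriftSketch {

    // Classic TF-IDF inverse document frequency (assumed scoring model):
    // idf = 1 + ln(maxDoc / (docFreq + 1))
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log((double) maxDoc / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long liveDocs = 100_000;   // live documents, identical on both replicas
        long liveDocFreq = 500;    // live documents containing the query term

        // Replica A (made-up numbers): 1,000 deleted documents not yet merged
        // away, 50 of which contain the term.
        double idfA = idf(liveDocFreq + 50, liveDocs + 1_000);

        // Replica B (made-up numbers): 4,000 deleted documents not yet merged
        // away, 200 of which contain the term.
        double idfB = idf(liveDocFreq + 200, liveDocs + 4_000);

        System.out.printf("replica A idf = %.6f%n", idfA);
        System.out.printf("replica B idf = %.6f%n", idfB);
        System.out.printf("difference    = %.6f%n", idfA - idfB);
    }
}
```

If these assumptions hold, two replicas with identical live documents can
still compute slightly different idf values, and therefore different scores,
for the same query.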
Is this a known behavior of Java Lucene?
How can we change this behavior? We want different replicas to return
exactly the same score for each query hit.
(We would rather not optimize the index as we believe this will harm 
performance.)

TIA,
Yuval and Ophir

