: Schema meets the requirements for Atomic Update, so we are doing a migration
: by querying the old cluster and writing to the new cluster. We are doing it in
: batches by filtering on one of the fields, and using cursorMark to efficiently
: page through the results.
: The query thread gets batches of 10000 documents and dumps them on a 
: One of the batches always indexes 5 fewer documents than numFound.  It's
: consistent -- always 5 documents.  Updates are paused during the migration.
: On the last run, numFound for this batch was 3824942 and the indexed count was
: 3824937.

I assume you mean one of the batches always indexes 5 fewer documents than 
the 'rows=N' param (ie: the query batch size) ... correct?

You're talking about the total numFound being higher than the index count?

: The other idea I have is that there could be a uniqueKey value that appears in
: more than one shard.  This doesn't seem likely, as the compositeId router

Also possible is that some replicas are out of sync with their leader -- ie: 
for some shardX, replica1 has a doc that replica2 doesn't, and replica1 is 
used for the initial phase of the request to get the "top N sorted doc 
uniqueKey at cursorMark=ZZZ" but replica2 is used in the second phase to 
fetch all of the field values.  (but if that were the case, you'd expect 
that at least some of the time you'd get "lucky" and the two phases would 
both hit replicas that agreed with each other -- even if they didn't agree 
with the leader -- and the problem wouldn't reliably reproduce every time)

: should keep that from happening.  Is there a way to detect this situation?  I

I would log every cursorMark request URL and the number of docs in the 

If, at the end of the run, you see a cursorMark value that didn't return 
the same number of docs as your rows param (ignoring the last batch which 
you expect to be smaller) then go manually re-run that query against every 
replica of every shard using `distrib=false` and diff the responses from 
each replica of the same shard.
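
A rough sketch of that logging and diffing in Python, in case it helps. 
Everything named here is an assumption for illustration: the base URL you 
pass in, a uniqueKey field called "id", and the helper names themselves. 
Only the request params (q, sort, rows, cursorMark, wt, distrib, fl) are 
real Solr params. It is not tested against your cluster.

```python
import json
import urllib.parse
import urllib.request


def cursor_url(base, q, sort, rows, cursor_mark, extra=None):
    """Build one cursorMark request URL so it can be logged verbatim."""
    params = {"q": q, "sort": sort, "rows": rows,
              "cursorMark": cursor_mark, "wt": "json"}
    params.update(extra or {})
    return base + "/select?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def walk_cursor(base, q, sort, rows, fetch=fetch_json):
    """Page through all results, logging (url, doc count) per non-empty batch.

    The sort must include the uniqueKey field, e.g. "id asc" (assumed name).
    """
    log = []
    mark = "*"
    while True:
        url = cursor_url(base, q, sort, rows, mark)
        body = fetch(url)
        docs = body["response"]["docs"]
        next_mark = body["nextCursorMark"]
        if docs:
            log.append((url, len(docs)))
        # Solr returns the same cursorMark back once there are no more docs.
        if next_mark == mark:
            break
        mark = next_mark
    return log


def short_batches(log, rows):
    """Logged batches that came back short, ignoring the final batch."""
    return [(url, n) for url, n in log[:-1] if n != rows]


def diff_replicas(ids_by_replica):
    """For one shard: map each replica to the ids it lacks vs. the union."""
    all_ids = set().union(*ids_by_replica.values())
    missing = {}
    for name, ids in ids_by_replica.items():
        gap = all_ids - set(ids)
        if gap:
            missing[name] = sorted(gap)
    return missing
```

For the per-replica re-run, the same cursor_url() can be pointed at each 
replica's core URL with extra={"distrib": "false", "fl": "id"}, and the 
ids collected from each response fed to diff_replicas().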

