Hi all,

I recently inherited a team/app that has been running on a single instance
of Solr for many years. An attempt was made to migrate to a 10-node cluster
configuration, and we immediately encountered issues that appear to be
caused by queries being served from nodes where data replication had not
yet completed. The highlights:


   - 10-node cluster, 5 instances per DC, with a mix of NRT and TLOG
   replicas
   - Data is sourced from an upstream system in large batches throughout
   the day (the upstream system triggers our system on an ad hoc basis, and
   our system then refreshes its data from upstream).
      - These updates take from minutes up to 2 hours
      - autoCommit runs every 1 min and autoSoftCommit every 1 sec
   - We also have numerous background processes which kick off on a
   schedule (some every 15 mins, some hourly, some daily) which execute
   queries and perform a variety of actions based on the current state of the
   data
      - e.g. New records = send an email notifying users of some things
      they need to do
      - e.g. Removed records = send an email notifying users of some updates
      - (Significantly more complex than this.)
      - Background jobs are NOT aware of whether or not a data refresh
      (the batch updates described above) is currently underway
   - Based on our investigation, we *think* our application is getting
   incomplete results when executing queries during / shortly after data
   refreshes, and making incorrect decisions (e.g. notifying users that some
   records were removed when they actually weren't, followed by a future
   notification that the records are back)
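
To make the suspected failure mode concrete, here is a minimal Python sketch.
It is not our actual code: the function name and record IDs are made up for
illustration, and it assumes a job simply diffs the current query result
against what it saw on the previous run. If one run happens to read from a
replica that has not caught up, records show up as "removed" and then as
"new" again on the next run:

```python
def diff_snapshots(previous_ids, current_ids):
    """Return (new, removed) record IDs between two job runs."""
    previous, current = set(previous_ids), set(current_ids)
    return sorted(current - previous), sorted(previous - current)


# Hypothetical data: the index really holds r1..r4 the whole time.
full_result = ["r1", "r2", "r3", "r4"]
# A lagging replica has not yet received r3 and r4.
stale_result = ["r1", "r2"]

# Run 1: the job queries a lagging replica during a refresh.
new_1, removed_1 = diff_snapshots(full_result, stale_result)
# Run 2: the job queries a replica that has caught up.
new_2, removed_2 = diff_snapshots(stale_result, full_result)

print("run 1 removed:", removed_1)  # would trigger a false "removed" email
print("run 2 new:", new_2)          # would trigger a "records are back" email
```

Nothing was actually deleted in this sketch, yet the two runs produce
exactly the pair of notifications we are seeing.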


Would appreciate any advice or things to consider based on the above.

Thank you!
