On 9/7/2021 10:01 AM, lstusr 5u93n4 wrote:
> Seems like our experimentation is showing that it doesn't, at least for TLOG replica types. If we bound the query to the leaders, we can get accurate results immediately after the commit. If we don't add that restriction, the results sometimes won't show the groups of data that were indexed in the previous.
Info you might already know: TLOG (and PULL) replicas do not index, unless a TLOG replica is the leader, in which case it behaves exactly like NRT. A PULL replica can never become leader.
When you have TLOG or PULL replicas, Solr is only going to do indexing on the shard leaders. When a commit finishes, it should be done on all cores that participate in indexing.
Replication of the completed index segments to TLOG and PULL replicas will happen AFTER the commit is done, not concurrently. I don't think there's a reliable way of asking Solr to tell you when all replications are complete.
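For what it's worth, a best-effort check is possible by polling each core's replication handler and comparing the numbers against the leader. The sketch below is only a heuristic, not a guarantee. It assumes that /replication?command=indexversion answers on every core and returns top-level "indexversion" and "generation" keys, and the function names and core URLs are made up for illustration. A new commit landing between polls will also make replicas look behind.

```python
import json
import time
from urllib.request import urlopen


def index_version(core_url):
    """Ask one core's replication handler for its (indexversion, generation).

    Assumption: the response carries top-level "indexversion" and
    "generation" keys when requested with wt=json.
    """
    with urlopen(f"{core_url}/replication?command=indexversion&wt=json") as resp:
        body = json.load(resp)
    return body["indexversion"], body["generation"]


def lagging_replicas(leader_state, replica_states):
    """Return the cores whose (version, generation) differs from the leader's."""
    return sorted(core for core, state in replica_states.items()
                  if state != leader_state)


def wait_for_replicas(leader_url, replica_urls, timeout_s=30.0, poll_s=0.5,
                      probe=index_version):
    """Poll until every replica matches the leader, or the timeout expires.

    Returns True if all replicas matched, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        leader_state = probe(leader_url)
        states = {url: probe(url) for url in replica_urls}
        if not lagging_replicas(leader_state, states):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Something like wait_for_replicas(leader_core_url, [replica_core_url, ...]) would be called after the commit returns and before querying non-leader replicas. The probe argument is there only so the polling logic can be exercised without a running Solr.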
If all replicas were NRT, then I think you wouldn't have this problem. But indexing is slower, because all replicas are going to do it, mostly concurrently. In some cases the slowdown might be significant.
Does your "query only the leaders" code check clusterstate in ZK to figure out which replicas are leaders? Leaders can change in response to problems.
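If it doesn't, one way to look the leaders up fresh before each query is the Collections API CLUSTERSTATUS action, which reports the same state that lives in ZK. This is a minimal sketch, not a description of your code: the function names are hypothetical, solr_url is assumed to look like http://host:8983/solr, and the leader flag is assumed to come back as the string "true" on the leader replica.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def leader_cores(cluster_status, collection):
    """Map each shard name to the core URL of its active leader.

    Expects the parsed JSON of a CLUSTERSTATUS response. Shards with no
    active leader at the moment are simply left out of the result.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    leaders = {}
    for shard_name, shard in shards.items():
        for replica in shard["replicas"].values():
            # The flag is normally the string "true"; tolerate a boolean too.
            is_leader = str(replica.get("leader", "false")).lower() == "true"
            if is_leader and replica.get("state") == "active":
                base = replica["base_url"].rstrip("/")
                leaders[shard_name] = base + "/" + replica["core"]
    return leaders


def fetch_cluster_status(solr_url, collection):
    """Fetch CLUSTERSTATUS for one collection and return the parsed JSON."""
    params = urlencode({"action": "CLUSTERSTATUS",
                        "collection": collection,
                        "wt": "json"})
    with urlopen(f"{solr_url}/admin/collections?{params}") as resp:
        return json.load(resp)
```

Calling leader_cores(fetch_cluster_status(solr_url, name), name) right before the query keeps the leader list from going stale, though a leader can still change between the lookup and the query itself.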
Thanks, Shawn