As Shawn explained, when a TLOG replica is not the leader, it does not index the documents directly but pulls index segments from the leader. However, this operation is generally rather fast - within a second or two - since it copies only the changed segments, not the full index (and 70 million docs isn’t usually all that big anyway). I didn’t see where you said how soon after indexing you attempt to query the replicas; knowing that would help in understanding whether you really do have a problem.
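If it would help to see what a replica is actually doing and when it last pulled from its leader, the replication handler on a follower core can report its recent pull activity over HTTP. Below is a rough, untested sketch in Python; the core URL is a placeholder, the exact keys under "details" vary between Solr versions (older releases say "slave", newer ones "follower"), and I haven't checked how much of it gets populated for TLOG replicas in cloud mode, so treat it purely as a starting point.

import json
import urllib.request

# Placeholder URL: point this at one of the non-leader TLOG/PULL replica cores.
CORE_URL = "http://localhost:8983/solr/mycoll_shard1_replica_t2"

# The replication handler's "details" command reports the core's index
# version/generation and, on a follower, information about recent polling
# and fetch activity. Key names differ between versions, so just dump
# whatever follower section is present.
with urllib.request.urlopen(CORE_URL + "/replication?command=details&wt=json") as resp:
    details = json.load(resp)["details"]

print("index generation:", details.get("generation"))
follower_info = details.get("follower") or details.get("slave") or {}
print(json.dumps(follower_info, indent=2))

A successful fetch showing up there (or in the replica's log, as I describe below) shortly after each of your commits would suggest the replicas are keeping up.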
It’s important to note that a leader does not notify its replicas that it has new segments. This is controlled by the commit configuration: non-leader replicas poll their leader for possible changes at half the interval set for autoCommit or, if that is not set, half the autoSoftCommit time. Whether or not you commit on the leader has nothing to do with what the replica does; the polling is driven by what has been configured for those parameters.

Other factors can make the replication take longer, particularly a slow network, or merging segments down to a single very large segment (in which case each replica has to pull the entire index every time you make an update). You can see when replication happens by looking at the logs on one of the replicas: with the default logging levels enabled you’ll clearly see messages about polling for new segments, and about when replication starts and finishes. If you make an index update and don’t see a replica poll within a reasonable amount of time, you may want to change the commit settings I mentioned. If you see the poll start but take a long time to finish, it’s more likely the network, or you’ve merged down to a big segment that takes a while to go over the wire.

It seems inefficient to me to check who is a leader and only query the leader; that feels very much like a workaround for a misconfigured cluster. The main reason for having replicas is to distribute the query load, and if you direct all your queries to the leaders, the replicas are basically doing nothing but waiting in case a leader goes down. (That is fine if all you care about is disaster recovery, but there are probably lighter-weight approaches if that’s your only need.)

On Sep 8, 2021, 10:02 AM -0500, lstusr 5u93n4 <lstusr...@gmail.com>, wrote:
> > Info you might already know: TLOG (and PULL) replicas do not index,
> > unless a TLOG replica is the leader, in which case it behaves exactly
> > like NRT. A PULL replica can never become leader.
> >
> > When you have TLOG or PULL replicas, Solr is only going to do indexing
> > on the shard leaders. When a commit finishes, it should be done on all
> > cores that participate in indexing.
> >
> > Replication of the completed index segments to TLOG and PULL replicas
> > will happen AFTER the commit is done, not concurrently. I don't think
> > there's a reliable way of asking Solr to tell you when all replications
> > are complete.
>
> Thanks Shawn, it's good to have this all spelled out. Validates what we're
> seeing.
>
> > Does your "query only the leaders" code check clusterstate in ZK to
> > figure out which replicas are leader? Leaders can change in response to
> > problems.
>
> Yeah, exactly. Working implementation is to check
> `/collections/<name>/state.json` in ZK to determine the leaders, and put a
> watch on that node to react if the cluster state changes.
>
> I see what you're saying about determining if the replications are
> complete. However, querying the leaders post-commit is good enough for our
> particular use case, so we'll opt to keep the indexing speed as high as
> possible and not wait on the replication before proceeding to the next
> group of data.
>
> Thanks for all your help!
>
> Kyle
>
> On Tue, 7 Sept 2021 at 17:13, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 9/7/2021 3:08 PM, Shawn Heisey wrote:
> > > I don't think there's a reliable way of asking Solr to tell you when
> > > all replications are complete.
> >
> > You could use the replication handler (/solr/corename/replication) to
> > gather this info and compare info from the leader index with info from
> > the follower index(es). For this to be reliable, you would need to
> > check clusterstate in ZK so you're absolutely sure which cores are
> > leaders. I do not know off the top of my head what parameters need to
> > be sent to the replication handler to gather that info.
> >
> > Thanks,
> > Shawn
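For what it's worth, here is a rough sketch of the kind of check Shawn describes above: ask the Collections API for CLUSTERSTATUS (the same leader/replica layout you are already reading from state.json in ZK), then hit each core's replication handler with command=indexversion and compare the reported generation against the leader's. The base URL and collection name below are placeholders and I haven't run this, so treat it as a starting point rather than a recipe.

import json
import urllib.request

SOLR = "http://localhost:8983/solr"  # placeholder base URL
COLLECTION = "mycoll"                # placeholder collection name

def get_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# CLUSTERSTATUS exposes the same leader/replica layout as state.json in ZK.
status = get_json(f"{SOLR}/admin/collections?action=CLUSTERSTATUS"
                  f"&collection={COLLECTION}&wt=json")
shards = status["cluster"]["collections"][COLLECTION]["shards"]

for shard_name, shard in shards.items():
    generations = {}
    leader_gen = None
    for replica_name, replica in shard["replicas"].items():
        core_url = replica["base_url"] + "/" + replica["core"]
        # indexversion reports the version and generation of the replica's
        # latest commit point.
        info = get_json(core_url + "/replication?command=indexversion&wt=json")
        generations[replica_name] = info["generation"]
        if replica.get("leader") == "true":
            leader_gen = info["generation"]
    behind = [name for name, gen in generations.items() if gen != leader_gen]
    if behind:
        print(f"{shard_name}: leader is at generation {leader_gen}, "
              f"still waiting on {behind}")
    else:
        print(f"{shard_name}: all replicas at generation {leader_gen}")

That said, per my comments above I would be more inclined to use something like this to tune the commit/poll intervals than to gate every query on the leaders.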