This is all great info. Thanks for the patch Wei! It looks reasonable to me
and it's exciting to hear about your results. I agree that your patch plus
TLOG replicas looks like a good solution at a design level.

Now onto the question of the problem to solve. Bram and Wei cover it well.
The goal at a high level is to better divide the work, as well as decouple
operational factors, between read and write replicas.

Right now, I'm interested in an architecture with shared collections that
have 3 replicas per shard, any one of which may become the leader for fault
tolerance. I believe TLOG replicas plus Wei's patch fit that perfectly. It
doesn't fit Dave's suggestion, though that could be useful for slightly
different setups.
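To make the routing idea concrete, here's a minimal sketch in Python.
`Replica` and `pick_query_replica` are hypothetical names for illustration
only, not the code in Wei's patch or any Solr API. It assumes each shard knows
which of its replicas is currently the leader and which are active. Queries
prefer non-leader replicas, fall back to the leader only when nothing else is
up, and can optionally be pinned to the leader (Bram's case of lagging
replicas).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Hypothetical view of one replica in a shard.
    name: str
    is_leader: bool
    is_active: bool = True


def pick_query_replica(replicas, leader_only=False, rng=random):
    """Choose the replica that serves a query for one shard (hypothetical).

    By default, active non-leader replicas are preferred so the leader is
    left to handle indexing; the leader is used only as a fallback.
    With leader_only=True the query goes to the leader, e.g. when the
    other replicas are known to be lagging behind.
    """
    active = [r for r in replicas if r.is_active]
    leaders = [r for r in active if r.is_leader]
    if leader_only:
        if not leaders:
            raise RuntimeError("no active leader")
        return leaders[0]
    followers = [r for r in active if not r.is_leader]
    # Fall back to the leader only if no follower is available.
    candidates = followers or leaders
    if not candidates:
        raise RuntimeError("no active replica")
    return rng.choice(candidates)
```

Because leadership is looked up on every request, routing follows a failover
automatically: if the leader moves to another of the three replicas, reads
shift away from it without any reconfiguration.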

What are the next steps on integrating Wei's patch into main-line for
official release?

Best,
Stephen

On Sun, Jun 13, 2021, 4:46 PM Wei <weiwan...@gmail.com> wrote:

> We did some explorations on excluding reads from the leader in a TLOG +
> PULL cloud. When updates are heavy, we do observe query throughput and
> latency improvements. Here is the patch we have been testing:
> https://issues.apache.org/jira/secure/attachment/13026792/SOLR-15472.patch
>
>
> On Fri, Jun 11, 2021 at 11:43 AM Bram Van Dam <bram.van...@intix.eu>
> wrote:
>
> > On 11/06/2021 00.28, Walter Underwood wrote:
> > > Are you trying to send queries to less loaded machines? If so, this
> > > won’t do that.
> > > Leaders only do a little bit more work than followers. All indexing
> > > processing is local and that is most of the CPU usage.
> >
> > I suspect that depends on the type of replica.
> >
> > Reducing the load on leaders seems like a valuable feature. We've
> > observed cases where a high query load on leaders caused them to become
> > unresponsive, resulting in a cascade of failures that eventually rendered
> > an entire cluster unusable.
> >
> > In fact, it would also be useful to be able to direct certain queries
> > *only* to leaders when you know that replicas are lagging behind.
> >
> >   - Bram
> >
>
