Hey all,

We're trying to improve the load balancing across multiple services 
handling a significant amount of traffic (up to 2M requests per second for 
the biggest ones), with each server handling roughly 5.5K requests per 
second.
Our services autoscale heavily throughout the day, so we need to discover 
new servers within seconds. For performance and cost efficiency, each 
client is aware of all of its servers, so traffic does not flow through an 
AWS load balancer.
So far, to discover new servers, we've been relying on the max connection 
age as recommended, which led to some traffic imbalance that impacted 
performance and cost.
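
For reference, the max-connection-age setup looks roughly like the sketch 
below (a minimal Go/grpc-go example for illustration only; the durations 
are placeholders, not our production values):

// Minimal sketch of the server-side keepalive settings behind the
// "max connection age" approach (grpc-go; durations are illustrative).
package main

import (
    "log"
    "net"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/keepalive"
)

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("listen: %v", err)
    }

    srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
        // Force clients to reconnect (and re-resolve) periodically so they
        // eventually spread onto newly scaled-up servers.
        MaxConnectionAge: 5 * time.Minute,
        // Give in-flight RPCs time to finish before the connection closes.
        MaxConnectionAgeGrace: 30 * time.Second,
    }))

    // Register services here, then serve.
    if err := srv.Serve(lis); err != nil {
        log.Fatalf("serve: %v", err)
    }
}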

To improve the current setup we decided to:

   - Implement our own custom name resolver that periodically polls the 
   available servers.
   - Leverage the round robin load balancing policy to evenly distribute 
   the traffic (a sketch of both is included right after this list).
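
Here is a minimal sketch of the two pieces, again in Go/grpc-go for 
illustration; the "poll" scheme, the 5s interval, and the fetchServers 
helper are placeholders, not our real discovery code:

// Minimal sketch of a polling name resolver plus a round_robin client.
package main

import (
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/resolver"
)

// fetchServers stands in for whatever backs discovery (an API call, etc.).
func fetchServers() []string {
    return []string{"10.0.0.1:50051", "10.0.0.2:50051"} // placeholder
}

type pollBuilder struct{}

func (pollBuilder) Scheme() string { return "poll" }

func (pollBuilder) Build(_ resolver.Target, cc resolver.ClientConn,
    _ resolver.BuildOptions) (resolver.Resolver, error) {
    r := &pollResolver{cc: cc, done: make(chan struct{})}
    r.update()
    go r.loop()
    return r, nil
}

type pollResolver struct {
    cc   resolver.ClientConn
    done chan struct{}
}

// update pushes the current server list to the channel.
func (r *pollResolver) update() {
    var addrs []resolver.Address
    for _, a := range fetchServers() {
        addrs = append(addrs, resolver.Address{Addr: a})
    }
    r.cc.UpdateState(resolver.State{Addresses: addrs})
}

func (r *pollResolver) loop() {
    t := time.NewTicker(5 * time.Second) // polling interval is illustrative
    defer t.Stop()
    for {
        select {
        case <-t.C:
            r.update()
        case <-r.done:
            return
        }
    }
}

func (r *pollResolver) ResolveNow(resolver.ResolveNowOptions) { r.update() }
func (r *pollResolver) Close()                                { close(r.done) }

func main() {
    resolver.Register(pollBuilder{})

    // Dial through the custom scheme and spread RPCs with round_robin.
    conn, err := grpc.Dial(
        "poll:///my-service",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
    )
    if err != nil {
        panic(err)
    }
    defer conn.Close()
}

Each UpdateState hands the full address list to the round_robin policy, 
which then keeps one sub-channel per address.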

All of this is working well, but we are observing latency spikes whenever 
we autoscale (both scale-in and scale-out).
Our understanding is that round robin only routes traffic to a sub-channel 
once its connection is established, so we don't understand the spike when 
new servers are discovered.
And the spike when a server is removed is even more unclear to us.
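
For what it's worth, a minimal sketch (again Go/grpc-go, package and 
function names are just illustrative) of the kind of probe one could run to 
correlate the spikes with channel connectivity transitions during a scale 
event:

// Minimal sketch: log the channel's overall connectivity transitions so
// latency spikes can be correlated with state changes during autoscaling.
// Assumes grpc-go; conn is the *grpc.ClientConn the client already uses.
package diag

import (
    "context"
    "log"

    "google.golang.org/grpc"
)

func WatchConnectivity(ctx context.Context, conn *grpc.ClientConn) {
    state := conn.GetState()
    for {
        log.Printf("channel connectivity: %v", state)
        // Blocks until the state changes, or returns false when ctx is done.
        if !conn.WaitForStateChange(ctx, state) {
            return
        }
        state = conn.GetState()
    }
}

This would run in a goroutine alongside the client while a scale event is 
in progress.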

We've tried many things without luck, so any explanations or suggestions 
from your side would be really appreciated.

Thanks
