Jesper, thanks a lot for your email; your answer was a hand in the dark forest of doubts.

I will start by trying the load generator wrk2. About "instrument, profile, observe": yes, I added the gops agent, but so far I haven't drawn any conclusions from that information.
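For reference, a minimal sketch of how the gops agent can be wired in (default options; the handler and the :8080 address below are just stand-ins for the real service):

    package main

    import (
        "log"
        "net/http"

        "github.com/google/gops/agent"
    )

    func main() {
        // Start the gops agent so the gops CLI can attach to this process.
        // The default options let the agent pick its own listen address.
        if err := agent.Listen(agent.Options{}); err != nil {
            log.Fatal(err)
        }

        // Placeholder handler standing in for the real service under test.
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

With the agent running, the gops CLI can attach to the process from another terminal and dump goroutine stacks, memstats, and so on.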
Regards.

On Sunday, October 25, 2020 at 8:07:58 AM UTC-3 jesper.lou...@gmail.com wrote:
> On Sat, Oct 24, 2020 at 7:30 PM JuanPablo AJ <jpab...@gmail.com> wrote:
>
>> I have some doubts related to the HTTP client.
>
> First, if you have unexplained efficiency concerns in a program, you should
> profile and instrument. Make the system tell you what is happening rather
> than making guesses as to why. With that said, I have some hunches and
> experiments you might want to try out.
>
> When you perform a load test, you have a SUT, or system-under-test. That is
> the whole system, including the infrastructure around it. It can be a
> single program, or a cluster of machines. You also have a load generator,
> which generates load on your SUT in order to test different aspects of it:
> bandwidth usage, response latency, capacity limits, resource limits,
> etc. [1]. Your goal is to figure out whether the data you are seeing are
> within an acceptable range for your use case, or whether you have to work
> more on the system to make it fall within the acceptable window.
>
> Your test is about the RTT latency of requests. This will become important.
>
> One particular problem in your test is that the load generator and the SUT
> run in the same environment. If the test is simple and you are trying to
> stress the system maximally, chances are that the load generator impacts
> the SUT. That means the latency will rise due to time sharing in the
> operating system.
>
> Second, when measuring latency you should look out for the problem Gil Tene
> coined "coordinated omission". In CO, the problem is that the load
> generator and the SUT cooperate to deliver the wrong latency numbers. This
> is especially true if you just fire as many requests as possible on 50
> connections. Under an overload situation, the system will suffer in
> latency, since that is the only way the system can alleviate pressure. The
> problem with CO is that a server can decide to park a couple of requests
> and handle the other requests as fast as possible. This can lead to a high
> number of requests on the active connections, and the stalled connections
> become noise in the statistics. You can look up Tene's `wrk2` project, but
> I think the ideas were baked back into Will Glozer's wrk at a later point
> in time (memory eludes me).
>
> The third point is about the sensitivity of your tests: when you measure
> things in the millisecond, microsecond or nanosecond range, your test
> becomes far more susceptible to outside interference. You can generally use
> statistical bootstrapping to measure the impact this has on test variance,
> which I've done in the past. You start finding all kinds of interesting
> corner cases that perturb your benchmarks. Among the more surprising ones:
>
> * CPU scaling governors
> * Turbo boosting: one core can run at a higher clock frequency than a whole
>   cluster of cores can. GC in Go is multicore, so even for a single-core
>   program, this might have an effect
> * CPU heat: laptop CPUs have miserable thermal cooling compared to a server
>   or desktop. They can run fast in small bursts, but not for longer
>   stretches
> * Someone using the computer while doing the benchmark
> * An open browser window which runs some Javascript in the background
> * An open Electron app rendering a .gif or .webm file
> * Playing music while performing the benchmark, yielding CPU time to the
>   MP3, Vorbis or AAC decoder
> * The amount of incoming network traffic to process, even for a benchmark
>   that has nothing to do with the network
>
> Finally, asynchronous goroutines are still work the program needs to
> execute; they aren't free. As the system is stressed with a higher load,
> you run closer to the capacity limit, thus incurring slower response times.
> In the case where you perform requests in the background to another HTTP
> server, you are taking a slice of the available resources. You are also
> generating as much work internally as is coming in externally. In a
> real-world server this is usually a bad idea, and you must put a resource
> limit in place. Otherwise an aggressive client can overwhelm your server.
> The trick is to slow the caller down by *not* responding right away if you
> are overloaded internally.
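(Just to check my understanding of the resource-limit point above: something along the lines of the sketch below, where a buffered channel caps the number of in-flight background calls and the handler pushes back when the cap is hit? The names, the limit of 100 and the 50ms wait are made up for illustration.)

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    // inflight caps concurrent background work; 100 is an arbitrary example.
    var inflight = make(chan struct{}, 100)

    func handler(w http.ResponseWriter, r *http.Request) {
        select {
        case inflight <- struct{}{}:
            defer func() { <-inflight }()
        case <-time.After(50 * time.Millisecond):
            // Overloaded: slow the caller down instead of piling up goroutines.
            http.Error(w, "overloaded, try again later", http.StatusServiceUnavailable)
            return
        }

        // ... the real work goes here, e.g. the background call to the other
        // HTTP server ...
        w.Write([]byte("ok"))
    }

    func main() {
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }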
> You should check your kernel. When you perform a large number of requests
> on the same machine, you can run into limits on the number of TCP source
> ports if they are cycled through too quickly. It is a common problem when
> the load generator and the SUT are on the same host.
>
> You should check your HTTP client configuration as well. One way to avoid
> the above problem is to maximize connection reuse, but then you risk
> head-of-line blocking on the connections, even (or perhaps even more so) in
> the HTTP/2 case.
>
> But above all: instrument, profile, observe. Nothing beats data and plots.
>
> [1] SLI, SLOs etc. A good starting point is
> https://landing.google.com/sre/sre-book/chapters/service-level-objectives/
> but that book is worth a full read. https://landing.google.com/sre/books/
> too!
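As a follow-up on the HTTP client configuration point above, a sketch of an http.Transport tuned for connection reuse (the values are placeholders for experimentation, not recommendations from the mail above):

    package main

    import (
        "io/ioutil"
        "log"
        "net/http"
        "time"
    )

    func main() {
        // Raise the per-host idle connection limit (the default is 2) so that
        // keep-alive connections are actually reused under load.
        t := &http.Transport{
            MaxIdleConns:        100,
            MaxIdleConnsPerHost: 100,
            IdleConnTimeout:     90 * time.Second,
        }
        c := &http.Client{Transport: t, Timeout: 5 * time.Second}

        resp, err := c.Get("http://127.0.0.1:8080/")
        if err != nil {
            log.Fatal(err)
        }
        // Draining and closing the body is what lets the connection return to
        // the idle pool for reuse.
        ioutil.ReadAll(resp.Body)
        resp.Body.Close()
    }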