On 2019-06-11 08:36:55 +0200, Fabien COELHO wrote:
This behavior under -R is fully voluntary, and the result above just show
that the database cannot really keep up with the load, which is simply the
case, so for me it is okay to show bad figures.
I mean, you just turned one named value, into a different one, without
renaming it. And the new meaning under -R, is basically the same as one
that's already there (lag). Also note that it also can actually keep up
in the above example.
The idea under throttling is to model a client which would want the result
of a query at a certain point in time, say a query for a web page which is
being generated, which is the scheduled time. It is the when the client
knows it wants an answer. If it is not processed immediately, that is bad
for its client perceived latency.
Whether this is due to lag (i.e. the server is loaded and cannot start to
process the answer) or because the server is slow to answer is irrelevant,
the client is waiting, the web page is not generated, the system is slow. So
latency under -R is really "client latency", not only query latency, as it
is documented.
What does that have to do with incorporating the same data into both lag
and latency? I just fail to see what the point is, except to make it
unnecessarily harder to compare postgres' behaviour under both a
throttled and push-it-to-the-breaking point loads.
How long individual transactions take, and how much variance there is in
that, is something *crucial* to optimize for. *Especially* when the
machine/load is provisioned in a way to not overload the machine.
How is e.g.
progress: 1.6 s, 0.0 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms
progress: 2.0 s, 103546.5 tps, lat 1584.161 ms stddev 35.589, lag 1582.043 ms
progress: 3.0 s, 108535.2 tps, lat 1347.619 ms stddev 101.782, lag 1346.170 ms
progress: 4.0 s, 108528.8 tps, lat 996.603 ms stddev 106.052, lag 995.159 ms
progress: 5.0 s, 109468.8 tps, lat 633.464 ms stddev 108.483, lag 632.030 ms
progress: 6.0 s, 110606.7 tps, lat 252.923 ms stddev 110.391, lag 251.505 ms
progress: 7.0 s, 84253.3 tps, lat 6.829 ms stddev 15.067, lag 6.423 ms
progress: 8.0 s, 80470.7 tps, lat 0.142 ms stddev 0.079, lag 0.017 ms
progress: 9.0 s, 80104.2 tps, lat 0.142 ms stddev 0.081, lag 0.017 ms
progress: 10.0 s, 80277.0 tps, lat 0.152 ms stddev 0.150, lag 0.017 ms
the lat column adds basically nothing over the lag column here.
more useful than:
progress: 1.3 s, 0.0 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms
progress: 2.0 s, 116315.6 tps, lat 1.425 ms stddev 1.440, lag 1087.076 ms
progress: 3.0 s, 113526.2 tps, lat 1.390 ms stddev 0.408, lag 709.908 ms
progress: 4.0 s, 111816.4 tps, lat 1.407 ms stddev 0.399, lag 302.866 ms
progress: 5.0 s, 88061.9 tps, lat 0.543 ms stddev 0.652, lag 16.526 ms
progress: 6.0 s, 80045.4 tps, lat 0.128 ms stddev 0.079, lag 0.017 ms
progress: 7.0 s, 79636.3 tps, lat 0.124 ms stddev 0.073, lag 0.016 ms
progress: 8.0 s, 80535.3 tps, lat 0.125 ms stddev 0.073, lag 0.016 ms
where I can see that the transactions are now actually fast enough.
Obviously this is a toy example, but this really make -R close to
useless to me. I often want to switch from a unthrottled to a 90% load,
and improve the outlier beheaviour - but that outlier behaviour is
hidden due to this redefinition of lat (as the issue is now reported
over a much longer period of time, as it includes lag).