Hello Andres,
On 2019-06-11 11:31:15 +0300, Heikki Linnakangas wrote:
It's not fair to say that its meaning was changed. Before 9.4, there was no
-R option.
Well, my point is that -R changed the existing meaning of a field,
I do not think it does, because the client and transaction latencies are
actually the same without -R, so nothing is changed.
When the two concepts started to differ, I choose the one interpretion
that I thought best, and for me the whole point of running pgbench with -R
is to look at the client latency, i.e. to answer to the question "is the
database keeping up with the scheduled load", so this is the one
displayed.
and that that's not nice.
Well, that is clearly not what you would have expected.
Yeah, I can see that the server-observed transaction latency would often be
more useful than what's printed now. But changing the current meaning
doesn't seem like a good idea.
Well, then a *new* column should have been added for that value under
-R. Although admittedly 'lat' is not a very good descriptor for the non
-R behaviour.
But anyway, to go forward, I think we should replace 'lat' with a
'txtime' (or similar) column that is not affected by -R. And then, under
-R only, add a new 'txlat' column, that shows the 'current' meaning of
lat under -R. Not convinced the names are right, but you get the gist.
I do not have a strong opinion. "lat" says latency in a short form
constrained by a one line output. What precise latency is displayed is
explained in the doc.
As Fabien said, the reported latency is the latency seen by the
imaginary user of the system, and without -R, there's no lag so it's the
same number. See also how it works with --latency-limit. The limit is on the
reported latency, which includes the lag.
Well, that's how it works in a lot of scenarios (e.g. interactive
scenarios, where users give up). But there's also a lot where the amount
of work doesn't decrease due to lateness, it just queues up (e.g. many
batch / queue processing workloads).
Sure. In which case --latency-limit should not be used.
The outlier behavior seems very visible in both of the above. The system
completely stalled for about 1-2 seconds. And then it takes a few seconds to
process the backlog and catch up.
But that was just because I was showing a simplistic example. E.g.
here's a log of a vacuum finishing, and then another starting a few
seconds later (both vacuums lasting a fair while):
progress: 139.0 s, 2438.4 tps, txtime 13.033 ms stddev 3.830, lat 17530.784 ms
stddev 590.153, lag 17517.751 ms
progress: 140.0 s, 2489.0 tps, txtime 12.911 ms stddev 3.642, lat 17752.862 ms
stddev 600.661, lag 17739.952 ms
progress: 141.0 s, 2270.0 tps, txtime 14.021 ms stddev 4.965, lat 17973.805 ms
stddev 594.784, lag 17959.784 ms
progress: 142.0 s, 1408.0 tps, txtime 22.848 ms stddev 5.365, lat 18417.808 ms
stddev 632.729, lag 18394.960 ms
progress: 143.0 s, 3001.0 tps, txtime 10.724 ms stddev 4.318, lat 18796.971 ms
stddev 617.462, lag 18786.247 ms
progress: 144.0 s, 4678.0 tps, txtime 6.823 ms stddev 2.136, lat 18503.253 ms
stddev 669.072, lag 18496.431 ms
progress: 145.0 s, 4577.0 tps, txtime 7.001 ms stddev 1.526, lat 18108.596 ms
stddev 689.843, lag 18101.596 ms
progress: 146.0 s, 2596.0 tps, txtime 12.261 ms stddev 3.060, lat 17961.623 ms
stddev 683.498, lag 17949.363 ms
progress: 147.0 s, 2654.0 tps, txtime 12.072 ms stddev 3.282, lat 18120.009 ms
stddev 685.074, lag 18107.938 ms
progress: 148.0 s, 3471.0 tps, txtime 9.240 ms stddev 3.702, lat 18251.712 ms
stddev 676.572, lag 18242.472 ms
progress: 149.0 s, 3056.0 tps, txtime 10.468 ms stddev 5.131, lat 18058.950 ms
stddev 675.334, lag 18048.482 ms
progress: 150.0 s, 2319.0 tps, txtime 13.778 ms stddev 3.762, lat 18305.101 ms
stddev 688.186, lag 18291.323 ms
progress: 151.0 s, 2355.0 tps, txtime 13.567 ms stddev 3.891, lat 18586.073 ms
stddev 691.656, lag 18572.506 ms
progress: 152.0 s, 2321.0 tps, txtime 13.742 ms stddev 3.708, lat 18835.985 ms
stddev 709.580, lag 18822.244 ms
progress: 153.0 s, 2360.0 tps, txtime 13.604 ms stddev 3.533, lat 19121.166 ms
stddev 709.682, lag 19107.562 ms
The period inbetween where no vacuum was running is imo considerably
harder to spot looking at 'lat'.
ISTM that the signal is pretty clear in whether the lag increases or
decreases. Basically the database is 18 seconds behind its load, which is
very bad if a user is waiting.
I guess you can argue that one can just look at tps instead. But for
other rate limited cases that doesn't work as well:
progress: 121.0 s, 961.0 tps, txtime 3.452 ms stddev 0.947, lat 3.765 ms stddev
1.268, lag 0.313 ms
progress: 122.0 s, 979.0 tps, txtime 5.388 ms stddev 8.737, lat 7.378 ms stddev
11.137, lag 1.990 ms
progress: 123.0 s, 1078.8 tps, txtime 3.679 ms stddev 1.278, lat 4.322 ms
stddev 3.216, lag 0.643 ms
progress: 124.0 s, 1082.2 tps, txtime 5.575 ms stddev 9.790, lat 8.716 ms
stddev 15.317, lag 3.141 ms
progress: 125.0 s, 990.0 tps, txtime 3.489 ms stddev 1.148, lat 3.817 ms stddev
1.456, lag 0.328 ms
progress: 126.0 s, 955.0 tps, txtime 9.284 ms stddev 15.362, lat 14.210 ms
stddev 22.084, lag 4.926 ms
progress: 127.0 s, 960.0 tps, txtime 11.951 ms stddev 11.222, lat 17.732 ms
stddev 21.066, lag 5.781 ms
progress: 128.0 s, 945.9 tps, txtime 11.702 ms stddev 17.590, lat 23.791 ms
stddev 45.327, lag 12.089 ms
progress: 129.0 s, 1013.1 tps, txtime 19.871 ms stddev 19.407, lat 42.530 ms
stddev 39.582, lag 22.659 ms
Although the tps seems stable, something is clearly amiss because the lag
and stddev (both lat & txtime) are increasing, and also the txtime. ISTM
that the information is there, but needs to be interpreted, and this is
never trivial.
here e.g. the was a noticable slowdown, where looking at tps doesn't
help (because we're still able to meet the tps goal).
Sure, that is why there are other informations on the line.
I still don't quite get when I would want to look at lat, when I have
lag. They're always going to be close by.
More or less. Under -R "lag + tx = lat", they are close if tx is small.
The reason I choose so show lat is because I think it is the most
important figure, and I added lag to show what part it represented in the
overall performance. I do not think that showing all 3 is very useful
because it makes the lines too long.
For testing the server under full load, like during that catch up period,
testing without -R seems better.
One area in which postgres is pretty weak, although less bad than we
used to be, is in is predicatable latency. Most production servers
aren't run under the highest possible throughput, therefore optimizing
jitter under loaded but not breakneck speeds is important.
Yes, I completely agree. I spent a lot of time working on the checkpointer
to improve its behavior and reduce large client side latency spikes that
could occur even with a moderate load.
And to be able to localize where such latency is introduced, it's
important to see the precise moment things got slower / where
performance recovered.
For me it is more complicated. The tx could stay low but the system could
lag far behind, unable to catch up. In your first trace above, txtime
seems low enough (~ 10-15 ms), but the system is lagging 18000 ms behind.
I think we should just restore lat to a sane behaviour under -R, and if
you want to have lat + lag as a separate column in -R mode, then let's
do that.
It seems like a bad idea to change the meaning of the value after the fact.
Would be good to at least rename it, to avoid confusion. Maybe that's not
too important for the interactive -P reports, but got to be mindful about
the numbers logged in the log file, at least.
Given that 'lat' is currently in the file, and I would bet the majority
of users of pgbench use it without -R, I'm not convinced that's
something to care about. Most are going to interpret it the way it's
computed without -R.
As said above, without -R we have "lat = txtime" and "lag = 0". It is more
of your intuitive expectations than an issue with what is displayed under
-R, which for me are currently the most important figures from a
benchmarking (client oriented) perspective.
I do not think that there is an issue to fix, but you can do as you feel.
Maybe there could be an option to show the transaction time (that I would
shorten as "tx") instead of "lat", for people who want to focus more on
server performance than on client perceived performance. I can add such an
option, maybe for the next CF or the one afterwards, although I have
already sent quite a few small patches about pgbench already.
--
Fabien.