Hello Robert,
All in all, pgbench overheads are small compared to postgres processing
times and representative of a reasonably optimized client application.
It's pretty easy to devise tests where pgbench is client-limited --
just try running it with threads = clients/4, sometimes even
clients/2. So I don't buy the idea that this is true in general.
OK, one thread cannot feed an N-core server if enough clients are run per
thread and the server has little to do.
The point I'm clumsily trying to make is that pgbench-specific overheads
are quite small: any benchmark driver would incur at least roughly the same
costs, namely the CPU cost of the tool itself, then of the libraries it
uses (e.g. libpq and libc), then of the system calls. Even if the first of
these costs were reduced to zero, you would still have to talk to the
database through the system, and that part would be the same.
As the cost of pgbench itself is a small part of the total client-side CPU
cost of running the benchmark, no dramatic improvement should be expected
from optimizing it. This does not mean that pgbench performance should not
be improved, where possible and maintainable.
I'll develop that point a little further in a reply to Andres's figures,
which are very interesting, by providing some more figures of my own.
To try to salvage my illustration idea: I could change the name to "demo",
i.e. quite far from "TPC-B", add some extensions to make it differ, e.g. use
a non-uniform random generator, and then explicitly say that it is only
vaguely inspired by "TPC-B" and intended as a demo script that may be
updated to illustrate new features (e.g. if using a non-uniform generator,
I'd really like to add a permutation layer if one becomes available some
day). This way, the real intention of "demo" would be very clear.
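To make the "non-uniform generator plus permutation layer" idea concrete, here is a minimal Python sketch, not pgbench code. The helper names `exp_draw` and `affine_permute`, the skew parameter, and the constants are all hypothetical. The affine map `(a*i + b) mod n` is a deliberately simple stand-in for a permutation layer, not whatever pgbench might eventually provide. The point it shows: an exponential draw alone makes the hot accounts the low-numbered, physically adjacent ones, while a permutation keeps the same skew but scatters the hot keys over the whole range.

```python
import math
import random


def exp_draw(n: int, param: float, rng: random.Random) -> int:
    """Hypothetical helper: draw an integer in [0, n), skewed toward 0.

    Uses inverse-CDF sampling of an exponential distribution truncated to
    [0, 1), then scales to n buckets. A larger `param` gives more skew.
    """
    u = rng.random()  # uniform in [0, 1)
    x = -math.log(1.0 - u * (1.0 - math.exp(-param))) / param  # x in [0, 1)
    return min(int(x * n), n - 1)  # guard against float rounding up to n


def affine_permute(i: int, n: int, a: int, b: int) -> int:
    """Hypothetical permutation layer: map i in [0, n) to a distinct slot.

    i -> (a*i + b) mod n is a bijection on [0, n) whenever gcd(a, n) == 1.
    This is a deliberately simple stand-in with weak mixing.
    """
    if math.gcd(a, n) != 1:
        raise ValueError("a must be coprime with n")
    return (a * i + b) % n


# Usage: pick "aid" values for a hypothetical 100,000-account table.
rng = random.Random(42)
naccounts = 100_000
A, B = 7919, 12345  # arbitrary constants; 7919 is coprime with 100,000
draws = [exp_draw(naccounts, 5.0, rng) for _ in range(10_000)]
# Without the permutation, hot aids would be 1, 2, 3, ... (adjacent rows);
# with it, the same skew is spread across the whole key range.
hot = [affine_permute(d, naccounts, A, B) + 1 for d in draws]
```

Because the map is a bijection, it changes which keys are hot but not how skewed the access pattern is.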
I do not like this idea at all; "demo" is about as generic a name as
imaginable.
What name would you suggest, if it were to be made available from pgbench
as a builtin, that avoids confusion with "tpcb-like"?
But I have another idea: how about including this script in the
documentation with some explanatory text that describes (a) the ways in
which it is more faithful to TPC-B than the normal pgbench script is,
and (b) the problems that it doesn't solve, as enumerated by Fabien
upthread:
We can put more examples in the documentation, OK.
One of the issues raised by Tom is that claiming faithfulness to TPC-B
invites legal trouble. Frankly, I do not care about TPC-B itself, only that
it is a *real* benchmark, and that it helps illustrate pgbench
capabilities.
Another issue is the confusion of having two tpcb-like scripts provided.
So I'm fine with giving up any claim about faithfulness, especially as that
would allow the "demo" script to be more didactic and illustrate more
of pgbench's capabilities.
"table creation and data types, performance data collection, database
configuration, pricing of hardware used in the tests, post-benchmark run
checks, auditing constraints, whatever…"
I already put such caveats in comments and in the documentation, but that
does not seem to be enough for Tom.
Perhaps that idea still won't attract any votes, but I throw it out
there for consideration.
I think that adding an illustration section could be fine, but ISTM that
the example should still be executable. Moreover, I do not think that
your idea fixes the "we must not make too many claims about TPC-B, to
avoid potential legal issues" concern.
--
Fabien.