On Wed, Jul 16, 2014 at 12:57 AM, Fabien COELHO <coe...@cri.ensmp.fr> wrote:
>> Well, I think the feedback has been pretty clear, honestly. Here's
>> what I'm unhappy about: I can't understand what these options are
>> actually doing.
>
> We can try to improve the documentation, once more!
>
> However, ISTM that it is not the purpose of pgbench documentation to be a
> primer about what is an exponential or gaussian distribution, so the idea
> would yet be to have a relatively compact explanation, and that the
> interested but clueless reader would document h..self from wikipedia or a
> text book or a friend or a math teacher (who could be a friend as well:-).

Well, I think it's a balance. I agree that the pgbench documentation
shouldn't try to substitute for a text book or a math teacher, but I
also think that you shouldn't necessarily need to refer to a text book
or a math teacher in order to figure out how to use pgbench. Saying
"it's complicated, so we don't have to explain it" would be a cop out;
we need to *make* it simple. And if there's no way to do that, then
IMHO we should reject the patch in favor of some future patch that
implements something that will be easy for users to understand.

>>> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
>>> starting vacuum...end.
>>> transaction type: Exponential distribution TPC-B (sort of)
>>> scaling factor: 1
>>> exponential threshold: 10.00000
>>>
>>> decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
>>> highest/lowest percent of the range: 9.5% 0.0%
>>
>> I don't have a clue what that means. None.
>
> Maybe we could add in front of the decile/percent
>
> "distribution of increasing account key values selected by pgbench:"

I still wouldn't know what that meant. And it misses the point anyway:
if the documentation is good, this will be unnecessary. If the
documentation is bad, a printout that tries to illustrate it by example
is not an acceptable substitute.

>> Here is an example of an explanation that would make sense to me.
>> This is not the actual behavior of your patch, I'm quite sure, so this
>> is just an example of the *kind* of explanation that I think is
>> needed:
>
> This is more or less the approximate behavior of the patch, but for 1% of
> the range, not 50%. However I'm not sure that the current documentation is
> so bad.

I think it isn't, because in the system I described, a larger value
indicates a flatter distribution, but in the documentation, a smaller
value indicates a flatter distribution. That having been said, I agree
the current documentation for the exponential distribution is not too
bad.
But this part does not make sense:

+ A crude approximation of the distribution is that the most frequent 1%
+ values are drawn <replaceable>threshold</>% of the time.
+ The closer to 0.0 the threshold, the flatter (more uniform) the access
+ distribution.

Given the first statement, I'd expect the lowest possible threshold to
be 0.01, not 0.

The documentation for the Gaussian distribution is in somewhat worse
shape. Unlike the documentation for exponential, it makes no attempt at
all to give the user a clear idea what the distribution actually looks
like. The closest it comes is this:

+ In other worlds, the larger the <replaceable>threshold</>,
+ the narrower the access range around the middle.

But that's not really very close - there's no way for a user to judge
what impact the threshold parameter actually has except to try it.
Unlike the discussion of exponential, which contains a fairly-precise
mathematical characterization of the behavior, the Gaussian stuff has
nothing except a hand-wavy explanation that a higher threshold skews the
distribution more. (Also, the English expression is "in other words"
not "in other worlds" - but in fact the phrase has no business in that
sentence at all, because it is not reiterating the contents of the
previous sentence in different language, but rather making a new point
entirely. And the following sentence does not start with a capital
letter, though maybe that's because it was intended to be incorporated
into this sentence somehow.)

I think that you also need to consider which instances of the words
"gaussian" and "exponential" are referring to the option and which are
referring to the abstract mathematical concept. When you're talking
about the option, you should use all lower-case (as you've done) but
with <literal> tags or similar. When you're referring to the
mathematical distribution, Gaussian should be capitalized.
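
For readers trying to interpret the "decile percents" printout quoted earlier, here is a minimal sketch of where such numbers could come from. It assumes (this is not confirmed by the patch text quoted here) that keys are drawn with a density proportional to exp(-threshold * x) over the normalized key range [0, 1), truncated at 1. The function names are hypothetical and are not part of pgbench.

```python
import math


def exponential_share(lo, hi, threshold):
    """Share of draws landing in [lo, hi) of the normalized key range.

    Assumption: density proportional to exp(-threshold * x) on [0, 1),
    renormalized so the whole range sums to 1. Requires threshold > 0.
    """
    norm = 1.0 - math.exp(-threshold)
    return (math.exp(-threshold * lo) - math.exp(-threshold * hi)) / norm


def decile_percents(threshold):
    """Percentage of draws falling in each tenth of the key range."""
    return [100.0 * exponential_share(k / 10, (k + 1) / 10, threshold)
            for k in range(10)]


if __name__ == "__main__":
    threshold = 10.0
    # One percentage per decile, lowest keys first.
    print(" ".join(f"{p:.1f}%" for p in decile_percents(threshold)))
    # Share of the first and the last 1% of the key range.
    first = 100.0 * exponential_share(0.0, 0.01, threshold)
    last = 100.0 * exponential_share(0.99, 1.0, threshold)
    print(f"{first:.1f}% {last:.1f}%")
```

Under that assumption, a threshold of 10 gives the same rounded figures as the quoted run: 63.2% for the first decile and 9.5% for the first 1% of the range. The same formula gives the first 1% of the range a share of about 1.0% at a threshold of 0.01, so in this model the "drawn threshold% of the time" rule of thumb holds only loosely at a threshold of 10 and not at all for thresholds near 0.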

BTW, I agree with both Heikki's suggestion that we make these options to
setrandom only and not expose command-line options for them, and with
Andres's critique that the documentation of those options is far too
repetitive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company