On 2/17/19 5:09 PM, Tom Lane wrote:
> Fabien COELHO <coe...@cri.ensmp.fr> writes:
>>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
>>> and I ran head-first into an issue with rather excessive CPU costs.
>
>> If you want skewed but not especially zipfian, use exponential which is
>> quite cheap. Also zipfian with a > 1.0 parameter does not have to compute
>> the harmonic number, so it depends on the parameter.
>
> Maybe we should drop support for parameter values < 1.0, then. The idea
> that pgbench is doing something so expensive as to require caching seems
> flat-out insane from here.

Maybe. It's not quite clear to me why we support the two modes at all. We
use one algorithm for values < 1.0 and another one for values > 1.0; what's
the difference there? Are those distributions materially different?
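
For reference, here's a rough self-contained sketch of the two approaches
as I understand them (simplified, not the actual pgbench source; drand48()
stands in for pg_erand48() and the function names are mine). The s > 1.0
case is a cheap rejection loop; the s < 1.0 case needs the generalized
harmonic number over the whole range, which is the O(n) sum being cached:

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>

    /*
     * s > 1.0: rejection sampling (Devroye, "Non-Uniform Random Variate
     * Generation").  Expected O(1) per draw, nothing to precompute.
     */
    static int64_t
    zipf_rejection(int64_t n, double s)
    {
        double  b = pow(2.0, s - 1.0);
        double  u, v, x, t;

        do
        {
            u = drand48();
            v = drand48();
            x = floor(pow(u, -1.0 / (s - 1.0)));    /* candidate value */
            t = pow(1.0 + 1.0 / x, s - 1.0);
            /* reject if acceptance test fails or candidate is out of range */
        } while (v * x * (t - 1.0) / (b - 1.0) > t / b || x > n);

        return (int64_t) x;
    }

    /*
     * s < 1.0: inverse-CDF approximation of Gray et al., "Quickly
     * Generating Billion-Record Synthetic Databases" (SIGMOD 1994).
     * Needs the generalized harmonic number H(n, s), an O(n) sum over
     * the whole range -- the expensive part.
     */
    static double
    harmonic(int64_t n, double s)
    {
        double  h = 0.0;

        for (int64_t i = n; i >= 1; i--)
            h += pow((double) i, -s);
        return h;
    }

    static int64_t
    zipf_harmonic(int64_t n, double s)
    {
        /* pgbench computes these once per (n, s) and caches them */
        double  zetan = harmonic(n, s);
        double  alpha = 1.0 / (1.0 - s);
        double  eta = (1.0 - pow(2.0 / (double) n, 1.0 - s)) /
                      (1.0 - harmonic(2, s) / zetan);

        double  u = drand48();
        double  uz = u * zetan;

        if (uz < 1.0)
            return 1;
        if (uz < 1.0 + pow(0.5, s))
            return 2;
        return 1 + (int64_t) ((double) n * pow(eta * u - eta + 1.0, alpha));
    }

If I'm reading it right, both branches target the same zipfian distribution;
the split exists because the rejection method is only valid for s > 1, while
the Gray approximation covers s < 1.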
Also, I wonder if just dropping support for parameters < 1.0 would be
enough, because the docs say:

    The function's performance is poor for parameter values close and
    above 1.0 and on a small range.

which seems to suggest it might be slow even for values > 1.0 in some
cases. Not sure.

> That cannot be seen as anything but a foot-gun
> for unwary users. Under what circumstances would an informed user use
> that random distribution rather than another far-cheaper-to-compute one?
>
>> ... This is why I submitted a pseudo-random permutation
>> function, which alas does not get much momentum from committers.
>
> TBH, I think pgbench is now much too complex; it does not need more
> features, especially not ones that need large caveats in the docs.
> (What exactly is the point of having zipfian at all?)
>

I wonder about the growing complexity of pgbench too ...

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services