Re: CPU costs of random_zipfian in pgbench

Tom Lane Sat, 23 Mar 2019 10:02:02 -0700

Fabien COELHO <coe...@cri.ensmp.fr> writes:
> [ pgbench-zipf-doc-3.patch ]


I started to look through this, and the more I looked the more unhappy
I got that we're having this discussion at all.  The zipfian support
in pgbench is seriously over-engineered and under-documented.  As an
example, I was flabbergasted to find out that the end-of-run summary
statistics now include this:

    /* Report zipfian cache overflow */
    for (i = 0; i < nthreads; i++)
    {
        totalCacheOverflows += threads[i].zipf_cache.overflowCount;
    }
    if (totalCacheOverflows > 0)
    {
        printf("zipfian cache array overflowed %d time(s)\n",
               totalCacheOverflows);
    }

What is the point of that, and if there is a point, why is it nowhere
mentioned in pgbench.sgml?  What would a user do with this information,
and how would they know what to do?

I remain of the opinion that we ought to simply rip out support for
zipfian with s < 1.  It's not useful for benchmarking purposes to have
a random-number function with such poor computational properties.
I think leaving it in there is just a foot-gun: we'd be a lot better
off throwing an error that tells people to use some other distribution.
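
To be concrete about the throw-an-error option, here is a minimal sketch of
the sort of check I mean.  The helper name, the message wording, and the
calling convention are all made up for illustration; this is not existing
pgbench code, and it only covers the s < 1 case discussed above.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, not pgbench code: returns true if the zipfian
 * parameter s is acceptable.  Otherwise it writes a message pointing the
 * user at another distribution into errbuf and returns false.
 */
static bool
check_zipfian_param(double s, char *errbuf, size_t errlen)
{
    if (s < 1.0)
    {
        snprintf(errbuf, errlen,
                 "zipfian parameter %g is less than 1; "
                 "use another distribution such as random_exponential",
                 s);
        return false;
    }
    return true;
}
```

The point is that the user finds out at the first call, rather than from an
unexplained line in the end-of-run summary.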

Or if we do leave it in there, we for sure have to have documentation
that *actually* explains how to use it, which this patch still doesn't.
There's nothing suggesting that you'd better not use a large number of
different (n,s) combinations.
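
To illustrate why the number of distinct (n,s) combinations matters, here is
a minimal sketch of a small fixed-size cache keyed on (n,s) that counts
evictions.  The cache size, the round-robin eviction, and every name below
are assumptions for illustration, not the actual pgbench implementation.

```c
#define CACHE_SIZE 4            /* assumed size, for illustration only */

typedef struct
{
    long        n;
    double      s;
} CacheEntry;

typedef struct
{
    int         nentries;       /* number of slots in use */
    int         next_victim;    /* next slot to evict once full */
    int         overflowCount;  /* evictions so far */
    CacheEntry  entries[CACHE_SIZE];
} ParamCache;

/*
 * Look up (n, s).  On a miss, store it, evicting an older entry in
 * round-robin order if the cache is already full.
 */
static void
cache_lookup(ParamCache *cache, long n, double s)
{
    int         slot;

    for (int i = 0; i < cache->nentries; i++)
    {
        if (cache->entries[i].n == n && cache->entries[i].s == s)
            return;             /* hit: nothing to recompute */
    }

    if (cache->nentries < CACHE_SIZE)
        slot = cache->nentries++;
    else
    {
        slot = cache->next_victim;
        cache->next_victim = (cache->next_victim + 1) % CACHE_SIZE;
        cache->overflowCount++; /* some entry's precomputed state is lost */
    }
    cache->entries[slot].n = n;
    cache->entries[slot].s = s;
}
```

With at most CACHE_SIZE distinct combinations the counter stays at zero.
Cycle through one more than that and, in this sketch, every later lookup
misses and evicts, so whatever was precomputed per entry is redone each time.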

                        regards, tom lane
