Hi,

On 2020-03-01 09:45:40 -0600, Justin Pryzby wrote:
> +/*
> + * Show hash bucket stats and (optionally) memory.
> + */
> +static void
> +show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es)
> +{
> +     long    spacePeakKb_tuples = (inst->space_peak_tuples + 1023) / 1024,
> +             spacePeakKb_hash = (inst->space_peak_hash + 1023) / 1024;

Let's not add further uses of long. It's terrible because it has
different widths on 64bit windows and everything else. Specify the
widths explicitly, or use something like size_t.


> +     if (es->format != EXPLAIN_FORMAT_TEXT)
> +     {
> +             ExplainPropertyInteger("Hash Buckets", NULL,
> +                                                        inst->nbuckets, es);
> +             ExplainPropertyInteger("Original Hash Buckets", NULL,
> +                                                        
> inst->nbuckets_original, es);
> +             ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
> +                                                        spacePeakKb_hash, 
> es);
> +             ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",
> +                                                        spacePeakKb_tuples, 
> es);

And then you're passing the long to ExplainPropertyInteger which accepts
a int64, making the use of long above particularly suspicious.

I wonder if it would make sense to add a ExplainPropertyBytes(), that
would do the rounding etc automatically. It could then also switch units
as appropriate.


> +     }
> +     else if (!inst->nbuckets)
> +             ; /* Do nothing */
> +     else
> +     {
> +             if (inst->nbuckets_original != inst->nbuckets)
> +             {
> +                     ExplainIndentText(es);
> +                     appendStringInfo(es->str,
> +                                             "Buckets: %ld (originally %ld)",
> +                                             inst->nbuckets,
> +                                             inst->nbuckets_original);
> +             }
> +             else
> +             {
> +                     ExplainIndentText(es);
> +                     appendStringInfo(es->str,
> +                                             "Buckets: %ld",
> +                                             inst->nbuckets);
> +             }
> +
> +             if (es->analyze)
> +                     appendStringInfo(es->str,
> +                                     "  Memory Usage: hashtable: %ldkB, 
> tuples: %ldkB",
> +                                     spacePeakKb_hash, spacePeakKb_tuples);
> +             appendStringInfoChar(es->str, '\n');

I'm not sure I like the alternative output formats here. All the other
fields are separated with a comma, but the original size is in
parens. I'd probably just format it as "Buckets: %lld " and then add
", Original Buckets: %lld" when differing.

Also, %ld is problematic, because it's only 32bit wide on some platforms
(including 64bit windows), but 64bit on others. The easiest way to deal
with that is to use %lld and cast the argument to long long - yes that's
a sad workaround.


> +/* Update instrumentation stats */
> +void
> +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> +{
> +     hashtable->instrument.nbuckets = hashtable->hashtab->size;
> +     if (initial)
> +     {
> +             hashtable->instrument.nbuckets_original = 
> hashtable->hashtab->size;
> +             hashtable->instrument.space_peak_hash = 
> hashtable->hashtab->size *
> +                     sizeof(TupleHashEntryData);
> +             hashtable->instrument.space_peak_tuples = 0;
> +     }
> +     else
> +     {
> +#define maxself(a,b) a=Max(a,b)
> +             /* hashtable->entrysize includes additionalsize */
> +             maxself(hashtable->instrument.space_peak_hash,
> +                             hashtable->hashtab->size * 
> sizeof(TupleHashEntryData));
> +             maxself(hashtable->instrument.space_peak_tuples,
> +                             hashtable->hashtab->members * 
> hashtable->entrysize);
> +#undef maxself
> +     }
> +}

Not a fan of this macro.

I'm also not sure I understand what you're trying to do here?


> diff --git a/src/test/regress/expected/aggregates.out 
> b/src/test/regress/expected/aggregates.out
> index f457b5b150..b173b32cab 100644
> --- a/src/test/regress/expected/aggregates.out
> +++ b/src/test/regress/expected/aggregates.out
> @@ -517,10 +517,11 @@ order by 1, 2;
>           ->  HashAggregate
>                 Output: s2.s2, sum((s1.s1 + s2.s2))
>                 Group Key: s2.s2
> +               Buckets: 4
>                 ->  Function Scan on pg_catalog.generate_series s2
>                       Output: s2.s2
>                       Function Call: generate_series(1, 3)
> -(14 rows)
> +(15 rows)

These tests probably won't be portable. The number of hash buckets
calculated will e.g. depend onthe size of the contained elements. And
that'll e.g. will depend on whether pointers are 4 or 8 bytes.


Greetings,

Andres Freund


Reply via email to