Hi,

while working on the bloom filters for hashjoins, I've started using the HLL library committed as part of the sorting improvements for 9.5. I propose adding two more functions to the API, which I think are quite useful:

1) initHyperLogLogError(hyperLogLogState *cState, double error)

   Instead of specifying bwidth (essentially the number of bits used
   for addressing in the counter), this allows specifying the expected
   error rate for the counter, which is

      error_rate = 1.04 / sqrt(2^bwidth)

   So for 5% we get bwidth=5, and so on. This makes the API a bit easier
   the use, because there are pretty much no comments about the meaning
   of bwidth, and the existing callers simply use 10 without details.

2) freeHyperLogLog(hyperLogLogState *cState)

   I think it's a good idea to provide function "undoing" what init
   does, i.e. freeing the internal memory etc. Currently that's trivial
   to do, but perhaps we'll make the structure more complicated in the
   future (albeit that might be unlikely).

FWIW I've already used this in the patch marrying hash joins and bloom filters.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to