On Thu, Sep 17, 2020 at 06:48:11PM -0400, John Naylor wrote:
I wrote:

Hmm, I came across that paper while doing background reading. Okay,
now I get that "% (filter->nbits - 1)" is the second hash function in
that scheme. But now I wonder if that second function should actually
act on the passed "value" (the original hash), so that they are
actually independent, as required. In the language of that paper, the
patch seems to have

g(x) = h1(x) + i*h2(h1(x)) + f(i)

instead of

g(x) = h1(x) + i*h2(x) + f(i)

Concretely, I'm wondering if it should be:

 big_h = DatumGetUint32(hash_uint32(value));
 h = big_h % filter->nbits;
-d = big_h % (filter->nbits - 1);
+d = value % (filter->nbits - 1);

But I could be wrong.

I'm wrong -- if we use different operands to the moduli, we throw away
the assumption of co-primeness. But I'm still left wondering why we
have to re-hash the hash for this to work. In any case, there should
be some more documentation around the core algorithm, so that future
readers are not left scratching their heads.


Hmm, good question. I think we don't really need to hash it twice. It
does not rally achieve anything - it won't reduce number of collisions
or anything like that.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to