Re: [HACKERS] Hash Functions

amul sul Thu, 31 Aug 2017 05:42:08 -0700

On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas <robertmh...@gmail.com> wrote:


> On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sula...@gmail.com> wrote:
> > Thanks for the suggestion, I have updated 0002-patch accordingly.
> > Using this I found some strange behaviours as follow:
> >
> > 1) standard and extended0 output for the jsonb_hash case is not same.
> > 2) standard and extended0 output for the hash_range case is not same when
> > input
> >    is int4range(550274, 1550274)  other case in the patch are fine. This
> can
> > be
> >    reproducible with other values as well, for e.g. int8range(1570275,
> > 208112489).
> >
> > Will look into this tomorrow.
>
> Those sound like bugs in your patch.  Specifically:
>
> +    /* Same approach as hash_range */
> +    result = hash_uint32_extended((uint32) flags, seed);
> +    result ^= lower_hash;
> +    result = (result << 1) | (result >> 63);
> +    result ^= upper_hash;
> 
>

Yes, you are correct.


> 
>
> That doesn't give compatible results.  The easiest thing to do might
> be to rotate the high 32 bits and the low 32 bits separately.
> JsonbHashScalarValueExtended has the same problem.  Maybe create a
> helper function that does something like this (untested):
>
> ((x << 1) & UINT64COUNT(0xfffffffefffffffe)) | ((x >> 31) &
> UINT64CONST(0x100000001))
> 
>

This working as expected,  also tested by executing the following SQL
multiple times:

SELECT v as value, hash_range(v)::bit(32) as standard,
       hash_range_extended(v, 0)::bit(32) as extended0,
       hash_range_extended(v, 1)::bit(32) as extended1
FROM   (VALUES (int8range(floor(random() * 100)::int8, floor(random() *
1000)::int8)),
        (int8range(floor(random() * 1000)::int8, floor(random() *
10000)::int8)),
        (int8range(floor(random() * 10000)::int8, floor(random() *
100000)::int8)),
         (int8range(floor(random() * 10000000)::int8, floor(random() *
100000000)::int8)),
         (int8range(floor(random() * 100000000)::int8, floor(random() *
1000000000)::int8))) x(v)
WHERE  hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
       OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);  



> > Another case which I want to discuss is, extended and standard version of
> > hashfloat4, hashfloat8 & hash_numeric function will have the same output
> for
> > zero
> > value irrespective of seed value. Do you think we need a fix for this?
>
> Yes, I think you should return the seed rather than 0 in the cases
> where the current code hard-codes a 0 return.  That will give the same
> behavior in the seed == 0 case while cheaply getting us something a
> bit different when there is a seed.
>
> Fixed in the attached version.



> BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this
> patch.
>
> Added i
n the attached version
.

Thanks for your all the suggestions.

Regards,
Amul

0001-add-optional-second-hash-proc-v4.patch
Description: Binary data

0002-test-Hash_functions_v3_wip.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hash Functions

Reply via email to