Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

Erwin Brandstetter Wed, 11 Dec 2019 10:08:22 -0800

On Tue, Dec 10, 2019 at 11:34 PM Miles Elam <miles.e...@productops.com>
wrote:


> In terms of "wasted computation", MD5, SHA1, and the others always compute
> the full length before they are passed to a UUID, int, or whatever. It's a
> sunk cost. It's also a minor cost considering many hash algorithms are
> performed in CPU hardware now. All that's left is the truncation and cast,
> which you can't avoid easily.
>
>
> Sure, you could reimplement Java's .hashCode() method by iterating through
> the characters and processing the character codes:
>
> s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1]
>
>
> I don't see how that would beat the CPU-based hashes though unless you
> wrote a C-based extension. Maybe it's better just to embrace the
> user-defined function first and then decide if performance is insufficient
> for your use cases.
>
>
> CREATE EXTENSION IF NOT EXISTS pgcrypto;
>
> CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5')
> RETURNS int8 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 16),
> 'hex'))::bit(64)::int8
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5')
> RETURNS int4 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8),
> 'hex'))::bit(32)::int4
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> SELECT
>
> hash4('something something something'),
>
> hash4('something something something', 'sha1'),
>
> hash8('something something something'),
>
> hash8('something something something', 'sha1');
>
>
>
> Cheers,
>
>
> Miles
>



Thanks for the custom functions! May be useful as fallback.

But I am really looking for standard functions in Postgres first. Those
should be faster and more reliable than writing my own.


Regards

Erwin

Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

Reply via email to