On Tue, Dec 10, 2019 at 11:34 PM Miles Elam <miles.e...@productops.com> wrote:
> In terms of "wasted computation", MD5, SHA1, and the others always compute > the full length before they are passed to a UUID, int, or whatever. It's a > sunk cost. It's also a minor cost considering many hash algorithms are > performed in CPU hardware now. All that's left is the truncation and cast, > which you can't avoid easily. > > > Sure, you could reimplement Java's .hashCode() method by iterating through > the characters and processing the character codes: > > s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1] > > > I don't see how that would beat the CPU-based hashes though unless you > wrote a C-based extension. Maybe it's better just to embrace the > user-defined function first and then decide if performance is insufficient > for your use cases. > > > CREATE EXTENSION IF NOT EXISTS pgcrypto; > > CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5') > RETURNS int8 AS $$ > > SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 16), > 'hex'))::bit(64)::int8 > > $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE; > > > CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5') > RETURNS int4 AS $$ > > SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8), > 'hex'))::bit(32)::int4 > > $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE; > > > SELECT > > hash4('something something something'), > > hash4('something something something', 'sha1'), > > hash8('something something something'), > > hash8('something something something', 'sha1'); > > > > Cheers, > > > Miles > Thanks for the custom functions! May be useful as fallback. But I am really looking for standard functions in Postgres first. Those should be faster and more reliable than writing my own. Regards Erwin