Hello, I'm working on a data warehouse dimensionalization process where I need to hash a text string to use as the key. I've implemented it with MD5, and it works fine. The problem is that the MD5 value (32 bytes as a hex string) is often longer than the original string, so it doesn't accomplish what I want: space savings.
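To illustrate the size issue, here is a minimal Python sketch using the standard hashlib module. The helper name digest_sizes is hypothetical and exists only for this illustration. It compares the byte length of an input string with the hex and raw forms of its MD5 digest.

```python
import hashlib


def digest_sizes(text: str) -> dict:
    """Return the byte length of the input and of its MD5 digest forms.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-8")
    md5 = hashlib.md5(data)
    return {
        "original": len(data),            # bytes in the source string
        "md5_hex": len(md5.hexdigest()),  # hex text form: 2 chars per digest byte
        "md5_raw": len(md5.digest()),     # raw binary digest
    }


if __name__ == "__main__":
    # Short sample value taken from the data described in this message.
    print(digest_sizes("ar=514"))
```

For the 6-byte value 'ar=514' the hex form is 32 characters and the raw digest is 16 bytes, so either form is larger than the input. A fixed-size digest only saves space for strings longer than the digest itself.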
Does anybody have alternative hash function recommendations? I looked at the options I knew of:

    select length(encode('ar=514','hex'));     -- 12
    select length(decode('ar=514','base64'));  -- 24
    select length(DIGEST('ar=514', 'md5'));    -- 16 bytes
    select length(DIGEST('ar=514', 'sha1'));   -- 20 bytes

The function is currently written in PL/pgSQL, but I'm considering switching to Python for a broader choice of libraries.

The source data is a delimited list of name/value pairs. Lengths range from 0 to 2500 bytes. For example:

    ar=514,cc=CA,ci=Montreal,cn=North+America,co=Sympatico,cs=Canada,nt=Xdsl,rc=QC,rs=Quebec,tp=High,tz=GMT%2D5

Thanks in advance,

Doug Little
Sr. Data Warehouse Architect | Business Intelligence Architecture | Orbitz Worldwide
douglas.lit...@orbitz.com