Hello,

I'm working on a data warehouse dimensionalization process where I need to 
hash a text string to use as the key.
I've implemented it with MD5, and it works fine. The problem is that the 
MD5 value (32 bytes as hex text) is often longer than the original string, 
so it doesn't accomplish what I want: space savings.

Does anybody have alternative hash function recommendations?
I looked at the options I knew of:
select length(encode('ar=514','hex')); -- 12
select length(encode('ar=514','base64')); -- 8
select length(digest('ar=514', 'md5')); -- 16 bytes
select length(digest('ar=514', 'sha1')); -- 20 bytes
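
For comparison, here is a minimal Python sketch of the same kind of size 
measurements, assuming only the standard library (hashlib, base64). The 
names `value` and `sizes` are illustrative only. It may help show that the 
32-byte figure comes from hex-encoding the digest, while the raw MD5 digest 
is 16 bytes.

```python
import base64
import hashlib

# Sample input taken from the first name/value pair of the source data.
value = b"ar=514"

# Length of each representation: raw input, text encodings, and digests.
sizes = {
    "raw": len(value),
    "hex": len(value.hex()),
    "base64": len(base64.b64encode(value)),
    "md5_raw": len(hashlib.md5(value).digest()),
    "md5_hex": len(hashlib.md5(value).hexdigest()),
    "sha1_raw": len(hashlib.sha1(value).digest()),
}

for name, size in sizes.items():
    print(f"{name}: {size}")
```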

The function is currently written in PL/pgSQL, but I'm considering switching 
to Python for a broader choice of libraries.



The source data is a delimited list of name/value pairs, ranging in length 
from 0 to 2500 bytes. For example:
ar=514,cc=CA,ci=Montreal,cn=North+America,co=Sympatico,cs=Canada,nt=Xdsl,rc=QC,rs=Quebec,tp=High,tz=GMT%2D5
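
One possible direction, sketched in Python under stated assumptions: hash 
the string to an 8-byte digest and store it as a signed 64-bit integer 
(which would fit a bigint column). This swaps in BLAKE2b from hashlib in 
place of MD5; `short_key` and its `size` parameter are hypothetical names, 
not part of the existing function. A 64-bit key is not collision-free: for 
n distinct strings the birthday bound puts the collision probability near 
n^2 / 2^65, so whether this is acceptable depends on the dimension size.

```python
import hashlib


def short_key(text: str, size: int = 8) -> int:
    """Hash text to a signed integer occupying `size` bytes (hypothetical helper)."""
    # BLAKE2b supports digest sizes from 1 to 64 bytes directly.
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=size).digest()
    # Interpret the digest as a signed big-endian integer.
    return int.from_bytes(digest, "big", signed=True)


sample = (
    "ar=514,cc=CA,ci=Montreal,cn=North+America,co=Sympatico,cs=Canada,"
    "nt=Xdsl,rc=QC,rs=Quebec,tp=High,tz=GMT%2D5"
)
key = short_key(sample)
print(key)
```

The key is 8 bytes regardless of input length, so it saves space for any 
string longer than 8 bytes and costs space for shorter ones.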

Thanks in advance
Doug Little

Sr. Data Warehouse Architect | Business Intelligence Architecture | Orbitz 
Worldwide
douglas.lit...@orbitz.com
