hashing strings to integers for sqlite3 keys

Adam Funk Thu, 22 May 2014 05:06:49 -0700

I'm using Python 3.3 and the sqlite3 module in the standard library.
I'm processing a lot of strings from input files (among other things,
values of headers in e-mail & news messages) and suppressing
duplicates using a table of seen strings in the database.
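A minimal sketch of the "table of seen strings" approach described above, assuming a hypothetical table named `seen` with a single TEXT PRIMARY KEY column. `INSERT OR IGNORE` skips strings already present, and the cursor's `rowcount` is 1 only when a row was actually inserted. This collapses "check, then insert" into a single statement.

```python
import sqlite3

def make_seen_table(conn):
    # Hypothetical table name "seen"; TEXT PRIMARY KEY puts a unique index on s.
    conn.execute("CREATE TABLE IF NOT EXISTS seen (s TEXT PRIMARY KEY)")

def is_new(conn, s):
    """Record s and return True if it had not been seen before."""
    # INSERT OR IGNORE does nothing when s already exists, so rowcount is
    # 1 for a new string and 0 for a duplicate.
    cur = conn.execute("INSERT OR IGNORE INTO seen (s) VALUES (?)", (s,))
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
make_seen_table(conn)
headers = ["Re: hello", "Re: hello", "Subject: other"]
unique = [h for h in headers if is_new(conn, h)]
print(unique)  # ['Re: hello', 'Subject: other']
```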


From past experience with other things, where testing integers for
equality is faster than testing strings, and from reading the SQLite3
documentation about INTEGER PRIMARY KEY, it seems to me that the
SELECT tests should be faster if I look up an INTEGER PRIMARY KEY
value rather than a TEXT PRIMARY KEY.  Is that right?
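Whether integer lookups win for a given workload is an empirical question, so one hedged way to check is to time both schemas on similar data. In SQLite, a column declared INTEGER PRIMARY KEY becomes an alias for the rowid, while a TEXT PRIMARY KEY in an ordinary table is backed by a separate automatic index. The table names, key count and sample key format below are assumptions chosen for illustration only.

```python
import sqlite3
import timeit

def build(conn, int_keys, text_keys):
    # Hypothetical tables: by_int.k is an alias for the rowid, while
    # by_text.k gets its own automatic unique index.
    conn.execute("CREATE TABLE by_int (k INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE by_text (k TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO by_int (k) VALUES (?)", [(k,) for k in int_keys])
    conn.executemany("INSERT INTO by_text (k) VALUES (?)", [(k,) for k in text_keys])

def time_lookups(conn, int_keys, text_keys):
    # Keys are built before timing, so only the SELECTs are measured.
    def look_up(sql, keys):
        for k in keys:
            conn.execute(sql, (k,)).fetchone()
    t_int = timeit.timeit(
        lambda: look_up("SELECT 1 FROM by_int WHERE k = ?", int_keys), number=1)
    t_text = timeit.timeit(
        lambda: look_up("SELECT 1 FROM by_text WHERE k = ?", text_keys), number=1)
    return t_int, t_text

N = 20000
int_keys = list(range(N))
# Sample text keys loosely resembling header values (an assumption).
text_keys = ["<message-%d@example.invalid>" % i for i in range(N)]
conn = sqlite3.connect(":memory:")
build(conn, int_keys, text_keys)
t_int, t_text = time_lookups(conn, int_keys, text_keys)
print("INTEGER PRIMARY KEY: %.3f s, TEXT PRIMARY KEY: %.3f s" % (t_int, t_text))
```

Both figures include the per-call overhead of the Python sqlite3 module. The integer figure also leaves out the cost of computing a hash in Python, which a hash-based scheme would add. The results depend on key length, table size and SQLite version, so they are only indicative for the data actually tested.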

If so, what sort of hashing function should I use?  The "maxint" for
SQLite3 (2**63 - 1, since INTEGER values are signed 64-bit) is a lot
smaller than even an MD5 hash, which is 128 bits.  The only idea I've
had so far is to take MD5 or SHA-something modulo the maxint value.
(Security isn't an issue: I'm not worried about someone trying to
create a hash collision.)

Thanks,
Adam


-- 
"It is the role of librarians to keep government running in difficult
times," replied Dramoren.  "Librarians are the last line of defence
against chaos."                                       (McMullen 2001)
-- 
https://mail.python.org/mailman/listinfo/python-list