Re: hashing strings to integers

2014-06-03 Thread Adam Funk
On 2014-05-27, Steven D'Aprano wrote: > On Tue, 27 May 2014 16:13:46 +0100, Adam Funk wrote: >> Well, here's the way it works in my mind: >> >>I can store a set of a zillion strings (or a dict with a zillion >>string keys), but every time I test "if new_string in seen_strings", >>the

Re: hashing strings to integers

2014-06-03 Thread Adam Funk
On 2014-05-28, Dan Sommers wrote: > On Tue, 27 May 2014 17:02:50 +, Steven D'Aprano wrote: > >> - rather than "zillions" of them, there are few enough of them that >> the chances of an MD5 collision is insignificant; > >> (Any MD5 collision is going to play havoc with your strategy of >>

Re: hashing strings to integers

2014-05-27 Thread Dan Sommers
On Tue, 27 May 2014 17:02:50 +, Steven D'Aprano wrote: > - rather than "zillions" of them, there are few enough of them that > the chances of an MD5 collision is insignificant; > (Any MD5 collision is going to play havoc with your strategy of > using hashes as a proxy for the real string
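
To put a rough number on "insignificant", the standard birthday approximation can be evaluated directly. This is only a sketch: the function name collision_probability is made up for illustration, and it assumes hash values are uniformly distributed and that nobody is deliberately constructing colliding inputs (practical collision attacks on MD5 are known).

```python
import math


def collision_probability(n_items: int, hash_bits: int = 128) -> float:
    """Approximate chance that at least two of n_items random hashes collide.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes uniformly random, non-adversarial hash values.
    """
    space = 2.0 ** hash_bits
    exponent = n_items * (n_items - 1) / (2.0 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # A billion strings against a full 128-bit MD5 digest.
    print(collision_probability(10**9, 128))
    # The same billion strings if only 64 bits of the digest are kept.
    print(collision_probability(10**9, 64))
```

Under those assumptions a full 128-bit digest leaves a vanishingly small chance even for a billion strings, while truncating to 64 bits raises it to a few percent at the same scale.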

Re: hashing strings to integers

2014-05-27 Thread Chris Angelico
On Wed, May 28, 2014 at 3:02 AM, Steven D'Aprano wrote: > But I know that Python is a high-level language with > lots of high-level data structures like dicts which trade-off time and > memory for programmer convenience, and that I'd want to see some real > benchmarks proving that my application w

Re: hashing strings to integers

2014-05-27 Thread Steven D'Aprano
On Tue, 27 May 2014 16:13:46 +0100, Adam Funk wrote: > On 2014-05-23, Chris Angelico wrote: > >> On Fri, May 23, 2014 at 8:27 PM, Adam Funk >> wrote: >>> I've also used hashes of strings for other things involving >>> deduplication or fast lookups (because integer equality is faster than >>> str

Re: hashing strings to integers

2014-05-27 Thread Adam Funk
On 2014-05-23, Terry Reedy wrote: > On 5/23/2014 6:27 AM, Adam Funk wrote: > >> that. The only thing that really bugs me in Python 3 is that execfile >> has been removed (I find it useful for testing things interactively). > > The spelling has been changed to exec(open(...).read(), ... . If you u

Re: hashing strings to integers

2014-05-27 Thread Adam Funk
On 2014-05-23, Chris Angelico wrote: > On Fri, May 23, 2014 at 8:27 PM, Adam Funk wrote: >> I've also used hashes of strings for other things involving >> deduplication or fast lookups (because integer equality is faster than >> string equality). I guess if it's just for deduplication, though, a

Re: hashing strings to integers

2014-05-23 Thread Terry Reedy
On 5/23/2014 6:27 AM, Adam Funk wrote: that. The only thing that really bugs me in Python 3 is that execfile has been removed (I find it useful for testing things interactively). The spelling has been changed to exec(open(...).read(), ... . If you use it a lot, add a customized def execfile(
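
The preview cuts off before the suggested definition, so what follows is not Terry's code, only one plausible shape for such a helper. The parameter names and the choice of a fresh namespace by default are assumptions; the Python 2 builtin defaulted to the caller's namespace instead.

```python
def execfile(path, globals_=None, locals_=None):
    """Rough Python 3 stand-in for the Python 2 execfile() builtin.

    Assumption: when no namespace is given, run the file in a fresh
    dict rather than in the caller's namespace.
    """
    if globals_ is None:
        globals_ = {"__name__": "__main__", "__file__": path}
    # Read as bytes so compile() can honour any encoding declaration,
    # and pass the path so tracebacks show the real file name.
    with open(path, "rb") as f:
        code = compile(f.read(), path, "exec")
    exec(code, globals_, locals_)
    return globals_
```

At the interactive prompt, execfile("script.py", globals()) would run the file in the current session's namespace, which is probably the closest match to the old behaviour for interactive testing.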

Re: hashing strings to integers

2014-05-23 Thread Chris Angelico
On Fri, May 23, 2014 at 8:36 PM, Adam Funk wrote: > BTW, I just tested that & it should be "big" for consistency with the > hexdigest: Yes, it definitely should be parsed big-endianly. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
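
The consistency point can be checked in a few lines. This sketch reuses the "Hello world" SHA-1 example from elsewhere in the thread; the variable names are illustrative only.

```python
import hashlib

h = hashlib.sha1(b"Hello world")

# hexdigest() writes the most significant byte first, so parsing it as a
# base-16 number is the same as reading the raw digest big-endian.
from_hex = int(h.hexdigest(), 16)
as_big = int.from_bytes(h.digest(), "big")
as_little = int.from_bytes(h.digest(), "little")

print(as_big == from_hex)     # the big-endian reading matches the hex string
print(as_little == from_hex)  # the little-endian reading is a different number
```

Either byte order yields a usable integer key; "big" is simply the one that agrees with the hex form that tools usually print.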

Re: hashing strings to integers (was: hashing strings to integers for sqlite3 keys)

2014-05-23 Thread Chris Angelico
On Fri, May 23, 2014 at 8:27 PM, Adam Funk wrote: > I've also used hashes of strings for other things involving > deduplication or fast lookups (because integer equality is faster than > string equality). I guess if it's just for deduplication, though, a > set of byte arrays is as good as a set o

Re: hashing strings to integers

2014-05-23 Thread Adam Funk
On 2014-05-23, Adam Funk wrote: > On 2014-05-22, Peter Otten wrote: >> In Python 3 there's int.from_bytes() >> > h = hashlib.sha1(b"Hello world") > int.from_bytes(h.digest(), "little") >> 538059071683667711846616050503420899184350089339 > > Excellent, thanks for pointing that out. I've j

hashing strings to integers (was: hashing strings to integers for sqlite3 keys)

2014-05-23 Thread Adam Funk
On 2014-05-22, Peter Otten wrote: > Adam Funk wrote: >> Well, J*v* returns a byte array, so I used to do this: >> >> digester = MessageDigest.getInstance("MD5"); >> ... >> digester.reset(); >> byte[] digest = digester.digest(bytes); >> return new BigInteger(+1, digest); > > I

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Peter Otten
Adam Funk wrote: > On 2014-05-22, Chris Angelico wrote: > >> On Thu, May 22, 2014 at 11:54 PM, Adam Funk wrote: > >>> That ties in with a related question I've been wondering about lately >>> (using MD5s & SHAs for other things) --- getting a hash value (which >>> is internally numeric, rather

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Fri, May 23, 2014 at 12:47 AM, Adam Funk wrote: >> I don't know that there is, at least not with hashlib. You might be >> able to use digest() followed by the struct module, but it's no less >> convoluted. It's the same in several other languages' hashing >> functions; the result is a string, n
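
For comparison, here is roughly what the digest()-plus-struct route looks like. It is a sketch, not code from the thread, and digest_to_int is a hypothetical name; it does bear out the "no less convoluted" remark.

```python
import hashlib
import struct


def digest_to_int(digest: bytes) -> int:
    """Combine a raw digest into one integer via big-endian 32-bit words."""
    if len(digest) % 4:
        raise ValueError("digest length must be a multiple of 4 bytes")
    # ">" selects big-endian; "I" is an unsigned 32-bit integer.
    words = struct.unpack(">%dI" % (len(digest) // 4), digest)
    value = 0
    for word in words:
        value = (value << 32) | word
    return value


if __name__ == "__main__":
    h = hashlib.sha1(b"Hello world")
    print(digest_to_int(h.digest()) == int(h.hexdigest(), 16))
```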

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote: > On Thu, May 22, 2014 at 11:54 PM, Adam Funk wrote: >> That ties in with a related question I've been wondering about lately >> (using MD5s & SHAs for other things) --- getting a hash value (which >> is internally numeric, rather than string, right?) out as

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote: > On Thu, May 22, 2014 at 11:41 PM, Adam Funk wrote: >> On further reflection, I think I asked for that. In fact, the table >> I'm using only has one column for the hashes --- I wasn't going to >> store the strings at all in order to save disk space (maybe my

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread alister
On Thu, 22 May 2014 12:47:31 +0100, Adam Funk wrote: > I'm using Python 3.3 and the sqlite3 module in the standard library. I'm > processing a lot of strings from input files (among other things, values > of headers in e-mail & news messages) and suppressing duplicates using a > table of seen stri

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Thu, May 22, 2014 at 11:54 PM, Adam Funk wrote: >> >>> from hashlib import sha1 >> >>> s = "Hello world" >> >>> h = sha1(s) >> >>> h.hexdigest() >> '7b502c3a1f48c8609ae212cdfb639dee39673f5e' >> >>> int(h.hexdigest(), 16) >> 703993777145756967576188115661016000849227759454L > > That tie

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Thu, May 22, 2014 at 11:41 PM, Adam Funk wrote: > On further reflection, I think I asked for that. In fact, the table > I'm using only has one column for the hashes --- I wasn't going to > store the strings at all in order to save disk space (maybe my mind is > stuck in the 1980s). That's a p

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Tim Chase wrote: > On 2014-05-22 12:47, Adam Funk wrote: >> I'm using Python 3.3 and the sqlite3 module in the standard library. >> I'm processing a lot of strings from input files (among other >> things, values of headers in e-mail & news messages) and suppressing >> duplicates usi

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Chris Angelico wrote: > On Thu, May 22, 2014 at 9:47 PM, Adam Funk wrote: >> I'm using Python 3.3 and the sqlite3 module in the standard library. >> I'm processing a lot of strings from input files (among other things, >> values of headers in e-mail & news messages) and suppressing

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
On 2014-05-22, Peter Otten wrote: > Adam Funk wrote: > >> I'm using Python 3.3 and the sqlite3 module in the standard library. >> I'm processing a lot of strings from input files (among other things, >> values of headers in e-mail & news messages) and suppressing >> duplicates using a table of see

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Tim Chase
On 2014-05-22 12:47, Adam Funk wrote: > I'm using Python 3.3 and the sqlite3 module in the standard library. > I'm processing a lot of strings from input files (among other > things, values of headers in e-mail & news messages) and suppressing > duplicates using a table of seen strings in the datab

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Chris Angelico
On Thu, May 22, 2014 at 9:47 PM, Adam Funk wrote: > I'm using Python 3.3 and the sqlite3 module in the standard library. > I'm processing a lot of strings from input files (among other things, > values of headers in e-mail & news messages) and suppressing > duplicates using a table of seen strings

Re: hashing strings to integers for sqlite3 keys

2014-05-22 Thread Peter Otten
Adam Funk wrote: > I'm using Python 3.3 and the sqlite3 module in the standard library. > I'm processing a lot of strings from input files (among other things, > values of headers in e-mail & news messages) and suppressing > duplicates using a table of seen strings in the database. > > It seems t

hashing strings to integers for sqlite3 keys

2014-05-22 Thread Adam Funk
I'm using Python 3.3 and the sqlite3 module in the standard library. I'm processing a lot of strings from input files (among other things, values of headers in e-mail & news messages) and suppressing duplicates using a table of seen strings in the database. It seems to me --- from past experience
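
A minimal sketch of the idea in the subject line, under stated assumptions: the table name seen, the column name h, and the helpers hash_key, make_db and is_new are all invented for illustration. SQLite INTEGER values are signed 64-bit, so the MD5 digest is truncated to its first 8 bytes here. That truncation is a design choice of this sketch, not something the post specifies, and it makes collisions more likely than with the full digest.

```python
import hashlib
import sqlite3


def hash_key(s: str) -> int:
    """Map a string to a signed 64-bit integer that fits an SQLite INTEGER.

    Assumption: keeping only the first 8 bytes of the MD5 digest is enough.
    """
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def make_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the one-column table of seen hashes."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (h INTEGER PRIMARY KEY)")
    return conn


def is_new(conn: sqlite3.Connection, s: str) -> bool:
    """Record s and return True if its hash was not already in the table."""
    try:
        conn.execute("INSERT INTO seen (h) VALUES (?)", (hash_key(s),))
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: treat the string as a duplicate.
        return False


if __name__ == "__main__":
    conn = make_db()
    for header in ["<id1@example.org>", "<id2@example.org>", "<id1@example.org>"]:
        print(header, is_new(conn, header))
    conn.commit()
```

Because only the hash is stored, a collision would cause a genuinely new string to be reported as already seen, which is the risk discussed in the replies.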