On 2014-05-27, Steven D'Aprano wrote:
> On Tue, 27 May 2014 16:13:46 +0100, Adam Funk wrote:
>> Well, here's the way it works in my mind:
>>
>> I can store a set of a zillion strings (or a dict with a zillion
>> string keys), but every time I test "if new_string in seen_strings",
>> the
On 2014-05-28, Dan Sommers wrote:
> On Tue, 27 May 2014 17:02:50 +, Steven D'Aprano wrote:
>
>> - rather than "zillions" of them, there are few enough of them that
>> the chance of an MD5 collision is insignificant;
>
>> (Any MD5 collision is going to play havoc with your strategy of
>>
On Tue, 27 May 2014 17:02:50 +, Steven D'Aprano wrote:
> - rather than "zillions" of them, there are few enough of them that
> the chance of an MD5 collision is insignificant;
> (Any MD5 collision is going to play havoc with your strategy of
> using hashes as a proxy for the real string
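As a rough way to put a number on "insignificant", the usual birthday-bound approximation can be sketched as follows. This is only an illustration: the function name `collision_probability` is hypothetical, and the estimate assumes digests behave like uniformly random values, which does not cover deliberately constructed MD5 collisions.

```python
import math


def collision_probability(n, bits=128):
    """Approximate chance that at least two of n random digests collide.

    Uses the birthday approximation p ~= 1 - exp(-n*(n-1) / 2**(bits+1)).
    Hypothetical helper; assumes uniformly distributed digests.
    """
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is tiny.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


# Example: a billion distinct strings hashed with a 128-bit digest (MD5 size).
p_md5 = collision_probability(10 ** 9, bits=128)
print(p_md5)
```

Under that assumption the accidental-collision risk for MD5-sized digests stays negligible well past a billion strings; the bound says nothing about inputs chosen by an attacker.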
On Wed, May 28, 2014 at 3:02 AM, Steven D'Aprano wrote:
> But I know that Python is a high-level language with
> lots of high-level data structures like dicts which trade off time and
> memory for programmer convenience, and that I'd want to see some real
> benchmarks proving that my application w
On Tue, 27 May 2014 16:13:46 +0100, Adam Funk wrote:
> On 2014-05-23, Chris Angelico wrote:
>
>> On Fri, May 23, 2014 at 8:27 PM, Adam Funk wrote:
>>> I've also used hashes of strings for other things involving
>>> deduplication or fast lookups (because integer equality is faster than
>>> str
On 2014-05-23, Terry Reedy wrote:
> On 5/23/2014 6:27 AM, Adam Funk wrote:
>
>> that. The only thing that really bugs me in Python 3 is that execfile
>> has been removed (I find it useful for testing things interactively).
>
> The spelling has been changed to exec(open(...).read(), ... . If you u
On 2014-05-23, Chris Angelico wrote:
> On Fri, May 23, 2014 at 8:27 PM, Adam Funk wrote:
>> I've also used hashes of strings for other things involving
>> deduplication or fast lookups (because integer equality is faster than
>> string equality). I guess if it's just for deduplication, though, a
On 5/23/2014 6:27 AM, Adam Funk wrote:
> that. The only thing that really bugs me in Python 3 is that execfile
> has been removed (I find it useful for testing things interactively).
The spelling has been changed to exec(open(...).read(), ... . If you use
it a lot, add a customized def execfile(
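One possible shape for such a helper is sketched below. The message is cut off before showing its own definition, so this is an assumption rather than the poster's code; the parameter names `globals_` and `locals_` and the default namespace contents are illustrative choices.

```python
def execfile(path, globals_=None, locals_=None):
    """Hypothetical Python 3 stand-in for Python 2's execfile().

    Runs the file at `path` and returns the globals dict it ran in.
    """
    if globals_ is None:
        # Mimic running the file as a script; an assumption, not a requirement.
        globals_ = {"__name__": "__main__", "__file__": path}
    with open(path, "rb") as f:
        # compile() with the real filename gives useful tracebacks.
        code = compile(f.read(), path, "exec")
    exec(code, globals_, locals_)
    return globals_
```

Returning the namespace makes it easy to inspect what the file defined when testing interactively, e.g. `ns = execfile("scratch.py")` (the filename here is made up).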
On Fri, May 23, 2014 at 8:36 PM, Adam Funk wrote:
> BTW, I just tested that & it should be "big" for consistency with the
> hexdigest:
Yes, it definitely should be parsed big-endianly.
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
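A short check of the byte-order point, using only hashlib and int.from_bytes() from the standard library. The variable names are illustrative.

```python
import hashlib

h = hashlib.sha1(b"Hello world")

# Parsing the raw digest big-endian matches parsing the hex digest as base 16.
as_big = int.from_bytes(h.digest(), "big")
as_hex = int(h.hexdigest(), 16)
assert as_big == as_hex

# Little-endian yields a different integer: it equals the big-endian
# reading of the reversed digest bytes.
as_little = int.from_bytes(h.digest(), "little")
assert as_little == int.from_bytes(h.digest()[::-1], "big")
```

Either byte order works as a deduplication key as long as it is used consistently; "big" is the one that agrees with the hexdigest.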
On Fri, May 23, 2014 at 8:27 PM, Adam Funk wrote:
> I've also used hashes of strings for other things involving
> deduplication or fast lookups (because integer equality is faster than
> string equality). I guess if it's just for deduplication, though, a
> set of byte arrays is as good as a set o
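For in-memory deduplication, keeping the raw digests in a set might look like the sketch below. The function name `unique_strings` is hypothetical, and SHA-1 with UTF-8 encoding is just one choice; digest() returns an immutable bytes object, so it is hashable and can go straight into a set without conversion to an integer.

```python
import hashlib


def unique_strings(strings):
    """Yield each string the first time its digest is seen.

    Hypothetical helper: stores 20-byte SHA-1 digests instead of the
    strings themselves, so a hash collision would drop a distinct string.
    """
    seen = set()
    for s in strings:
        digest = hashlib.sha1(s.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield s


print(list(unique_strings(["spam", "eggs", "spam"])))
```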
On 2014-05-23, Adam Funk wrote:
> On 2014-05-22, Peter Otten wrote:
>> In Python 3 there's int.from_bytes()
>>
>> >>> h = hashlib.sha1(b"Hello world")
>> >>> int.from_bytes(h.digest(), "little")
>> 538059071683667711846616050503420899184350089339
>
> Excellent, thanks for pointing that out. I've j
On 2014-05-22, Peter Otten wrote:
> Adam Funk wrote:
>> Well, J*v* returns a byte array, so I used to do this:
>>
>> digester = MessageDigest.getInstance("MD5");
>> ...
>> digester.reset();
>> byte[] digest = digester.digest(bytes);
>> return new BigInteger(+1, digest);
>
> I
Adam Funk wrote:
> On 2014-05-22, Chris Angelico wrote:
>
>> On Thu, May 22, 2014 at 11:54 PM, Adam Funk wrote:
>
>>> That ties in with a related question I've been wondering about lately
>>> (using MD5s & SHAs for other things) --- getting a hash value (which
>>> is internally numeric, rather
On Fri, May 23, 2014 at 12:47 AM, Adam Funk wrote:
>> I don't know that there is, at least not with hashlib. You might be
>> able to use digest() followed by the struct module, but it's no less
>> convoluted. It's the same in several other languages' hashing
>> functions; the result is a string, n
On 2014-05-22, Chris Angelico wrote:
> On Thu, May 22, 2014 at 11:54 PM, Adam Funk wrote:
>> That ties in with a related question I've been wondering about lately
>> (using MD5s & SHAs for other things) --- getting a hash value (which
>> is internally numeric, rather than string, right?) out as
On 2014-05-22, Chris Angelico wrote:
> On Thu, May 22, 2014 at 11:41 PM, Adam Funk wrote:
>> On further reflection, I think I asked for that. In fact, the table
>> I'm using only has one column for the hashes --- I wasn't going to
>> store the strings at all in order to save disk space (maybe my
On Thu, 22 May 2014 12:47:31 +0100, Adam Funk wrote:
> I'm using Python 3.3 and the sqlite3 module in the standard library. I'm
> processing a lot of strings from input files (among other things, values
> of headers in e-mail & news messages) and suppressing duplicates using a
> table of seen stri
On Thu, May 22, 2014 at 11:54 PM, Adam Funk wrote:
>> >>> from hashlib import sha1
>> >>> s = "Hello world"
>> >>> h = sha1(s)
>> >>> h.hexdigest()
>> '7b502c3a1f48c8609ae212cdfb639dee39673f5e'
>> >>> int(h.hexdigest(), 16)
>> 703993777145756967576188115661016000849227759454L
>
> That tie
On Thu, May 22, 2014 at 11:41 PM, Adam Funk wrote:
> On further reflection, I think I asked for that. In fact, the table
> I'm using only has one column for the hashes --- I wasn't going to
> store the strings at all in order to save disk space (maybe my mind is
> stuck in the 1980s).
That's a p
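A minimal sketch of the hash-only table described in the quoted text, using the standard sqlite3 module. The table name `seen`, the column name `digest`, and the helper `is_new` are assumptions for illustration; the original schema is not shown in the thread excerpt.

```python
import hashlib
import sqlite3

# In-memory database for the example; a real run would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seen (digest BLOB PRIMARY KEY)")


def is_new(conn, s):
    """Record the digest of s; return True only the first time it is seen."""
    digest = hashlib.sha1(s.encode("utf-8")).digest()
    try:
        conn.execute("INSERT INTO seen (digest) VALUES (?)", (digest,))
    except sqlite3.IntegrityError:
        # The primary key already holds this digest: a duplicate
        # (or, very improbably, a collision with a different string).
        return False
    return True


print(is_new(conn, "Subject: hello"), is_new(conn, "Subject: hello"))
```

With an on-disk database the inserts would also need a `conn.commit()` at a suitable point. Because the strings themselves are not stored, a collision cannot be detected after the fact.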
On 2014-05-22, Tim Chase wrote:
> On 2014-05-22 12:47, Adam Funk wrote:
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other
>> things, values of headers in e-mail & news messages) and suppressing
>> duplicates usi
On 2014-05-22, Chris Angelico wrote:
> On Thu, May 22, 2014 at 9:47 PM, Adam Funk wrote:
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other things,
>> values of headers in e-mail & news messages) and suppressing
On 2014-05-22, Peter Otten wrote:
> Adam Funk wrote:
>
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other things,
>> values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of see
On 2014-05-22 12:47, Adam Funk wrote:
> I'm using Python 3.3 and the sqlite3 module in the standard library.
> I'm processing a lot of strings from input files (among other
> things, values of headers in e-mail & news messages) and suppressing
> duplicates using a table of seen strings in the datab
On Thu, May 22, 2014 at 9:47 PM, Adam Funk wrote:
> I'm using Python 3.3 and the sqlite3 module in the standard library.
> I'm processing a lot of strings from input files (among other things,
> values of headers in e-mail & news messages) and suppressing
> duplicates using a table of seen strings
Adam Funk wrote:
> I'm using Python 3.3 and the sqlite3 module in the standard library.
> I'm processing a lot of strings from input files (among other things,
> values of headers in e-mail & news messages) and suppressing
> duplicates using a table of seen strings in the database.
>
> It seems t
I'm using Python 3.3 and the sqlite3 module in the standard library.
I'm processing a lot of strings from input files (among other things,
values of headers in e-mail & news messages) and suppressing
duplicates using a table of seen strings in the database.
It seems to me --- from past experience
26 matches