New submission from Lukas Lueg <lukas.l...@gmail.com>:

The objects provided by hashlib mainly serve the purpose of computing hashes 
over strings of arbitrary size. The user gets a new object (e.g. 
hashlib.sha1()), calls .update() with chunks of data and then finally uses 
.digest() or .hexdigest() to get the hash. For convenience, these steps 
can also be combined into a single expression (e.g. 
hashlib.sha1('foobar').hexdigest()).
While this approach covers basically all use cases for hash functions, it is 
inefficient when computing hashes of many small strings (e.g. due to 
interpreter overhead) and leaves no room for performance improvements.
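
For reference, the existing usage pattern described above looks roughly like 
this. This is only a sketch; the inputs are written as bytes literals because 
hashlib in Python 3 does not accept str, and the variable names are purely 
illustrative.

```python
import hashlib

# Incremental form: create an object, feed it chunks, then finalize.
h = hashlib.sha1()
h.update(b"foo")
h.update(b"bar")
incremental = h.hexdigest()

# One-expression form: pass the data directly to the constructor.
one_shot = hashlib.sha1(b"foobar").hexdigest()

# Many small inputs today: one hash object and several interpreter-level
# calls per item, which is the overhead this request is about.
keys = [b"my_database_key1", b"my_database_key2"]
digests = [hashlib.sha1(k).digest() for k in keys]
```

Both forms yield the same digest for the same bytes; the list comprehension 
is the per-item loop that a batch interface would replace.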

There are many cases where we need the hashes of numerous (small) objects, most 
or all of which are available in memory at the same time.

I therefore propose to extend the classes provided by hashlib with an 
additional function that takes an iterable object, computes the hash over the 
string representation of each member and returns the results. Given the aim of 
this interface, the function is a member of the class (not the instance) and 
therefore has no state bound to an instance. Memory requirements are to be 
anticipated and met by the programmer.

For example:

foo = ['my_database_key1', 'my_database_key2']
hashlib.sha1.compute(foo) 
>> ('\x00\x00', '\xff\xff')
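
To pin down the intended semantics, here is a pure-Python sketch of what such 
a function could return. compute_many() is a hypothetical stand-in for the 
proposed hashlib.sha1.compute(), not an existing hashlib API. Encoding 
non-bytes members as UTF-8 is my assumption for "string representation" under 
Python 3.

```python
import hashlib

def compute_many(algorithm, items):
    """Hypothetical stand-in for the proposed class-level compute().

    Returns a tuple with one raw digest per member of `items`, in order.
    """
    results = []
    for item in items:
        # Assumption: members that are not already bytes are converted via
        # their string representation, encoded as UTF-8.
        if not isinstance(item, (bytes, bytearray)):
            item = str(item).encode("utf-8")
        # hashlib.new(name, data) creates the hash object and feeds it data.
        results.append(hashlib.new(algorithm, item).digest())
    return tuple(results)

foo = ["my_database_key1", "my_database_key2"]
result = compute_many("sha1", foo)
```

This loop has no performance benefit on its own; it only describes the result 
an optimized implementation would have to reproduce.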


I consider this interface to hashlib particularly useful, as we can take 
advantage of vector-based implementations that compute multiple hashes in one 
pass (e.g. through SSE2). GCC has a vector-extension that provides a *somewhat* 
standard way to write code that can get compiled to SSE2 or similar machine 
code. Examples of vector-based implementations of SHA1 and MD5 can be found at 
https://code.google.com/p/pyrit/issues/detail?id=207


Contingency plan: If the compiler is not GCC or SSE2 is not available, we 
compile code that simply iterates over OpenSSL's EVP functions. The same 
approach can be used to cover hashlib objects for which we don't have an 
optimized implementation.

----------
components: Library (Lib)
messages: 120351
nosy: ebfe
priority: normal
severity: normal
status: open
title: Add class-functions to hash many small objects with hashlib
type: feature request
versions: Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10302>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
