On 12/28/16 12:27 PM, [email protected] wrote:
> On Wed, Dec 28, 2016 at 11:48 AM, Ned Batchelder
> <[email protected]> wrote:
>
> > You can write a simple function to use hash iteratively to hash
> > the entire stream in constant space and linear time:
> >
> >     def hash_stream(them):
> >         val = 0
> >         for it in them:
> >             val = hash((val, it))
> >         return val
> >
> > Although this creates N 2-tuples, they come and go, so the memory
> > use won't grow. Adjust the code as needed to achieve
> > canonicalization before iterating.
> >
> > Or maybe I am misunderstanding the requirements?
>
> This is better than solutions like
> http://stackoverflow.com/a/27952689/161642
> in the sense that it's a little higher level (no bit shifting or
> magic numbers).
>
> But it's not clear that it's any better in the sense that you're
> still rolling your own incremental hash algorithm out of a
> lower-level primitive that doesn't document support for this, and
> therefore taking responsibility yourself for how well it distributes
> values into buckets.
>
> Are you confident this results in good hash performance? Is this
> better than a solution built on top of a hash function with an
> explicit API for calculating a hash incrementally, such as the crc32
> example I included? (And again, this would ideally be a
> sys.hash_info.hash_bits-bit algorithm.)
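(A brief usage sketch of hash_stream as quoted above: it assumes the
canonicalization step is handled by iterating an unordered input in
sorted order; the sample data and the use of sorted() are illustrative
assumptions, not part of the original.)

    # Equal but differently-ordered collections hash equally once
    # iteration order is canonicalized, e.g. by sorting.
    items = {3, 1, 2}
    assert hash_stream(sorted(items)) == hash_stream(sorted({2, 1, 3}))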
I don't have the theoretical background to defend this function. But
it seems to me that if we believe that hash((int, thing)) distributes
well, then how could this function not distribute well?

--Ned.
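(For comparison, a minimal sketch of the incremental-API route raised
in the quoted message, built on zlib.crc32, whose optional second
argument is a documented running checksum. Two assumptions to flag:
elements are encoded with repr(), which only stands in for a real
canonical byte encoding, and CRC-32 produces 32 bits rather than
sys.hash_info.hash_bits.)

    import zlib

    def crc_stream(them):
        # zlib.crc32 accepts a starting value, so the checksum can
        # be folded in one element at a time in constant space.
        val = 0
        for it in them:
            # repr() is an illustrative stand-in for a canonical
            # byte encoding of each element.
            val = zlib.crc32(repr(it).encode("utf-8"), val)
        return val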
