On 12/27/16 10:13 PM, [email protected] wrote:
> Applying this advice to the use cases above would require creating an
> arbitrarily large tuple in memory before passing it to hash(), which
> is then just thrown away. It would be preferable if there were a way
> to pass multiple values to hash() in a streaming fashion, such that
> the overall hash were computed incrementally, without building up a
> large object in memory first.
>
> Should there be better support for this use case? Perhaps hash() could
> support an alternative signature, allowing it to accept a stream of
> values whose combined hash would be computed incrementally in
> *constant* space and linear time, e.g. "hash(items=iter(self))".
You can write a simple function that applies hash() iteratively to hash
the entire stream in constant space and linear time:
    def hash_stream(them):
        # Fold each item into the running hash via a short-lived 2-tuple.
        val = 0
        for it in them:
            val = hash((val, it))
        return val
Although this creates N 2-tuples, each one is discarded as soon as it is
hashed, so memory use stays constant. Adjust the code as needed to
canonicalize (e.g. sort unordered items) before iterating.
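For instance, a quick sketch of how it might be used (repeating the
function above so the snippet runs on its own; the sorted() call is just
one possible canonicalization for unordered data):

    def hash_stream(them):
        # Fold each item into the running hash via a short-lived 2-tuple.
        val = 0
        for it in them:
            val = hash((val, it))
        return val

    # Hash a large generator without materializing a big tuple in memory.
    h1 = hash_stream(x * x for x in range(1_000_000))
    h2 = hash_stream(x * x for x in range(1_000_000))
    assert h1 == h2  # same values, same order -> same hash

    # The fold is order-sensitive, so canonicalize unordered collections
    # (here by sorting) before hashing them.
    assert hash_stream(sorted({"b", "a", "c"})) == \
           hash_stream(sorted({"c", "a", "b"}))
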
Or maybe I am misunderstanding the requirements?
--Ned.
_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/