hzh...@gmail.com wrote:
"Given the function hashlib.sha256, enumerate all the possible inputs
that give the hexadecimal result
0a2591aaf3340ad92faecbc5908e74d04b51ee5d2deee78f089f1607570e2e91."
This is a hash collision problem. Nobody has proved that SHA-256 is
collision free
It's actually pretty easy to prove that it is *not* collision
free. The SHA-256 encodes 512 bits of data. So the the process
of encoding (2**512)+1 distinct inputs incurs a collision in
SHA-256 space as soon as you've hit (2**512)+1 if not earlier.
to start you off:
sha_backmap = {}
for i in xrange((2**512)+2):
hash = sha(str(i))
if hash in sha_backmap:
print "Collision found: %i and %i" % (
i, sha_backmap[hash])
break
sha_backmap[hash] = i
Though it might take a computer the size of the universe, so I'm
guessing that the first collision encountered is with "42". I
leave the actual calculation and hashing of all possible
combinations of 513 bits of data as an exercise to the reader
with a lot of time on their hands or a quantum computer under
their desk ;-)
It is hard to tell in advance. However, we can add some timing limit
or counting limit, to make it an algorithm, which can halt. For
example, whenever the program outputs more than 1000000 expressions
that match the input regex, we can halt because that exceeds our
limit. But surely this is not efficient because of the post-decision.
As mentioned, it sounds like you either want a depth-first of the
solution space that raises exceptions on an infinite/unbounded
operator ("*", "+", and "{N,}" as mentioned in another email), or
if you want to handle those operators, do a breadth-first search
of the solution-space and track your depth (or time taken, or
previous number of multi-factor atoms if you desire) to ensure
you don't exceed a certain depth. But you're still talking a
combinatorial number of solutions for even simple regexps.
-tkc
--
http://mail.python.org/mailman/listinfo/python-list