hzh...@gmail.com wrote:
"Given the function hashlib.sha256, enumerate all the possible inputs
that give the hexadecimal result
0a2591aaf3340ad92faecbc5908e74d04b51ee5d2deee78f089f1607570e2e91."

This is a hash collision problem. Nobody has proved that SHA-256 is
collision free

It's actually pretty easy to prove that it is *not* collision free. The SHA-256 encodes 512 bits of data. So the the process of encoding (2**512)+1 distinct inputs incurs a collision in SHA-256 space as soon as you've hit (2**512)+1 if not earlier.

to start you off:

  sha_backmap = {}
  for i in xrange((2**512)+2):
    hash = sha(str(i))
    if hash in sha_backmap:
      print "Collision found: %i and %i" % (
        i, sha_backmap[hash])
        break
    sha_backmap[hash] = i

Though it might take a computer the size of the universe, so I'm guessing that the first collision encountered is with "42". I leave the actual calculation and hashing of all possible combinations of 513 bits of data as an exercise to the reader with a lot of time on their hands or a quantum computer under their desk ;-)

It is hard to tell in advance. However, we can add some timing limit
or counting limit, to make it an algorithm, which can halt. For
example, whenever the program outputs more than 1000000 expressions
that match the input regex, we can halt because that exceeds our
limit. But surely this is not efficient because of the post-decision.

As mentioned, it sounds like you either want a depth-first of the solution space that raises exceptions on an infinite/unbounded operator ("*", "+", and "{N,}" as mentioned in another email), or if you want to handle those operators, do a breadth-first search of the solution-space and track your depth (or time taken, or previous number of multi-factor atoms if you desire) to ensure you don't exceed a certain depth. But you're still talking a combinatorial number of solutions for even simple regexps.

-tkc


--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to