On Sun, 8 Jan 2006, Tom Anderson wrote: > On Fri, 6 Jan 2006 [EMAIL PROTECTED] wrote: > >> below you find my simple python version of MD2 algorithm as described >> in RFC1319 (http://rfc1319.x42.com/MD2). It produces correct results >> for strings shorter than 16 Bytes and wrong results for longer strings. > > I guess the thing to do is extract the C code from the RFC and compile > it, verify that it works, then stick loads of print statements in the C > and the python, to see where the states of the checksum engines diverge.
Okay, i've done this. I had to fiddle with the source a bit - added a #include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and took the corresponding includes out of md2c.c and mddriver.c (to avoid duplicate definitions) - but after that, it built cleanly with: gcc -DMD=2 *.c *.h -o mddriver A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it even built cleanly with -Wall. Running the test suite with mddriver -x gives results matching the test vectors in the RFC - a good start! Patching the code to dump the checksums immediately after updating with the pad, and before updating with the checksum: *** checksum after padding = 623867b6af52795e5f214e9720beea8d MD2 ("") = 8350e5a3e24c153df2275c9f80692773 *** checksum after padding = 19739cada3ba281693348e9d256fff31 MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1 *** checksum after padding = 19e29d1b7304368e595a276f302f57cc MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb *** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0 *** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3 MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b *** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765 MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = da33def2a42df13975352846c30338cd *** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01 MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8 And here's my python code with the same modification, running the test suite: *** checksum after padding = 623867b6af52795e5f214e9720beea8d MD2 ("") = 8350e5a3e24c153df2275c9f80692773 *** checksum after padding = 19739cada3ba281693348e9d256fff31 MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1 *** checksum after padding = 19e29d1b7304368e595a276f302f57cc MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb *** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0 *** checksum after padding = 539ba695f264f365bcabc5c8b10913c7 MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56 *** checksum after padding = 365fe0617f5f56a56090af1cfd6caac3 MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813 *** checksum after padding = 9acf39425d22c4e3b4ddbdc563d23716 MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf As you can see, the checksums start to go wrong when we hit 16 bytes. So, let us turn our attention to the checksum function. Here's the python i wrote: def checksum_old(c, buf): # c is checksum array, buf is input block l = c[-1] for i in xrange(digest_size): l = S[(buf[i] ^ l)] c[i] = l Here's the C from the RFC: unsigned int i, j, t; t = checksum[15]; for (i = 0; i < 16; i++) t = checksum[i] ^= PI_SUBST[block[i] ^ t]; Spot the difference. Yes, the assignment into the checksum array is a ^=, not a straight = - checksum bytes get set to current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor accumulator). Translating that into python, we get: def checksum(c, buf): l = c[-1] for i in xrange(digest_size): l = S[(buf[i] ^ l)] ^ c[i] c[i] = l And when we put that back into the code, we get the right digests out. Victory! However, here's what the pseudocode in the RFC says: For j = 0 to 15 do Set c to M[i*16+j]. Set C[j] to S[c xor L]. Set L to C[j]. end /* of loop on j */ I certainly don't see any sign of a xor with the current-value-of-checksum-byte in there - it looks like the C and pseudocode in the RFC don't match up. And, yes, googling for "RFC 1319 errata" brings up a report correcting this. They really ought to amend RFCs to mention errata! Correct code here: http://urchin.earth.li/~twic/md2.py tom -- Mathematics is the door and the key to the sciences. -- Roger Bacon -- http://mail.python.org/mailman/listinfo/python-list