Tom Anderson wrote: > On Sun, 8 Jan 2006, Tom Anderson wrote: > > > On Fri, 6 Jan 2006 [EMAIL PROTECTED] wrote: > > > >> below you find my simple python version of MD2 algorithm as described > >> in RFC1319 (http://rfc1319.x42.com/MD2). It produces correct results > >> for strings shorter than 16 Bytes and wrong results for longer strings. > > > > I guess the thing to do is extract the C code from the RFC and compile > > it, verify that it works, then stick loads of print statements in the C > > and the python, to see where the states of the checksum engines diverge. > > Okay, i've done this. I had to fiddle with the source a bit - added a > #include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and > took the corresponding includes out of md2c.c and mddriver.c (to avoid > duplicate definitions) - but after that, it built cleanly with: > > gcc -DMD=2 *.c *.h -o mddriver > > A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it > even built cleanly with -Wall. > > Running the test suite with mddriver -x gives results matching the test > vectors in the RFC - a good start! > > Patching the code to dump the checksums immediately after updating with > the pad, and before updating with the checksum: > > *** checksum after padding = 623867b6af52795e5f214e9720beea8d > MD2 ("") = 8350e5a3e24c153df2275c9f80692773 > *** checksum after padding = 19739cada3ba281693348e9d256fff31 > MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1 > *** checksum after padding = 19e29d1b7304368e595a276f302f57cc > MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb > *** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b > MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0 > *** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3 > MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b > *** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765 > MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = > da33def2a42df13975352846c30338cd > *** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01 > MD2 > ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") > = d5976f79d83d3a0dc9806c3c66f3efd8 > > And here's my python code with the same modification, running the test > suite: > > *** checksum after padding = 623867b6af52795e5f214e9720beea8d > MD2 ("") = 8350e5a3e24c153df2275c9f80692773 > *** checksum after padding = 19739cada3ba281693348e9d256fff31 > MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1 > *** checksum after padding = 19e29d1b7304368e595a276f302f57cc > MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb > *** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b > MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0 > *** checksum after padding = 539ba695f264f365bcabc5c8b10913c7 > MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56 > *** checksum after padding = 365fe0617f5f56a56090af1cfd6caac3 > MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = > a1ccc835ea9654d6a2926c21f0b20813 > *** checksum after padding = 9acf39425d22c4e3b4ddbdc563d23716 > MD2 > ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") > = 8f1f49dc8de490b9aa7c99cec3fbccdf > > As you can see, the checksums start to go wrong when we hit 16 bytes. > > So, let us turn our attention to the checksum function. > > Here's the python i wrote: > > def checksum_old(c, buf): # c is checksum array, buf is input block > l = c[-1] > for i in xrange(digest_size): > l = S[(buf[i] ^ l)] > c[i] = l > > Here's the C from the RFC: > > unsigned int i, j, t; > t = checksum[15]; > for (i = 0; i < 16; i++) > t = checksum[i] ^= PI_SUBST[block[i] ^ t]; > > Spot the difference. Yes, the assignment into the checksum array is a ^=, > not a straight = - checksum bytes get set to > current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor > accumulator). Translating that into python, we get: > > def checksum(c, buf): > l = c[-1] > for i in xrange(digest_size): > l = S[(buf[i] ^ l)] ^ c[i] > c[i] = l > > And when we put that back into the code, we get the right digests out. > Victory! > > However, here's what the pseudocode in the RFC says: > > For j = 0 to 15 do > Set c to M[i*16+j]. > Set C[j] to S[c xor L]. > Set L to C[j]. > end /* of loop on j */ > > I certainly don't see any sign of a xor with the > current-value-of-checksum-byte in there - it looks like the C and > pseudocode in the RFC don't match up. > > And, yes, googling for "RFC 1319 errata" brings up a report correcting > this. They really ought to amend RFCs to mention errata! > > Correct code here: > > http://urchin.earth.li/~twic/md2.py > > tom > > -- > Mathematics is the door and the key to the sciences. -- Roger Bacon
Hi Tom, thank you very much for your analysis and your solution. Great. (My knowledge of C language is not that good.) Wolfgang -- http://mail.python.org/mailman/listinfo/python-list