On Sat, Aug 2, 2014 at 11:47 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote: > - It relies on the checksum being unpredictable, to prevent substitution > attacks: you're expecting object x with checksum a, but somebody > substitutes object y with checksum a instead.
Note that this requirement is only an issue when there are actual attacks involved. In many cases, hashing is used to detect either errors in transfer (eg truncated files) or meaningfully different files (eg MP3s of different songs). A collision between those won't be the result of someone deliberately crafting a file with the right checksum; it'll happen only by chance, so md5sum can be safely used there. But if you're trying to use this to prove that a file was downloaded correctly, you do need to worry about that. If I say that you can download the binary of my program from <this URL>, and that the MD5 checksum is 123456...DEF, then someone could do a DNS hack (cache poisoning, proxy interference, whatever) to capture your attempted download, and send you instead to his own server, where he has a carefully crafted binary that does everything mine does, plus it tells him all your passwords - and it has arbitrary junk buried in it to make sure the MD5 sum matches. So an MD5 checksum is broken for anything from the internet, but is quite usable for certain specific cases. There is one aspect of the unpredictability that's important even in the simple cases, though, and that's the avalanche effect. If anything changes in the file, the whole hash should completely and arbitrarily change. That means you don't need to stare at the whole hash, trying to see if that 8 became a 0; any change to the file will probably make the first few digits visibly different, so it's easily obvious that the hash has changed. ChrisA -- https://mail.python.org/mailman/listinfo/python-list