On Sat, 16 Apr 2005, Petr Baudis wrote:
> Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter
> where Simon Fowler <[EMAIL PROTECTED]> told me that...
> >
> > The id is a sha1 hash of the current time and the full path of the
> > file being added - the chances of that being replicated without
> > malicious intent is extremely small. There are other things that
> > could be used, like the hostname, username of the person running the
> > program, etc, but I don't really see them being necessary.
>
> Why not just use UUID?

Note that using anything that isn't data-related totally destroys the
whole point of the object database. Remember: any time we don't
uniquely generate the same name for the same object, we'll waste
disk-space.

So adding in user/machine/UUIDs to the thing is always a mistake. The
whole thing depends on the hash being as close to 1:1 with the contents
as humanly possible.

There's also the issue of size. Yes, I could have chosen sha256 instead
of sha1. But the keys would be almost twice as big, which in turn means
that the "tree" objects would be bigger, and that the "index" file
would be bigger.

Is that a huge problem? No. We can certainly move to it if sha1 ever
shows itself to be weak. But I really think we are much better off just
re-generating the whole tree and history at that point, rather than try
to predict the future.

The fact is, with current knowledge, sha1 _is_ safe for what git uses
it for, for the foreseeable future. And we have a migration strategy if
I'm wrong. Don't worry about it.

Almost all attacks on sha1 will depend on _replacing_ a file with a
bogus new one. So guys, instead of using sha256 or going overboard,
just make sure that when you synchronize, you NEVER import a file you
already have.

It's really that simple. Add "--ignore-existing" to your rsync scripts,
and you're pretty much done.
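A minimal sketch of the two ideas above, in Python, may make them
concrete. It is an illustration under stated assumptions, not git's
actual code: the object database is assumed to be a flat directory of
files, the name is a plain SHA-1 of the raw bytes (not git's on-disk
object format), and the function names object_name, store and
import_objects are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def object_name(data: bytes) -> str:
    """Name an object purely from its contents (assumed: plain SHA-1 of the bytes)."""
    # No time, path, user, host or UUID goes into the name, so the same
    # contents always map to the same name.
    return hashlib.sha1(data).hexdigest()


def store(db: Path, data: bytes) -> str:
    """Write data under its content-derived name; identical data is stored only once."""
    name = object_name(data)
    path = db / name
    if not path.exists():
        path.write_bytes(data)
    return name


def import_objects(src: Path, dst: Path) -> list:
    """Copy objects from src into dst, skipping every name dst already has."""
    copied = []
    for obj in sorted(src.iterdir()):
        target = dst / obj.name
        if target.exists():
            continue  # never replace an object we already have
        shutil.copyfile(obj, target)
        copied.append(obj.name)
    return copied
```

Storing the same bytes twice yields one file, because the name depends
on nothing but the data. During import_objects, which plays the role of
rsync with "--ignore-existing", an object already present in the
destination is never overwritten by an incoming one with the same name.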
That guarantees that a new evil blob by the next mad scientist out to
take over the world will never touch your repository, and if we make
this part of the _standard_ scripts, then dammit, security is in good
_practices_ rather than just relying blindly on the hash being secure.

In other words, I think we could have used md5's as the hash, if we
just make sure we have good practices. And it wouldn't have been
"insecure".

The fact is, you don't merge with people you don't trust. If you don't
trust them, they have a much easier time corrupting your repository by
just creating bugs in the code and checking that thing in. Who cares
about hash collisions, when you can generate a kernel root
vulnerability by just adding a single line of code and using the
_correct_ hash for it?

So the sha1 hash does not replace _trust_. That comes from something
else altogether.

		Linus