On Mon, 5 Sep 2005, H. Peter Anvin wrote:
>
> It would also have the somewhat interesting possibility that one could
> "remove and recreate" a file and have it exist as a different entity.
> That probably needs to be a user option.

It's a totally broken model. Really. You think it solves issues, but it just creates more bugs and problems than it solves. Trust me.

The whole point of git is that "content is the only thing that matters", and that there isn't any other meta-data. If you break that fundamental assumption, everything git does so well will break.

I think we've already shown that the "content matters" approach works. I claim that the git rename tracking works better than any other SCM out there, _exactly_ because it doesn't make the mistake of trying to track anything but content.

The "moved + modified files" is not anything special. The current automatic merger may not handle it, but that's not because it _can't_ handle it, it's because it tries to be simple and efficient. And because it's so _incredibly_ fast for all the normal cases, you can now spend some effort on figuring out renames dynamically for the few cases where it fails.

Does it do so now? No. Would adding UUID's help? Hell no. It would be just an unmitigated disaster.

Exactly the same way "git-diff-tree" can figure out renames, a merge algorithm can figure them out. Right now, we have two stages in merges: we try the trivial merge first (pure "git-read-tree"), and when that fails, we try the automatic 3-way merge.

The fact that we don't have a third (and fourth, and fifth) merge algorithm for when those two trivial merges happen to not work is _not_ an indication that the "contents only" approach doesn't work - it's just an indication of the fact that 99.9% of all merges are trivial, and they should be optimized for.

So the next step is _not_ to do UUID's, it's to notice that merge errors happened, and try to figure out why. Right now we just give up and say "sort it out by hand".
That's actually a perfectly valid approach even in the presence of moved files - it's a bit painful, but once you _do_ sort it out and commit the merge, especially if you can push the merge back (so that both sides then agree on the final rename), future merges will be trivial again - ie you won't have to go through it over and over again.

Of course, if you don't push it back, but keep the two trees separate and keep on modifying files that have different names in the other repository, you'll keep on getting into the situation that the trivial merge doesn't work.

So we _do_ want to get an automated "phase 3" (and maybe 4..) merge that can figure out renames, but the point here is that it's something we _can_ figure out.

For example, one way of doing it is to just do the exact merge we do now, and then look at the files that didn't merge. Do a cross-diff between such files and new/deleted files (if not _exactly_ the way we do for "git diff -M", then at least it's exactly the same concept), and try to do a three-way merge where the base/first/second pairs don't have the same name.

For example, let's say that you have the common commit A, and file "x", and two paths (B and C) where B has renamed the file "x" to "y", and C has modified file "x". You end up with the scenario that our trivial merge fails to handle, and right now we give up, and don't help the user very much at all.

But the _solution_ is not to change "read-tree" to know about renames, nor is it to make git keep any new data. The solution is to just make phase 3 say:

 - "Automatic merge failed, trying rename merge"

 - go through all files that exist in C but not in B (or vice versa), and pair them up with all files that exist in B but not in C (or vice versa) and see if _they_ can be handled as a three-way merge.

And exactly the same way that we do the rename detection, we may want to find the "optimal pairing" by looking at the distance between the files.

Notice?
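
As a rough illustration of the pairing step described above (not git's actual code), here is a minimal Python sketch. It assumes each tree is a plain dict mapping path to file contents, and it uses difflib's similarity ratio as a stand-in for the "distance between the files". The function names and the 0.5 threshold are made up for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in distance measure: 1.0 means identical, 0.0 means nothing shared."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pair_unmatched(ours: dict, theirs: dict, threshold: float = 0.5) -> list:
    """Pair files that exist only in 'ours' with files that exist only in 'theirs'.

    Pairs are chosen greedily, best similarity first, which approximates
    the "optimal pairing" idea. The threshold is an arbitrary assumption.
    """
    only_ours = [p for p in ours if p not in theirs]
    only_theirs = [p for p in theirs if p not in ours]
    scored = sorted(
        ((similarity(ours[a], theirs[b]), a, b)
         for a in only_ours for b in only_theirs),
        reverse=True,
    )
    pairs, used_ours, used_theirs = [], set(), set()
    for score, a, b in scored:
        if score < threshold:
            break  # everything after this is even less similar
        if a in used_ours or b in used_theirs:
            continue  # each file takes part in at most one pair
        pairs.append((a, b))
        used_ours.add(a)
        used_theirs.add(b)
    return pairs


def rename_merge_triples(base: dict, ours: dict, theirs: dict,
                         threshold: float = 0.5) -> list:
    """Return (base_path, ours_path, theirs_path) triples whose names differ.

    Each triple is a candidate for a normal three-way file merge.
    """
    triples = []
    for a, b in pair_unmatched(ours, theirs, threshold):
        # The common ancestor is whichever of the two names existed in the base.
        base_path = a if a in base else b if b in base else None
        if base_path is not None:
            triples.append((base_path, a, b))
    return triples


# Common commit A has "x"; B renamed it to "y"; C modified "x" in place.
A = {"x": "line1\nline2\nline3\n"}
B = {"y": "line1\nline2\nline3\n"}
C = {"x": "line1\nline2 changed\nline3\n"}
print(rename_merge_triples(A, B, C))  # [('x', 'y', 'x')]
```

Each triple would then be handed to the ordinary three-way file merge, just with base/first/second names that differ. No extra metadata is stored anywhere; everything is recomputed from content.
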
This will automatically handle the "renamed in one branch, modified in another" case. In fact, if the renamer modified it too, that's not a problem at all - the three-way merge will work exactly the same way it does now with the case of a non-moved "modified in both" file.

Problem solved. Without complicating the trivial (and very common) cases, and without introducing any new metadata that is fundamentally impossible to maintain (and it _is_ fundamentally impossible to maintain, because it has nothing to do with the contents of the files, so patches etc will by definition break it).

Linus