On Thu, 14 Apr 2005, Junio C Hamano wrote:
>
> You say "merge these two trees" above (I take it that you mean
> "merge these two trees, taking account of this tree as their
> common ancestor", so actually you are dealing with three trees),
Yes. We're definitely talking three trees.
> and I am tending to agree with the notion of merging trees not
> commits. However you might get richer context and more sensible
> resulting merge if you say "merge these two commits". Since
> commit chaining is part of the fundamental git object model you
> may as well use it.
Yes and no. There are real advantages to using the commit state to just
figure out the trees, and then at least have the _option_ to do the merge
at the pure tree object level.
In particular, if you ever find yourself wanting to graft together two
different commit histories, that almost certainly is what you'd want to
do. Somebody might have arrived at the exact same tree some other way,
starting with a 2.6.12 tarball or something, and I think we should at
least support the notion of saying "these two totally unrelated commits
actually have the same base tree, so let's merge them in "space" (ie data)
even if we can't really sanely join them in "time" (ie "commits")".
I dunno.
And it's also a question of sanity. The fact is, we know how to make tree
merges unambiguous, by just totally ignoring the history between them. Ie
we know how to merge data. I am pretty damn sure that _nobody_ knows how
to merge "data over time". Maybe BK does. I'm pretty sure it actually
takes the "over time" into account. But my goal is to get something that
works, and something that is reliable because it is simple and it has
simple rules.
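
A minimal sketch of what "merging data" with simple rules can look like,
assuming a tree is modelled as a flat dict of path -> blob hash. The
function name merge_trees and the dict representation are hypothetical
and are not git's actual code. The history between the trees is never
consulted; only the three snapshots are compared.

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of flat trees (path -> blob hash).

    Returns (merged, conflicts). A missing path is treated as None,
    so additions and deletions fall out of the same three rules.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o      # both sides agree (or both deleted it)
        elif o == b:
            result = t      # only "theirs" changed this path
        elif t == b:
            result = o      # only "ours" changed this path
        else:
            # Both sides changed the path in different ways; this
            # needs a file-level merge, so just report it.
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Example with made-up blob names.
base = {"Makefile": "a1", "README": "c1", "kernel/sched.c": "b1"}
ours = {"Makefile": "a2", "README": "c2", "kernel/sched.c": "b1"}
theirs = {"Makefile": "a1", "README": "c3", "kernel/sched.c": "b2", "NEW": "d1"}
merged, conflicts = merge_trees(base, ours, theirs)
```

Because only blob identities are compared, the result is the same no
matter how either side arrived at its tree.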
As you say:
> This however opens up another set of can of worms---it would
> involve not just three trees but all the trees in the commit
> chain in between.
Exactly. I seriously believe that the model is _broken_, simply because
it gets too complicated. At some point it boils down to "keep it simple,
stupid".
> That's when you start wondering if it would
> be better to add renames in the git object model, which is the
> topic of another thread. I have not formed an opinion on that
> one myself yet.
I've not even been convinced that renames are worth it. Nobody has really
given a good reason why.
There are two reasons for renames I can think of:
 - space efficiency in delta-based trees. This is a total non-issue for
   git, and trying to explicitly track renames is going to cause _more_
   space to be wasted rather than less.
 - "annotate". Something git doesn't really handle anyway, and it has
   little to do with renames. You can fake an annotate, but let's face it,
   it's _always_ going to be depending on interpreting a diff. In fact,
   that ends up being how traditional SCMs do it too - they don't really
   annotate lines, they just interpret the diff.
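
To illustrate the "annotate is just interpreting a diff" point, here is
a rough sketch that assigns each line of the newest version to the
version that introduced it, using nothing but successive diffs. The
function name annotate and the (label, lines) input format are
assumptions made for this example; it uses Python's standard difflib.

```python
import difflib


def annotate(versions):
    """versions: list of (label, list_of_lines), oldest first.

    Returns a list of (label, line) pairs for the newest version,
    where label names the version that introduced the line.
    """
    blame = []        # (label, line) pairs for the previous version
    prev_lines = []
    for label, lines in versions:
        sm = difflib.SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        new_blame = []
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whatever origin they already had.
                new_blame.extend(blame[i1:i2])
            else:
                # Inserted or replaced lines belong to this version.
                # For a pure delete, lines[j1:j2] is empty.
                new_blame.extend((label, line) for line in lines[j1:j2])
        blame, prev_lines = new_blame, lines
    return blame


history = [
    ("v1", ["a", "b", "c"]),
    ("v2", ["a", "x", "c"]),
    ("v3", ["a", "x", "c", "d"]),
]
result = annotate(history)
```

Nothing here stores per-line history; the attribution is derived
entirely from diffing adjacent versions.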
I think you might as well interpret the whole object thing. Git _does_
tell you how the objects changed, and I actually believe that a diff
that works in between objects (ie can show "these lines moved from this
file X to that file Y") is a _hell_ of a lot more powerful than
"rename" is.
So I'd seriously suggest that instead of worrying about renames, people
think about global diffs that aren't per-file. Git is good at limiting
the changes to a set of objects, and it should be entirely possible to
think of diffs as ways of moving lines _between_ objects and not just
within objects. It's quite common to move a function from one file to
another - certainly more so than renaming the whole file.
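
As a very naive sketch of the idea, assuming trees are modelled as
dicts of path -> list of lines: diff each file on its own, then match
the lines that disappeared from one file against the lines that
appeared in another. The names changed_lines and find_moves are
hypothetical, and real move detection would need to be far smarter
about ordering, duplicates and near-matches.

```python
import difflib


def changed_lines(old, new):
    """Return (removed, added) lines between two versions of one file."""
    removed, added = [], []
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            removed.extend(old[i1:i2])
            added.extend(new[j1:j2])
    return removed, added


def find_moves(old_tree, new_tree):
    """Map (src_path, dst_path) -> lines removed from src and added to dst."""
    removed, added = {}, {}
    for path in sorted(set(old_tree) | set(new_tree)):
        gone, came = changed_lines(old_tree.get(path, []),
                                   new_tree.get(path, []))
        if gone:
            removed[path] = gone
        if came:
            added[path] = came
    moves = {}
    for src, gone in removed.items():
        for dst, came in added.items():
            if src == dst:
                continue
            came_set = set(came)
            # Ignore blank lines; they match everywhere and mean nothing.
            shared = [l for l in gone if l.strip() and l in came_set]
            if shared:
                moves[(src, dst)] = shared
    return moves


old_tree = {
    "a.c": ["int f(void)", "{", "return 1;", "}", "int g(void);"],
    "b.c": ["int h(void);"],
}
new_tree = {
    "a.c": ["int g(void);"],
    "b.c": ["int h(void);", "int f(void)", "{", "return 1;", "}"],
}
moves = find_moves(old_tree, new_tree)
```

A whole-file rename shows up here as the degenerate case where every
line of one path moves to another path.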
In other words, I really believe renames are just a meaningless special
case of a much more interesting problem. Which is just one reason why
I'm not at all interested in bothering with them other than as a "data
moved" thing, which git already handles very well indeed.
So there,
Linus