After I re-read [*R1*], in which Linus talks about dircache,
especially this section:
- The "current directory cache" describes some baseline. In particular,
note the "some" part. It's not tied to any special baseline, and you
can change your baseline any way you please.
So it does NOT have to track any particular state in either the object
database _or_ in your actual current working tree. In fact, all real
interactions with "git" are really about updating this staging area one
way or the other: you might check out the state from it into your
working area (partially or fully), you can push your working area into
the staging area (again, partially or fully).
And if you want to, you can write the thing that the staging area
represents as a "tree" into the object database, or you can merge a
tree from the object database into the staging area.
In other words: the staging area aka "current directory cache" is
really how all interaction takes place. The object database never
interacts directly with your working directory contents. ALL
interactions go through the current directory cache.
I started to have more doubts about the approach of *not*
performing the merge in the dircache I set up specifically for
merging, which is the direction in which you are pushing if I
understand you correctly. Maybe I completely misunderstand what
you want. This message is long but I need a clear understanding
of what is expected to be useful to you, so please bear with me.
PB> merge-tree.pl -b $base $(tree-id) $merged | parse-your-output
Please help me understand this example you gave earlier.
Here is my understanding of your assumption when the above
pipeline takes place. Correct me if I am mistaken.
* The user is in a working directory $W. It is controlled by
git-tools, and there are a $W/.git/ directory and a $W/.git/index
dircache.
* The dircache $W/.git/index started its life as a read-tree
from some commit. git-tools keeps track of which
commit it is somewhere, presumably in the $W/.git/ directory.
Let's call it $C (commit).
? Question. Is the $(tree-id) in your example the same as $C
above?
* The user has run [*1*] (see Footnote below) checkout-cache
on $W/.git/index some time in the past and $W is full of
working files. Some of them may or may not have been modified.
There may be some additions or deletions. So the contents of
the working directory may not match the tree associated with
$C.
* The user may or may not have run [*1*] update-cache in $W.
The contents of the dircache $W/.git/index may not match the
tree associated with $C.
? Question. Are you forbidding the user from running update-cache
by hand, and keeping track of the changes yourself, to be
applied all at once at "git commit" time, thereby
guaranteeing that $W/.git/index matches the tree associated
with $C at all times? From the description in the "GIT toolkit"
section of the README, it is not clear to me which part of his
repository an end user is not supposed to muck with himself.
* Now the user has some changes in his working directory and
notices that upstream or a side branch has notable changes
worth picking up. So he runs some git-tools command
to cause the above quoted pipeline to run.
? Question. Does $merged in your example mean such an upstream
or side branch? Is $base in your example the common ancestor
between $C and $merged?
Assuming that my above understanding of your model is correct,
here is my "thinking aloud".
- "merge-trees $base $C $merged" looks only at the git object
database for those three trees named. The data structure of
git object database is optimized to distinguish differences
in those recorded trees (and hence recorded blobs they point
at) without unpacking most of the files if the changes are
small, because all the blobs involved are already hashed. It
is not very good at comparing things in git object store and
working files in random states, which would involve unpacking
blobs and comparing, so "merge-trees" does not bother.
- What can come out from merge-trees is therefore one of the
following for each path from the union of paths contained in
$base, $C, and $merged:
(a) Neither $C nor $merged changed it --- merge result is what
is in $C.
(b) $C changed it but $merged did not --- merge result is what
is in $C.
(c) Both $C and $merged changed it in the same way --- merge
result is what is in $C.
(d) $C did not change it but $merged did --- merge result is
what is in $merged.
(e) Both $C and $merged changed it differently --- merge is
needed and automatically succeeds between $C and $merged.
(f) Both $C and $merged changed it differently --- merge is
needed but has conflicts.
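
A minimal sketch of the classification above, in Python purely for
illustration. The function name classify, the use of None for a path
missing from a tree, and the combined "e/f" return value are all
assumptions of this sketch, not anything merge-trees defines.

```python
def classify(base, c, merged):
    """Classify one path into the cases (a)-(f) listed above.

    Each argument is the blob SHA1 recorded for the path in that tree,
    or None if the tree does not contain the path (an assumption; the
    creation/deletion cases are not spelled out in the list).
    """
    if c == merged:
        # Identical on both sides: untouched (a) or same change (c).
        return "a" if c == base else "c"
    if merged == base:
        return "b"    # only $C changed it
    if c == base:
        return "d"    # only $merged changed it
    # Both changed it differently. SHA1s alone cannot tell (e) from (f);
    # that takes an actual content merge.
    return "e/f"
```

Cases (a)-(d) fall out of comparing three SHA1s, without unpacking
any blob. Only the last branch needs the file contents.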
- Assuming we are dealing with the case where working files are
dirty and do not match what is in $C, among the above,
(a)-(c) can be ignored by SCM. What the user has in his
working files is exactly what he would have got if he started
working from the merge result, although in reality the work
was started from $C.
Handling (d), (e) and (f) from SCM's point of view would be
the same. They all involve 3-way merges between the file in
the working directory, and the file from $merged, pivoting on
the file from $base. In order to help SCM, merge-trees
therefore should output SHA1 of blobs for such a file from
$base and $merged and expect SCM to run "cat-file blob" on
them and then run merge or diff3. Everything up to handing
out those two SHA1s is the business of merge-trees; after
that it is up to SCM.
That would work. So I should base the design of output from
merge-trees on the above analysis, which probably needs to be
extended to cover differences between creation, modification,
and deletion.
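
To make the output idea concrete, here is a hedged sketch in Python.
It assumes each tree is given as a hypothetical dict mapping path to
blob SHA1 (a missing key standing for an absent path); the name
scm_worklist and the tuple layout are made up for illustration and
are not a proposed output format.

```python
def scm_worklist(base, c, merged):
    """List the paths SCM must merge into dirty working files.

    base, c, merged: dicts mapping path -> blob SHA1 (hypothetical
    in-memory stand-ins for the three trees).
    Returns (path, base_sha1, merged_sha1) for cases (d), (e), (f),
    i.e. paths where $merged differs from both $base and $C.
    """
    work = []
    for path in sorted(set(base) | set(c) | set(merged)):
        b, o, t = base.get(path), c.get(path), merged.get(path)
        if t != b and t != o:
            # SCM would fetch both blobs and 3-way merge them with
            # the working file, pivoting on the $base blob.
            work.append((path, b, t))
    return work


if __name__ == "__main__":
    base = {"x": "1", "y": "1"}
    c = {"x": "2", "y": "1"}
    merged = {"x": "1", "y": "3"}
    for path, b, t in scm_worklist(base, c, merged):
        print(path, b, t)
```

Paths in cases (a)-(c) never show up in the list, which matches the
observation that SCM can ignore them when the working files are dirty.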
- However, the above is quite different from the way Linus
envisioned initially, on which my current implementation is
based [*3*].
My current implementation is to record the merge outcome in
the temporary dircache $W/,,merge/.git/index for cases
(a)-(e). The last case (f) is problematic and needs human
validation [*2*], so it is not recorded in that temporary
dircache, but the files to be merged are left in that
temporary directory and merge-trees stops there. It is
expected that the end-user or SCM would merge the resulting
file and run update-cache to update $W/,,merge/.git/index.
After that happens, $W/,,merge/.git/index has the tree
representing the desired result of the merge. It is expected
that the end-user or SCM would write-tree, commit-tree there
in the temporary directory, creating a new commit $C1.
Then, it is expected that the SCM would make a patch file
between $C and the user's working directory, check out $C1
(either in the user's working directory or another temporary
directory; at this point merge-trees does not care because it
has already done its job and exited), and apply that patch to
bring the user edits over to $C1. Then that directory would
contain the desired merge of user edits.
That is my understanding of how Linus originally wanted the
tool he does his kernel work with to work. My hesitation about
your suggestions to change it not to keep its own merge
dircache comes from here. Not doing what I am currently
doing to the $W/,,merge/.git/index dircache would mean that SCM
would have to do more, not less, to arrive at $C1 (the result
of the clean $merged and $C merge pivoted at $base), where the
real SCM merge begins.
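
For comparison, a rough Python sketch of what the current
implementation described above does to the temporary dircache.
Trees are again hypothetical dicts mapping path to blob SHA1, and
try_merge is an assumed callback standing in for the textual merge:
it returns the SHA1 of the merged blob, or None on conflict. None of
these names exist in the real tool.

```python
def build_merge_index(base, c, merged, try_merge):
    """Fill a stand-in for $W/,,merge/.git/index.

    Returns (index, unresolved): index maps path -> blob SHA1 for
    cases (a)-(e); unresolved lists the case (f) paths that are left
    for the end-user or SCM to merge and update-cache by hand.
    """
    index, unresolved = {}, []
    for path in sorted(set(base) | set(c) | set(merged)):
        b, o, t = base.get(path), c.get(path), merged.get(path)
        if o == t or t == b:
            result = o                      # (a), (b), (c): take $C
        elif o == b:
            result = t                      # (d): take $merged
        else:
            result = try_merge(path, b, o, t)
            if result is None:              # (f): conflict, not recorded
                unresolved.append(path)
                continue
        if result is not None:              # None here means path is absent
            index[path] = result
    return index, unresolved
```

Once unresolved is empty, the index represents the tree that
write-tree and commit-tree would turn into $C1.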
Although I suspect I am misunderstanding what you want, your
messages so far suggest that what you want might be quite
different from what Linus wants. Please do not misunderstand
what I mean by saying this. I am not saying that Linus is
always right [*4*] and therefore you are wrong for wanting
something else. It is just that, if what I started writing
needs to support both of those quite different needs, I need to
know what they are. I think I understand what Linus wants well
enough [*5*], but I am not certain about yours.
[Footnotes]
*1* By "run" I mean either that the user directly used
the low-level plumbing command himself, or used git-tools to
cause such a command to run.
*2* Strictly speaking, case (e) needs human validation as
well, because successful textual merge does not guarantee
sensible semantic merge.
*3* See [*R2*] for a description of the way Linus wanted merges
in git to happen, especially from "5) At this point you need
to MERGE" onwards. The current implementation handles (or
attempts to handle) the `your working directory was fully
committed' case described there.
*4* According to Linus himself, he is always right ;-). [*R3*]
*5* I consider [*R1*] and [*R2*] essential reading for anybody
wanting to understand the merging operation in the git object model (I
am saying this for others; not for Pasky --- it would be like
preaching to the choir ;-)).
[References]
*R1* <[EMAIL PROTECTED]>
http://marc.theaimsgroup.com/?i=%3CPine.LNX.4.58.0504110928360.1267%20()%20ppc970%20!%20osdl%20!%20org%3E
*R2* <[EMAIL PROTECTED]>
http://marc.theaimsgroup.com/?i=%3CPine.LNX.4.58.0504121606580.4501%20()%20ppc970%20!%20osdl%20!%20org%3E
*R3*
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0008.3/0555.html