On 08/12/2015 11:57 PM, David Turner wrote:
> Instead of a linear search over common_list to check whether
> a path is common, use a trie.  The trie search operates on
> path prefixes, and handles excludes.
> 
> Signed-off-by: David Turner <dtur...@twopensource.com>
> ---
> 
> Probably overkill, but maybe we could later use it for making exclude
> or sparse-checkout matching faster (or maybe we have to go all the way
> to McNaughton-Yamada for that to be truly worthwhile).

Let's take a step back.

We have always had a ton of code that uses `git_path()` and friends to
convert abstract things into filesystem paths. Let's take the
reference-handling code as an example:

`git_path("refs/heads/master")` returns something like
".git/refs/heads/master", which happens to be the place where we would
store a loose reference with that name. But in reality,
"refs/heads/master" is a reference name, not a fragment of a path. It's
just that the reference code knows that the transformation done by
`git_path()` *accidentally* happens to convert a reference name into the
name of the path of the corresponding loose reference file.

In fact, the reference code is even smarter than that. It knows that
within submodules, `git_path()` does *not* do the right mapping. In
those cases it calls `git_path_submodule()` instead.

But now we have workspaces, and things have become more complicated.
Some references are stored in `$GIT_DIR`, while others are stored in
`$GIT_COMMON_DIR`. Who should know all of these details?

The current answer is that the reference-handling code remains (mostly)
ignorant of workspaces. It just stupidly calls `git_path()` (or
`git_path_submodule()`) regardless of the reference name. It is
`git_path()` that has grown the global insight to know which files are
now stored in `$GIT_COMMON_DIR` vs `$GIT_DIR`. Now it helpfully
transforms "refs/heads/master" into "$GIT_COMMON_DIR/refs/heads/master"
but transforms "refs/worktree/foo" into "$GIT_DIR/refs/worktree/foo". It
has developed similar insight into lots of other file types. IT KNOWS
TOO MUCH. And because of that, it has become a lot more complicated and
might even be a performance problem.
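
To make the complaint concrete, here is a toy model of such a
knows-everything path function. It is only a sketch, not git's actual
implementation: the names `toy_git_path()`, `toy_rules`, `toy_git_dir`
and `toy_common_dir`, the directory values, and the contents of the rule
table are all made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for $GIT_DIR and $GIT_COMMON_DIR. */
static const char *toy_git_dir = ".git/worktrees/wt";
static const char *toy_common_dir = ".git";

/* Illustrative rule table, first match wins. NOT git's real list. */
struct toy_rule {
	const char *prefix;
	int is_common;
};

static const struct toy_rule toy_rules[] = {
	{ "refs/worktree/", 0 },	/* exclude: stays per-worktree */
	{ "refs/", 1 },
	{ "objects/", 1 },
};

static int toy_has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * The path function inspects every name it is handed and decides on
 * the caller's behalf which base directory applies.
 */
const char *toy_git_path(const char *name, char *buf, size_t len)
{
	const char *base = toy_git_dir;
	size_t i;

	/* Linear scan over the rules on every single call. */
	for (i = 0; i < sizeof(toy_rules) / sizeof(toy_rules[0]); i++) {
		if (toy_has_prefix(name, toy_rules[i].prefix)) {
			base = toy_rules[i].is_common ? toy_common_dir
						      : toy_git_dir;
			break;
		}
	}
	snprintf(buf, len, "%s/%s", base, name);
	return buf;
}
```

Every caller pays for the scan, and every new kind of file means
teaching the table another rule.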

This seems crazy to me. It is the *reference* code that should know
whether a particular reference should be stored under `$GIT_DIR` or
`$GIT_COMMON_DIR`, or indeed whether it should be stored in a database.

We should have two *stupid* functions, `git_workspace_path()` and
`git_common_path()`, and have the *callers* decide which one to call.
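
A rough sketch of that split might look like the following. Only the
two function names come from the proposal above; their signatures, the
`sketch_*` variables, the directory values, and the caller
`loose_ref_path()` are assumptions made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for $GIT_DIR and $GIT_COMMON_DIR. */
static const char *sketch_git_dir = ".git/worktrees/wt";
static const char *sketch_common_dir = ".git";

/* Stupid by design: never looks at what `name` means. */
const char *git_workspace_path(const char *name, char *buf, size_t len)
{
	snprintf(buf, len, "%s/%s", sketch_git_dir, name);
	return buf;
}

/* Equally stupid, but rooted at the common directory. */
const char *git_common_path(const char *name, char *buf, size_t len)
{
	snprintf(buf, len, "%s/%s", sketch_common_dir, name);
	return buf;
}

/*
 * Hypothetical caller inside the reference code. The knowledge of
 * which references are per-worktree lives here, next to the rest of
 * the reference logic, not in the path helpers.
 */
const char *loose_ref_path(const char *refname, char *buf, size_t len)
{
	static const char worktree_prefix[] = "refs/worktree/";

	if (strncmp(refname, worktree_prefix, strlen(worktree_prefix)) == 0)
		return git_workspace_path(refname, buf, len);
	return git_common_path(refname, buf, len);
}
```

With this shape, neither path helper needs a lookup table at all, and a
reference backend that stores references somewhere other than loose
files would simply not call them.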

The only reason to retain a knows-everything `git_path()` function is as
a crutch for 3rd-party applications that think they are clever enough to
grub around in `$GIT_DIR` at the filesystem level. But that should be
highly discouraged, and we should make it our mission to provide
commands that make it unnecessary.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
