I asked:
> > Does anyone know of any other issues in how git data is stored that
> > might cause problems for some situations? ...

Kevin said:
> If git is retaining hex naming, and not moving to base64, then I don't
> think what I am about to say is relevant. However, if base64 file naming
> is still being considered, then vfat32 compatibility may be a concern
> (I'm not sure about NTFS).

I can't speak for the git developers. However, I think the current
naming scheme for the object database as used in git-pasky
is actually a very good one and should be left as-is
(SHA-1 hex values, one directory per 2-char prefix, and
the remaining 38 characters as the filename).
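To make the layout concrete, here is a minimal sketch of how a 40-character SHA-1 hex name maps to a path. `object_path` and `objects_dir` are hypothetical names used only for illustration, and the sketch glosses over exactly which bytes git feeds to SHA-1; `b"example"` is just a stand-in input.

```python
import hashlib
import os

def object_path(objects_dir: str, hexsha: str) -> str:
    """Split a 40-char SHA-1 hex name into a 2-char directory and a 38-char filename."""
    if len(hexsha) != 40:
        raise ValueError("expected a 40-character SHA-1 hex name")
    # Hypothetical layout helper: <objects_dir>/<first 2 chars>/<remaining 38 chars>
    return os.path.join(objects_dir, hexsha[:2], hexsha[2:])

# Stand-in content; what git actually hashes is not shown here.
name = hashlib.sha1(b"example").hexdigest()
print(object_path("objects", name))
```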

As far as I can tell from various calculations (& supported by the
performance measurements done by others), hex names with one level
of directories turn out to work pretty well!
The scheme is easy to understand, works for non-massive projects
even on stupid filesystems, and performs well on good filesystems
even for massive projects with huge histories.  You could
tune it further, but a single approach that works "everywhere"
is a whole lot simpler.  So I'd recommend keeping that
approach.
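The basic arithmetic behind "one level of directories" is simple if you assume SHA-1 spreads names uniformly: a 2-char hex prefix gives 16^2 = 256 directories, so each holds roughly 1/256 of the objects. A rough sketch (the object counts are arbitrary illustrations, not measurements of any real project, and `avg_entries_per_dir` is a hypothetical helper):

```python
def avg_entries_per_dir(num_objects: int, prefix_hex_chars: int = 2) -> float:
    """Expected files per directory, assuming SHA-1 names are uniformly distributed."""
    num_dirs = 16 ** prefix_hex_chars  # 2 hex chars -> 256 directories
    return num_objects / num_dirs

# Illustrative object counts only.
for n in (10_000, 1_000_000, 10_000_000):
    print(f"{n:>10} objects -> ~{avg_entries_per_dir(n):,.0f} files per directory")
```

Each extra prefix character would cut the per-directory count by another factor of 16; that is the sort of further tuning that may or may not pay off depending on the filesystem.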

As for base64/base32 vs. hex names, I think there
are many reasons to stay with hex names.
The simplest is that SHA-1 hashes are normally presented
as hex values, so you'll work WITH instead of AGAINST other
tools, and humans who deal with this stuff will "see what they expect".
Hex takes more characters (40 instead of 27 for base64), but not
many more, and it's not like base64 is any more comprehensible to humans.
And because hex names use only 16 of the characters a filename
may contain, some errors are trivially detectable.
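Both points (the character count and error detection) can be checked with a short sketch. The base64 name here uses Python's URL-safe alphabet with padding stripped, which is an assumption, since standard base64 includes '/' and cannot appear in a filename as-is. `looks_like_hex_name` is a hypothetical helper, not part of git.

```python
import base64
import hashlib
import re

HEX_NAME = re.compile(r"[0-9a-f]{40}")

def looks_like_hex_name(name: str) -> bool:
    """True only for exactly 40 lowercase hex digits; anything else is detectably wrong."""
    return HEX_NAME.fullmatch(name) is not None

digest = hashlib.sha1(b"example").digest()  # 20 bytes = 160 bits
hex_name = digest.hex()                     # 4 bits per character
# Assumed base64 naming: URL-safe alphabet, '=' padding stripped (6 bits per character).
b64_name = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

print(len(hex_name), len(b64_name))              # 40 27
print(looks_like_hex_name(hex_name))             # True
print(looks_like_hex_name(hex_name[:-1] + "g"))  # False: 'g' is not a hex digit
print(looks_like_hex_name(hex_name[:-1]))        # False: truncated name
```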

You're right: on a case-insensitive filesystem, base64 names lose many
bits of differentiation, and in a very non-obvious way (I _hate_ weird
surprises like that; they cause lots of trouble).  I think there's another,
more insidious problem too. Although the _filesystem_
is case-preserving, I suspect some _tools_ on Windows don't take
care to preserve case.  If so, a Windows user could easily use
tools that screw things up for a Unix/Linux user once the files
were imported, causing all sorts of "extraneous" files &
files that mysteriously disappeared (they were only accessible
from Windows). Ugh.
This can even happen on Unix/Linux systems if they use
a fileserver with NTFS semantics. In contrast,
if a hex value has its case changed, it's easy to fix locally.
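A small sketch of the case problem, assuming base64 names use the URL-safe alphabet with padding stripped (an assumption, not a proposed git format). The two digests are contrived so that their names differ only in case; `b64_name` is a hypothetical helper.

```python
import base64
import math
import string

def b64_name(digest: bytes) -> str:
    # Assumed naming scheme: URL-safe base64 (no '/'), '=' padding stripped.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

# Two different 20-byte digests whose base64 names differ only in case.
d1 = bytes(20)                  # all zero bits -> "AAAA..."
d2 = bytes([0x68]) + bytes(19)  # first 6 bits are 011010 (= 26, which encodes as 'a')
n1, n2 = b64_name(d1), b64_name(d2)
print(n1 != n2, n1.lower() == n2.lower())  # True True: a case-folding tool merges them

# Case-insensitively, the 64 base64 symbols collapse to 38 distinct ones.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
folded = {c.lower() for c in alphabet}
# Upper bound on bits a 27-char folded name can still distinguish (vs. 160 in the hash).
print(len(folded), round(27 * math.log2(len(folded)), 1))  # 38 141.7

# A hex name whose case got changed is repaired by simply lowercasing it.
hex_name = "deadbeef" * 5
mangled = hex_name.upper()
print(mangled.lower() == hex_name)  # True
```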

By choosing the more traditional hex representation, you
eliminate lots of problems, and it's easier to explain too.

Kevin added:
> I'll take this opportunity to support David's position that it would be
> fantastic if git could end up being valuable for a wide range of
> projects, rather than just the kernel. I also fully understand that the
> kernel is the primary target, but when there are opportunities to make
> the data structures more generally useful without causing problems for
> the kernel project, I hope they are taken.

Thanks for the vote of confidence!

--- David A. Wheeler
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
