Re: Revision control

Olaf Buddenhagen Wed, 25 Jun 2008 21:35:31 -0700

Hi,

On Mon, Jun 23, 2008 at 11:41:17AM +0200, Arne Babenhauserheide wrote:
> On Monday, 16 June 2008 19:08:00, [EMAIL PROTECTED] wrote:

> > Something feeling intuitive depends solely on previous experience.
> > It is *always* subjective.
> 
> There are usability people saying quite the contrary. Have a look at
> http://openusability.org for once. 

I don't expect to find many surprises there, considering that I've
recently seen two of Ellen's presentations :-)

> A program feeling intuitive depends on the people you write *for*, and
> on workflows for the most often used actions. 

You are confusing things. Usability is *not* the same as intuitiveness.
Intuitiveness is only *one* element of usability. (And IMHO one that is
overrated by most usability people...)

It is perfectly true that usability depends on the target audience. (And
the target audience of Git -- serious programmers -- does match with
that of the Hurd repository...)

Intuitiveness on the other hand does depend only on the user's previous
experience. A Windows user has a (slightly) different idea of it than an
Apple user, which in turn differs from a CDE user etc.

Mercurial's interface *might* be more intuitive than Git for CVS users
(I can't tell); but that doesn't necessarily mean that overall usability
is better.

> > I don't know what "hg up" does -- but if it indeed follows CVS, it
> > is something entirely different... "cvs up" roughly translates to
> > "git-pull". (Or "git-fetch && git-rebase origin" if you want to
> > avoid clobbering history when you have local changes.)
> 
> It does what "svn up" does, but locally. 
> 
> "hg pull" gets the changes from somewhere else. 
> 
> "hg up" updates the files you see. 

I see.

> > > > And in fact, the git developers never cease to point out that
> > > > such interfaces can be easily created for git. The fact that
> > > > very little actually exists in this way, 
> 
> Uh, what about Cogito? 
> 
> It isn't developed anymore, but it was created and used, so the
> problem did exist, and even though the situation became better, git
> still requires much learning new commands from developers (as by your
> own words). 

From what I gathered, the standard interface of Git itself didn't
initially offer a complete set of comfortable high-level commands. Now
it does, and Cogito became obsolete.

Or maybe it just proves my point that a stronger abstraction than the
one offered by the standard interface is not really what people want in
the long run... :-)

> Differently put: "People can get used to any workflow, and once they
> got used to it, the workflow will feel efficient to them." 
> 
> Until they try out other workflows where there may be more efficient
> ones. 

Which is precisely why I like Git for easily allowing all kinds of
workflows -- including ones the developers hadn't explicitly
considered :-)

> > Eh? Why would accessing a few objects in a single pack *ever* be
> > less efficient than accessing the same objects in per-file
> > structures?
> 
> You have to open the whole pack to get a few objects. When the changes
> are only in a few files, getting the files will require far less data
> to read. 

open() is mostly a no-op, except for some bookkeeping; the file size
doesn't matter at all. If by "open" you actually mean reading the whole
contents of the pack: I'm pretty certain Git doesn't do that. Would be
rather stupid.
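
To make that concrete, here is a minimal sketch in Python. It uses a toy
pack layout, not Git's actual one (real packs come with a separate .idx
file and use delta compression, neither of which is modelled here); the
names write_pack and read_object are made up for illustration. All it
shows is that fetching one object out of a large file costs a seek and
a short read, regardless of how big the file is.

```python
import os
import tempfile


def write_pack(path, objects):
    """Write all objects back to back; return {name: (offset, length)} as the index."""
    index = {}
    with open(path, "wb") as f:
        for name, data in objects.items():
            index[name] = (f.tell(), len(data))
            f.write(data)
    return index


def read_object(path, index, name):
    """Open the pack, seek to one object, and read only that object's bytes."""
    offset, length = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    pack_path = os.path.join(tmp, "toy.pack")

    # About 1 MB of "ancient history" plus one small recent object.
    objects = {"old-%d" % i: os.urandom(1024) for i in range(1000)}
    objects["recent"] = b"the change you just pulled"

    index = write_pack(pack_path, objects)
    pack_size = os.path.getsize(pack_path)

    data = read_object(pack_path, index, "recent")
    bytes_read = len(data)

print("pack size:", pack_size, "bytes; read for one object:", bytes_read, "bytes")
```

The cost of read_object depends on the size of the object asked for, not
on the size of the pack it lives in.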

I don't know how Mercurial stores branches; but I very much hope that it
doesn't need to read all the stuff from unrelated side branches and/or
ancient history when accessing a file, either... Otherwise, its
efficiency must be *much* worse than that of Git.

> > Sanitizing history is important in any distributed project,
> > regardless of the workflow.
> 
> How often? 

All the time...

> For the Linux kernel, you need to "sanitize" yourself, but in smaller
> projects, the individual history might be very interesting to other
> developers. 

Sorry, this is bullshit. There is absolutely no reason why any project
-- small or large -- should have the history littered with individual
developer's meandering. It has no value whatsoever, and only makes it
harder to understand.

Making it impossible (or harder) to clean up the history only
discourages using version control aggressively. In fact, if you think
all detours should be visible in the history, you can just as well stick
with a centralized system and upload every change immediately...

> In Git I have to care for the store from time to time, to avoid it
> getting inefficient. so it gets in my way. 
> 
> I don't want to have to care for my tool. 
> 
> It's not my child. It's a tool. 
> 
> I want to care for the programs I write with it, instead. 

Sorry, my bad: I was thinking we were discussing practical merits, not
philosophy :-P

Seriously, garbage collection in Git is hardly a burden worth mentioning
-- it isn't needed nearly often enough to become one.

> > > If at some time the repository grows too big and you didn't ever
> > > gc before, I will have to access the whole project history to just
> > > get the few changes you did after I last pulled your code.
> > 
> > I have a growing suspicion that you do not really understand how
> > git's object store works...
> 
> As far as I know the whole repository will be packed, that's why. 

So?

> I read up on how it works, but maybe they changed it since then. 
> 
> For network access they seem to have fixed the issue. Today it creates
> a custom pack, whenever you pull changes. 

I seriously doubt it was ever different -- except maybe in the very
beginning, when even the basic functionality was still under
construction...

> > But anyways, if you really need to, there is an option to repack
> > existing packs.
> 
> And create a single big pack with the disadvantages I wrote about
> above. 

The *only* situation where too big packs are a problem is when someone
is pulling stuff through a dumb transport.

And anyways, there is an option to limit the size of the generated
packs, if you care.

> > I only can say that in spite of various attempts, AFAIK versioned
> > filesystems never got implemented in any mainstream system except
> > VMS...
> ...
> > And this is still totally unrelated to the question which version
> > control system to use for the Hurd repository...
> 
> It is related to the question, why I think that having to gc is a bad
> idea. 

So the argument goes like: Garbage collection would pose serious
problems in some hypothetical use case -> garbage collection is bad -> Git
is bad -> we shouldn't use Git for the Hurd repository, even if it has
no relation whatsoever with the problematic use case?

Don't you think this is a bit silly?...

> > The point is that in git, like in UNIX shell, the interface is not
> > abstracted from the internals -- which means a somewhat steeper
> > learning curve, but offers a lot of advantages in the long run --
> > not excepting usability.
> 
> Usability for whom? 

Usability for serious programmers, who work with version control
every day, have their individual workflows, and need to do unusual
things sometimes.

On Mon, Jun 23, 2008 at 11:41:32AM +0200, Arne Babenhauserheide wrote:
> On Sunday, 22 June 2008 02:44:25, [EMAIL PROTECTED] wrote:

> I tried to get the point across, that in my view the "unabstracted
> interface" doesn't really provide you with additional flexibility, but
> rather costs more than it brings. 

We will have to agree to disagree.

> Can I use 

> git commit -a -m "blah and foo" blah.c foo.py

Well, without the "-a" -- it means "all", which wouldn't make sense with
explicitly specified file names :-)

Aside from that, yes.

> > Anyways, I don't quite understand what you are getting at. How is
> > this related to large projects?
> 
> In a large project where you only maintain a small part of the whole
> tree, you're likely to want to commit only the parts 

By that definition, Hurd must be a large project indeed... ;-)

Anyways, partial commits have absolutely nothing to do with that. Parts
of the tree you don't touch, won't ever be changed in your working copy,
so there is nothing to omit in the commit...

Partial commits are useful if you have several individual changes in
your working copy, and don't want to put them all in one commit. (There
are any number of situations where this could happen: You had some
earlier changes you forgot to commit. While working on something, you
realized you could improve something else. (E.g. a comment -- happens
all the time to me...) You were making several changes you wanted to
test together. While making your changes you made some hacks for testing
that you don't mean to commit. And so on.)

Of course, you can always resolve such situations using temporary
branches and undoing unrelated changes... But partial commits are often
much, much easier.

All of this is totally unrelated to the size of the project.
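
The idea of a partial commit can be sketched with a toy model in Python.
This is not how Git implements it; the functions changed_paths and
commit, the README entry and the file contents are all invented for
illustration (only the names blah.c and foo.py come from the example
above). The model also mirrors the point about "-a": asking for "all"
together with explicit paths is rejected.

```python
def changed_paths(working_copy, committed):
    """Paths whose content differs between the working copy and the last commit."""
    return sorted(p for p, text in working_copy.items() if committed.get(p) != text)


def commit(working_copy, committed, paths=None, all_changes=False):
    """Return a new committed snapshot that records only the selected changes."""
    if all_changes and paths:
        raise ValueError("'all' makes no sense with explicitly specified paths")
    if all_changes:
        selected = changed_paths(working_copy, committed)
    else:
        # With neither option given, nothing is selected in this toy model.
        selected = list(paths or [])
    snapshot = dict(committed)
    for path in selected:
        snapshot[path] = working_copy[path]
    return snapshot


committed = {"blah.c": "v1", "foo.py": "v1", "README": "v1"}
working = {
    "blah.c": "v2 (the real fix)",
    "foo.py": "v1 + debug hack",  # a testing hack we do not mean to commit
    "README": "v1",               # untouched, so there is nothing to omit
}

# Partial commit: record only blah.c and leave the hack uncommitted.
after_partial = commit(working, committed, paths=["blah.c"])
still_pending = changed_paths(working, after_partial)

print(after_partial)
print("still uncommitted:", still_pending)
```

Note that README never shows up anywhere: files you did not touch are
not "changed" to begin with, whatever the size of the tree.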

> > > # And now the same using the provided shorthands in Mercurial,
> > > saving much typing.
> >
> > If the "efficiency" of Mercurial's interface is manifested mostly in
> > saving a few characters here and there, I'm deeply unimpressed.
> 
> That's what is called "usability". Achieving the same, but more
> comfortably and easier to learn for new users. 

Shorthand aliases are certainly *not* easier to learn.

As for comfort, that is debatable:

> With git I have to type these "few characters" in every single version
> tracking action I do, and I do many of those, so it accounts for much
> lost time and effort. 
> 
> I commit every time I do a nontrivial change, so I can later track
> what I did, so the few strokes become rather much in the long run. 

I on my part seldom type long commands by hand -- usually I get them
from shell history. For things I do really often, I can always create a
shell alias. You could alias "gca" to "git-commit -a" and "gu" to
"git-checkout -m" for example... That beats even the hg variants ;-)

I have no strong opinion on the shorthands really. When I first saw the
lack of them while learning Git, I was rather shocked; but when I
actually started using it, I found that it doesn't really bother me at
all...

> > Indeed, Git's UI makes a clean cut for the most part. The result is
> > that aside from a few glitches, it is actually remarkably logical
> > and consistent (once you take the trouble to learn it properly) in
> > a way that nothing trying to resemble CVS can ever be -- CVS is
> > already inconsistent in itself, and trying to preserve the interface
> > in a distributed system can't really work very well IMHO.
> 
> You didn't yet try out Mercurial, yet you say it can't work well. 
[...]
> Having similar commands will show that many concepts of distributed
> systems are very similar to centralized ones, and that's a good thing
> for me. 

Well, if we look at the example of "hg up" above, I intuitively expected
it to do something different...

The truth is that the concepts are *not* similar. What is a single step
(sync working copy with repository) in centralized systems, turns into
two distinct steps (sync working copy with local repository, sync local
repository with other repositories) in distributed systems. And this
distinction manifests in almost all actions. It is a totally different
workflow; it's simply impossible to map the commands from one to the
other. The "hg up" example demonstrates this quite clearly.

(Well, perhaps it was rather stupid of me, and I could have guessed what
"hg up" does if I thought about it more... But at least it shows that
it's not really intuitive.)
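
The one-step versus two-step distinction can be sketched as a toy model
in Python. The class names Repository, CentralizedCheckout and
DistributedClone are hypothetical, and revisions are reduced to plain
labels; this only illustrates the workflow difference described above,
not how CVS, Mercurial or Git actually work inside.

```python
class Repository:
    """A repository reduced to an ordered list of revision labels."""

    def __init__(self):
        self.revisions = []


class CentralizedCheckout:
    """Working copy tied directly to a server, as with CVS."""

    def __init__(self, server):
        self.server = server
        self.working_revision = None

    def update(self):
        # Single step: contact the server and refresh the working files.
        self.working_revision = self.server.revisions[-1]


class DistributedClone:
    """Working copy plus a full local repository."""

    def __init__(self, remote):
        self.remote = remote
        self.local = Repository()
        self.working_revision = None

    def pull(self):
        # Step 1: sync the local repository with another repository.
        # The working files are not touched.
        self.local.revisions = list(self.remote.revisions)

    def update(self):
        # Step 2: sync the working files with the *local* repository only.
        if self.local.revisions:
            self.working_revision = self.local.revisions[-1]


upstream = Repository()
upstream.revisions += ["r1", "r2"]

cvs_like = CentralizedCheckout(upstream)
cvs_like.update()

hg_like = DistributedClone(upstream)
hg_like.pull()
hg_like.update()

upstream.revisions.append("r3")  # somebody publishes a new revision

hg_like.update()                 # "up" alone does not see r3
stale = hg_like.working_revision

cvs_like.update()                # the centralized "up" does see it
hg_like.pull()
hg_like.update()                 # pull followed by up catches up

print(stale, cvs_like.working_revision, hg_like.working_revision)
```

The same word, "update", ends up meaning two different things in the two
models, which is the source of the confusion.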

> "git checkout" updates your working files, so why not call it "git
> update"? Because what it does in technical terms is checkout the files
> from the local repository. What it does for the user is update the
> working files, though, and asking for that is what people are used to. 

It's not as simple as that.

In CVS, "checkout" is used to create the initial working copy, but
"update" is later used to pull in subsequent changes from the
repository. However, "checkout" can *also* be used for that! (I don't
remember what the exact difference was. I think something about
directory handling. Anyways, it was a rather tiny difference.)

So, "git-checkout" is actually very similar to "cvs checkout" in that it
can do both. (Except that "cvs checkout" automatically merges, while
"git-checkout" requires giving "-m" explicitly.) This really makes
sense: Both actions check out stuff from the repository to the working
copy, only that in the second case local changes are merged.

Of course, it would be possible to introduce "git-update" as an alias
for "git-checkout -m". But that would only bloat the command set; and
worse, it would hide the fact that the commands are essentially the
same, thus preventing the user from gaining a true understanding.

> > > Did you try out Mercurial now?
> >
> > No, didn't need it so far.
> >
> > As much as I'd like to know it to be able to have a better founded
> > discussion, this is not really reason enough to learn it :-)
> 
> Just try it for 15 minutes. You might be surprised how quickly you
> become familiar with it. 

I might do that if I need just to check out the code of some project
using Mercurial. But that's certainly *not* what I would do if I ever
want to seriously work with a Mercurial repository, or if I want to
argue about it seriously. For that, I would want to learn it properly,
reading at least the main manual from end to end.

My whole point was that I do *not* judge by how familiar a tool seems
after 15 minutes, but how powerful it turns out in the long run.

There are different use cases of course. If I wanted to introduce
version control for a designer team or for executives or so, I'd
probably go for Mercurial. But here we are talking about serious
programmers -- people who are able to understand and appreciate a less
abstract interface.

-antrik-
