On Sun, May 27, 2018 at 08:31:14AM +0900, Junio C Hamano wrote:
> > So I actually would much prefer that for git gc, "--prune=now" means
> >
> > (a) "now"
> >
> > (b) now at the _start_ of the "git gc" operation, not the time at
> > the _end_ of the operation when we've already spent a minute or
> > two doing repacking and are now doing the final pruning.
> >
> > anyway, with that explanation in mind, I'm appending a patch that is
> > pretty small and does that. It's a bit hacky, but I think it still makes
> > sense.
> >
> > Comments?
>
> Closing the possibility of racing a running "gc" and new object
> creation like the above generally makes sense, I would think,
> whether the creation is due to 'pull/fetch', 'add', or even 'push'.

I think Linus's suggestion is an obvious improvement. It does shorten
the window for confusing things to happen, and I think it makes things
much easier to reason about if all parts of the gc are using the same
timestamp.

Regarding the implementation:

> > -	if (prune_expire && parse_expiry_date(prune_expire, &dummy))
> > -		die(_("failed to parse prune expiry value %s"), prune_expire);
> > +	if (prune_expire) {
> > +		if (!strcmp(prune_expire, "now"))
> > +			prune_expire = show_date(time(NULL), 0, DATE_MODE(ISO8601));
> > +		if (parse_expiry_date(prune_expire, &dummy))
> > +			die(_("failed to parse prune expiry value %s"), prune_expire);
> > +	}

We'd also accept relative times like "5.minutes.ago" (in fact, the
default is a relative 2.weeks.ago, though it's long enough that the
difference between "2 weeks" and "2 weeks plus 5 minutes" may not matter
much). So we probably ought to just normalize _everything_ without even
bothering to match "now". It's a noop for non-relative times, but that's
OK.
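To make "normalize everything" concrete, here is a rough standalone sketch. It is not git code: resolve_expiry() and normalize_expiry() are hypothetical names, and the toy parser only understands "now", "<N>.minutes.ago", and a bare epoch value, where the real code would go through parse_expiry_date(). The point is only that every spec is resolved once against the start time and handed on as an absolute value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical stand-in for git's real expiry parser. It understands
 * only "now", "<N>.minutes.ago", and a bare decimal epoch value.
 * Relative forms are resolved against "start", never the current time.
 * Returns 0 on success, -1 on an unrecognized spec.
 */
static int resolve_expiry(const char *spec, time_t start, time_t *out)
{
	long n;
	char extra, unit[32];

	assert(spec && out);
	if (!strcmp(spec, "now")) {
		*out = start;
		return 0;
	}
	/* all digits: already absolute, so normalizing is a no-op */
	if (sscanf(spec, "%ld%c", &n, &extra) == 1) {
		*out = (time_t)n;
		return 0;
	}
	if (sscanf(spec, "%ld.%31s", &n, unit) == 2 &&
	    !strcmp(unit, "minutes.ago")) {
		*out = start - (time_t)n * 60;
		return 0;
	}
	return -1;
}

/*
 * Rewrite any accepted spec as an absolute epoch string in "buf", so
 * every later gc step sees the same cutoff.
 */
static int normalize_expiry(const char *spec, time_t start,
			    char *buf, size_t len)
{
	time_t when;
	int n;

	if (resolve_expiry(spec, start, &when))
		return -1;
	n = snprintf(buf, len, "%ld", (long)when);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would take time(NULL) once at the start of the gc, pass it as "start", and hand the resulting string to each sub-step in place of the user's original spec.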

> I however have to wonder if there are opposite "oops" end-user
> operations we also need to worry about, i.e. we are doing a large-ish
> fetch, and get bored and run a gc from another terminal. Perhaps
> *that* is a bit too stupid to worry about? Auto-gc deliberately
> does not use 'now' because it wants to leave a grace period to avoid
> exactly that kind of race.

There are still possibilities for a race, even with the grace period.
You can have an unreferenced 2-week-old object sitting on disk, and
somebody can choose to reference it at the same time as we are pruning
it. My freshness patches from a few years ago made things a bit better:

  - when we optimize out the write of an existing object, we now at
    least update its timestamp

  - we consider non-fresh objects reachable from fresh ones to also be
    fresh
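
The two rules above can be sketched roughly as follows. This is a toy in-memory model rather than git's implementation: struct obj, freshen(), keep_reachable() and mark_survivors() are all hypothetical names, mtimes are plain fields instead of file timestamps, and reachability from refs is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_LINKS 4

/* Hypothetical in-memory object; real git reads mtimes from disk. */
struct obj {
	time_t mtime;			/* last write or freshen time */
	int keep;			/* set by mark_survivors() */
	struct obj *links[MAX_LINKS];	/* objects this one refers to */
};

/* Rule 1: writing an object that already exists only bumps its mtime. */
static void freshen(struct obj *o, time_t now)
{
	assert(o);
	o->mtime = now;
}

/* Keep "o" and everything reachable from it. */
static void keep_reachable(struct obj *o)
{
	size_t i;

	if (!o || o->keep)
		return;
	o->keep = 1;
	for (i = 0; i < MAX_LINKS; i++)
		keep_reachable(o->links[i]);
}

/*
 * Rule 2: an object newer than "expire" is fresh, and anything it
 * reaches is treated as fresh too. Objects left with keep == 0 are the
 * prune candidates. (Reachability from refs is left out of this sketch.)
 */
static void mark_survivors(struct obj **all, size_t nr, time_t expire)
{
	size_t i;

	for (i = 0; i < nr; i++)
		all[i]->keep = 0;
	for (i = 0; i < nr; i++)
		if (all[i]->mtime > expire)
			keep_reachable(all[i]);
}
```

Note that the marking pass and the later deletion are still two separate steps, so nothing here closes the window between them.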

But fundamentally none of this is atomic. You can have an old tree, and
while you're pruning somebody writes a new commit referencing it and
sticks that in a ref. It's more common if your grace period is "now",
but it can still happen with any grace period.

-Peff