James Melvin <[email protected]> writes:
> These options are designed to prevent stale file handle exceptions
> during git operations which can happen on users of NFS repos when
> repacking is done on them. The strategy is to preserve old pack files
> around until the next repack with the hopes that they will become
> unreferenced by then and not cause any exceptions to running processes
> when they are finally deleted (pruned).
I find it a very sensible strategy to work around NFS, but it does
not explain why the directory the old ones are moved to need to be
configurable. It feels to me that a boolean that causes the old
ones renamed s/^pack-/^old-&/ in the same directory (instead of
pruning them right away) would risk less chances of mistakes (e.g.
making "preserved" subdirectory on a separate device mounted there
in a hope to reduce disk usage of the primary repository, which
may defeat the whole point of moving the still-active file around
instead of removing them).
And if we make "preserve-old" a boolean, perhaps the presence of
"prune-preserved" would serve as a substitute for it, iow, perhaps
we may only need --prune-preserved option (and repack.prunePreserved
configuration variable)?
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 677bc7c81..f1a0c97f3 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -10,8 +10,10 @@
>
> static int delta_base_offset = 1;
> static int pack_kept_objects = -1;
> +static int preserve_oldpacks = 0;
> +static int prune_preserved = 0;
We avoid initializing statics to 0 or NULL and instead let BSS take
care of them...
> static int write_bitmaps;
> -static char *packdir, *packtmp;
> +static char *packdir, *packtmp, *preservedir;
... just like what you did here.
> @@ -108,6 +110,27 @@ static void get_non_kept_pack_filenames(struct s
> ...
> +static void preserve_pack(const char *file_path, const char *file_name,
> const char *file_ext)
> +{
> + char *fname_old;
> +
> + if (mkdir(preservedir, 0700) && errno != EEXIST)
> + error(_("failed to create preserve directory"));
You do not want to do the rest of this function after issuing this
error, no? Because ...
> +
> + fname_old = mkpathdup("%s/%s.old-%s", preservedir, file_name,
> ++file_ext);
> + rename(file_path, fname_old);
... this rename(2) would fail, whose error return you would catch
and act on.
> + free(fname_old);
> +}
> +
> +static void remove_preserved_dir(void) {
> + struct strbuf buf = STRBUF_INIT;
> +
> + strbuf_addstr(&buf, preservedir);
> + remove_dir_recursively(&buf, 0);
This is a wrong helper function to use on files and directories
inside .git/; the function is about removing paths in the working
tree.
> @@ -121,7 +144,10 @@ static void remove_redundant_pack(const char *dir_name,
> const char *base_name)
> for (i = 0; i < ARRAY_SIZE(exts); i++) {
> strbuf_setlen(&buf, plen);
> strbuf_addstr(&buf, exts[i]);
> - unlink(buf.buf);
> + if (preserve_oldpacks)
> + preserve_pack(buf.buf, base_name, exts[i]);
> + else
> + unlink(buf.buf);
OK.
> @@ -194,6 +220,10 @@ int cmd_repack(int argc, const char **argv, const char
> *prefix)
> N_("maximum size of each packfile")),
> OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
> N_("repack objects in packs marked with
> .keep")),
> + OPT_BOOL(0, "preserve-oldpacks", &preserve_oldpacks,
> + N_("move old pack files into the preserved
> subdirectory")),
> + OPT_BOOL(0, "prune-preserved", &prune_preserved,
> + N_("prune old pack files from the preserved
> subdirectory after repacking")),
> OPT_END()
> };
>
> @@ -217,6 +247,7 @@ int cmd_repack(int argc, const char **argv, const char
> *prefix)
>
> packdir = mkpathdup("%s/pack", get_object_directory());
> packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> + preservedir = mkpathdup("%s/preserved", packdir);
>
> sigchain_push_common(remove_pack_on_signal);
>
> @@ -404,6 +435,9 @@ int cmd_repack(int argc, const char **argv, const char
> *prefix)
>
> /* End of pack replacement. */
>
> + if (prune_preserved)
> + remove_preserved_dir();
I am not sure if I understand your design. Your model looks to me
like there are two modes of operation. #1 uses "--preserve-old" and
sends old ones to purgatory instead of removing them and #2 uses
"--prune-preserved" to remove all the ones in the purgatory
immediately. A few things that come to my mind immediately:
* When "--prune-preseved" is used, it removes both ancient ones and
more recent ones indiscriminately. Would it make more sense to
"expire" only the older ones while keeping the more recent ones?
* It appears that the main reason you would want --prune-preserved
in this design is after running with "--preserve-old" number of
times, you want to remove really old ones that have accumulated,
and I would imagine that at that point of time, you are only
interested in repack, but the code structure tells me that this
will force the users to first run a repack before pruning.
I suspect that a design that is simpler to explain to the users may
be to add a command line option "--preserve-pruned=<expiration>" and
a matching configuration variable repack.preservePruned, which
defaults to "immediate" (i.e. no preserving), and
- When the value of preserve_pruned is not "immediate", use
preserve_pack() instead of unlink();
- At the end, find preserved packs that are older than the value in
preserve_pruned and unlink() them.
It also may make sense to add another command line option
"--prune-preserved-packs-only" (without matching configuration
variable) that _ONLY_ does the "find older preserved packs and
unlink them" part, without doing any repack.