Re: [PATCH v2 08/21] read_packed_refs(): read references with minimal copying
On 09/20/2017 08:27 PM, Jeff King wrote: > On Tue, Sep 19, 2017 at 08:22:16AM +0200, Michael Haggerty wrote: > >> Instead of copying data from the `packed-refs` file one line at time >> and then processing it, process the data in place as much as possible. >> >> Also, instead of processing one line per iteration of the main loop, >> process a reference line plus its corresponding peeled line (if >> present) together. >> >> Note that this change slightly tightens up the parsing of the >> `parse-ref` file. Previously, the parser would have accepted multiple > > s/parse-ref/packed-refs/, I assume Thanks; will fix. > The patch itself looks good, though I did notice an interesting tangent. > >> +if (eof - pos < GIT_SHA1_HEXSZ + 2 || >> +parse_oid_hex(p, &oid, &p) || >> +!isspace(*p++)) >> +die_invalid_line(refs->path, pos, eof - pos); > > I wondered why you didn't just check the output of parse_oid_hex(), and > included the length check (since in the long run we'd like to get rid of > uses of the static GIT_SHA1_HEXSZ macro). I imagine the answer is that > this is an mmap'd buffer, and we can't guarantee that parse_oid_hex() > wouldn't walk off the end of it. Yes. > That's fine for now, but I suspect it may become a problem when we move > to having a second hash function with a different length. You can't just > say "it must have as many bytes as the longest hash", because of course > we could have the shorter hash at the end of the buffer. But we also > can't say "it must have as many bytes as the shortest hash", because if > the content implies it's a longer hash, we'd read off the end of the > buffer. > > I think in the long run we will need a parse_oid_hex() function that > takes a ptr/len (or start/end) pair. Yes, that makes sense. > [...] Michael
Re: [PATCH v2 02/21] prefix_ref_iterator: break when we leave the prefix
On 09/20/2017 10:25 PM, Stefan Beller wrote: > On Mon, Sep 18, 2017 at 11:22 PM, Michael Haggerty > wrote: >> [...] >> +/* Return -1, 0, 1 if refname is before, inside, or after the prefix. */ >> +static int compare_prefix(const char *refname, const char *prefix) >> +{ >> + while (*prefix) { >> + if (*refname != *prefix) >> + return ((unsigned char)*refname < (unsigned >> char)*prefix) ? -1 : +1; > > This looks interesting. > > We get a signed char* , cast it to unsigned char* and then > compare byte by byte. Not quite. We get a `char *` of unknown signedness (it's implementation-dependent), dereference it into a `char`, then cast that value to `unsigned char`. What you described would be *(const unsigned char *)refname < *(const unsigned char *)prefix But I assume that these two variants would result in identical assembly code. > [...] Michael
[PATCH v3 2/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_*
These are modified by set_ident() but a subsequent patch would like to operate on their original values. Signed-off-by: Ian Campbell --- git-filter-branch.sh | 33 + 1 file changed, 33 insertions(+) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 3da281f8a..9edb94206 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -219,6 +219,13 @@ trap 'cd "$orig_dir"; rm -rf "$tempdir"' 0 ORIG_GIT_DIR="$GIT_DIR" ORIG_GIT_WORK_TREE="$GIT_WORK_TREE" ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE" +ORIG_GIT_AUTHOR_NAME="$GIT_AUTHOR_NAME" +ORIG_GIT_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL" +ORIG_GIT_AUTHOR_DATE="$GIT_AUTHOR_DATE" +ORIG_GIT_COMMITTER_NAME="$GIT_COMMITTER_NAME" +ORIG_GIT_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL" +ORIG_GIT_COMMITTER_DATE="$GIT_COMMITTER_DATE" + GIT_WORK_TREE=. export GIT_DIR GIT_WORK_TREE @@ -545,6 +552,8 @@ if [ "$filter_tag_name" ]; then fi unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE +unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE +unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE test -z "$ORIG_GIT_DIR" || { GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR } @@ -556,6 +565,30 @@ test -z "$ORIG_GIT_INDEX_FILE" || { GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" && export GIT_INDEX_FILE } +test -z "$ORIG_GIT_AUTHOR_NAME" || { + GIT_AUTHOR_NAME="$ORIG_GIT_AUTHOR_NAME" && + export GIT_AUTHOR_NAME +} +test -z "$ORIG_GIT_AUTHOR_EMAIL" || { + GIT_AUTHOR_EMAIL="$ORIG_GIT_AUTHOR_EMAIL" && + export GIT_AUTHOR_EMAIL +} +test -z "$ORIG_GIT_AUTHOR_DATE" || { + GIT_AUTHOR_DATE="$ORIG_GIT_AUTHOR_DATE" && + export GIT_AUTHOR_DATE +} +test -z "$ORIG_GIT_COMMITTER_NAME" || { + GIT_COMMITTER_NAME="$ORIG_GIT_COMMITTER_NAME" && + export GIT_COMMITTER_NAME +} +test -z "$ORIG_GIT_COMMITTER_EMAIL" || { + GIT_COMMITTER_EMAIL="$ORIG_GIT_COMMITTER_EMAIL" && + export GIT_COMMITTER_EMAIL +} +test -z "$ORIG_GIT_COMMITTER_DATE" || { + GIT_COMMITTER_DATE="$ORIG_GIT_COMMITTER_DATE" && + export GIT_COMMITTER_DATE +} cd "$orig_dir" rm -rf "$tempdir" -- 2.11.0
[PATCH v3 0/4] filter-branch: support for incremental update + fix for ancient tag format
This is the third version of my patches to add incremental support to git-filter-branch. Since the last time I have replaced `git mktag -- allow-missing-tagger` with `git hash-object -t tag -w --stdin`. I've force pushed to [1] (Travis is still running) and have set off the process of re-rewriting the devicetree tree from scratch (a multi-day affair) to validate (it's looking good). Ian. [0] https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/ [1] https://github.com/ijc/git/tree/git-filter-branch
[PATCH v3 4/4] filter-branch: use hash-object instead of mktag
This allows us to recreate even historical tags which would now be consider invalid, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel source tree which lack the `tagger` header. $ git rev-parse v2.6.12-rc2 9e734775f7c22d2f89943ad6c745571f1930105f $ git cat-file tag v2.6.12-rc2 | git mktag error: char76: could not find "tagger " fatal: invalid tag signature file $ git cat-file tag v2.6.12-rc2 | git hash-object -t tag -w --stdin 9e734775f7c22d2f89943ad6c745571f1930105f Signed-off-by: Ian Campbell --- git-filter-branch.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 956869b8e..3365a3b86 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -561,7 +561,7 @@ if [ "$filter_tag_name" ]; then }' \ -e '/^-BEGIN PGP SIGNATURE-/q' \ -e 'p' ) | - git mktag) || + git hash-object -t tag -w --stdin) || die "Could not create new tag object for $ref" if git cat-file tag "$ref" | \ sane_grep '^-BEGIN PGP SIGNATURE-' >/dev/null 2>&1 -- 2.11.0
[PATCH v3 1/4] filter-branch: reset $GIT_* before cleaning up
This is pure code motion to enable a subsequent patch to add code which needs to happen with the reset $GIT_* but before the temporary directory has been cleaned up. Signed-off-by: Ian Campbell --- git-filter-branch.sh | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 3a74602ef..3da281f8a 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -544,11 +544,6 @@ if [ "$filter_tag_name" ]; then done fi -cd "$orig_dir" -rm -rf "$tempdir" - -trap - 0 - unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE test -z "$ORIG_GIT_DIR" || { GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR @@ -562,6 +557,11 @@ test -z "$ORIG_GIT_INDEX_FILE" || { export GIT_INDEX_FILE } +cd "$orig_dir" +rm -rf "$tempdir" + +trap - 0 + if [ "$(is_bare_repository)" = false ]; then git read-tree -u -m HEAD || exit fi -- 2.11.0
[PATCH v3 3/4] filter-branch: stash away ref map in a branch
With "--state-branch=" option, the mapping from old object names and filtered ones in ./map/ directory is stashed away in the object database, and the one from the previous run is read to populate the ./map/ directory, allowing for incremental updates of large trees. Signed-off-by: Ian Campbell --- I have been using this as part of the device tree extraction from the Linux kernel source since 2013, about time I sent the patch upstream! v2: - added several preceding cleanup patches, including: - new: use of mktag --allow-missing tagger. - split-out: preserving $GIT_*. - use git rev-parse rather than git show-ref. - improved error handling for Perl sub-processes. - collapsed some shell pipelines involving piping output of git and ls into Perl into the Perl scripts. - style fixes for conditionals and sub-shells. - fixup indentation. - added documentation. - improved commit message. --- Documentation/git-filter-branch.txt | 8 +- git-filter-branch.sh| 49 - 2 files changed, 55 insertions(+), 2 deletions(-) diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt index 9e5169aa6..bebdcdec5 100644 --- a/Documentation/git-filter-branch.txt +++ b/Documentation/git-filter-branch.txt @@ -14,7 +14,7 @@ SYNOPSIS [--commit-filter ] [--tag-name-filter ] [--subdirectory-filter ] [--prune-empty] [--original ] [-d ] [-f | --force] - [--] [...] + [--state-branch ] [--] [...] DESCRIPTION --- @@ -198,6 +198,12 @@ to other tags will be rewritten to point to the underlying commit. directory or when there are already refs starting with 'refs/original/', unless forced. +--state-branch :: + This option will cause the mapping from old to new objects to + be loaded from named branch upon startup and saved as a new + commit to that branch upon exit, enabling incremental of large + trees. If '' does not exist it will be created. + ...:: Arguments for 'git rev-list'. All positive refs included by these options are rewritten. You may also specify options diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 9edb94206..956869b8e 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -86,7 +86,7 @@ USAGE="[--setup ] [--env-filter ] [--parent-filter ] [--msg-filter ] [--commit-filter ] [--tag-name-filter ] [--subdirectory-filter ] [--original ] - [-d ] [-f | --force] + [-d ] [-f | --force] [--state-branch ] [--] [...]" OPTIONS_SPEC= @@ -106,6 +106,7 @@ filter_msg=cat filter_commit= filter_tag_name= filter_subdir= +state_branch= orig_namespace=refs/original/ force= prune_empty= @@ -181,6 +182,9 @@ do --original) orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/ ;; + --state-branch) + state_branch="$OPTARG" + ;; *) usage ;; @@ -259,6 +263,26 @@ export GIT_INDEX_FILE # map old->new commit ids for rewriting parents mkdir ../map || die "Could not create map/ directory" +if test -n "$state_branch" +then + state_commit=$(git rev-parse --no-flags --revs-only "$state_branch") + if test -n "$state_commit" + then + echo "Populating map from $state_branch ($state_commit)" 1>&2 + perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die; + while () { + m/(.*):(.*)/ or die; + open F, ">../map/$1" or die; + print F "$2" or die; + close(F) or die; + } + close(MAP) or die;' "$state_commit" \ + || die "Unable to load state from $state_branch:filter.map" + else + echo "Branch $state_branch does not exist. Will create" 1>&2 + fi +fi + # we need "--" only if there are no path arguments in $@ nonrevs=$(git rev-parse --no-revs "$@") || exit if test -z "$nonrevs" @@ -590,6 +614,29 @@ test -z "$ORIG_GIT_COMMITTER_DATE" || { export GIT_COMMITTER_DATE } +if test -n "$state_branch" +then + echo "Saving rewrite state to $state_branch" 1>&2 + state_blob=$( + perl -e'opendir D, "../map" or die; + open H, "|-", "git hash-object -w --stdin" or die; + foreach (sort readdir(D)) { + next if m/^\.\.?$/; + open F, "<../map/$_" or die; + chomp($f = ); + print H "$_:$f\n" or die; + } + close(H) or die;' || die "Unable to save state") + state_tree=$(/bin/echo -e "100644 blob $state_blob\tfilter.map" | git mktree) + if test -n "$state_commit"
Re: [PATCH v2 13/21] packed_ref_cache: keep the `packed-refs` file mmapped if possible
On 09/20/2017 08:40 PM, Jeff King wrote: > [...] > The overall strategy for this compile-time knob makes sense, but one > thing confused me: > >> +ifdef MMAP_PREVENTS_DELETE >> +BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE >> +else >> +ifdef USE_WIN32_MMAP >> +BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE >> +endif >> +endif > > So setting the knob does what you'd expect. But if you don't set it, > then we still auto-tweak it based on the USE_WIN32_MMAP knob. Do we need > that? It seems like we set our new knob in config.mak.uname any time > we'd set USE_WIN32_MMAP. So this only has an effect in two cases: > > 1. You aren't on Windows, but you set USE_WIN32_MMAP yourself. > > 2. You are on Windows, but you manually unset MMAP_PREVENTS_DELETE. > > I expect both cases are rare (and would probably involve somebody > actively debugging these knobs). Probably it's a minor convenience in > case 1, but in case 2 it would be actively confusing, I'd think. Makes sense. I'll delete the `else` clause from the above hunk. On 09/20/2017 08:51 PM, Jeff King wrote: > On Wed, Sep 20, 2017 at 02:40:58PM -0400, Jeff King wrote: > >>> +enum mmap_strategy { >>> + /* >>> +* Don't use mmap() at all for reading `packed-refs`. >>> +*/ >>> + MMAP_NONE, >>> + >>> + /* >>> +* Can use mmap() for reading `packed-refs`, but the file must >>> +* not remain mmapped. This is the usual option on Windows, >>> +* where you cannot rename a new version of a file onto a file >>> +* that is currently mmapped. >>> +*/ >>> + MMAP_TEMPORARY, >> >> I suspect you originally distinguished these cases so that NO_MMAP does >> not read into a fake-mmap buffer, followed by us copying it into another >> buffer. But AFAICT we handle the "NONE" and "TEMPORARY" cases exactly >> the same (by just doing a read_in_full() into our own buffer). Do we >> actually need separate strategies? > > In case you are reading these sequentially, I think I talked myself into > the utility of this during the next patch. ;) Sorry about that. I'll add a forward breadcrumb to the log message of this commit. Michael
Re: [PATCH v2 14/21] read_packed_refs(): ensure that references are ordered when read
On 09/20/2017 08:50 PM, Jeff King wrote: > On Tue, Sep 19, 2017 at 08:22:22AM +0200, Michael Haggerty wrote: >> If `packed-refs` was already sorted, then (if the system allows it) we >> can use the mmapped file contents directly. But if the system doesn't >> allow a file that is currently mmapped to be replaced using >> `rename()`, then it would be bad for us to keep the file mmapped for >> any longer than necessary. So, on such systems, always make a copy of >> the file contents, either as part of the sorting process, or >> afterwards. > > So this sort-of answers my question from the previous commit (why we > care about the distinction between NONE and TEMPORARY), since we now > start treating them differently. > > But I'm a bit confused why there's any advantage in the TEMPORARY case > to doing the mmap-and-copy versus just treating it like NONE and reading > it directly. > > Hmm, I guess it comes down to the double-allocation thing again? Let me > see if I have this right: > > 1. For NO_MMAP, we'd read the buffer once. If it's sorted, good. If > not, then we temporarily hold it in memory twice while we copy it > into the new sorted buffer. > > 2. For TEMPORARY, we mmap once. If it's sorted, then we make a single > copy. If it's not sorted, then we do the copy+sort as a single > step. > > 3. For MMAP_OK, if it's sorted, we're done. Otherwise, we do the > single copy. > > So this is really there to help the TEMPORARY case reading an old > unsorted file avoid needing to use double-the-ram during the copy? > > That seems like a good reason (and it does not seem to add too much > complexity). Yes, that's correct. A deeper question is whether it's worth this extra effort to optimize the conversion from "unsorted" to "known-sorted", which it seems should only happen once per repository. But * Many more Git commands read the `packed-refs` file than write it. So an "unsorted" file might have to be read multiple times before the first time it is rewritten with the "sorted" trait. * Users might have multiple Git clients writing their `packed-refs` file (e.g., the git CLI plus JGit in Eclipse), and they might not upgrade both clients at the same time to versions that write the "sorted" trait. So a single repository might go back and forth between "unsorted" and "known-sorted" multiple times in its life. * Theoretically, somebody might have a `packed-refs` file that is so big that it takes up more than half of memory. I think there are scenarios where such a file could be converted to "sorted" form in `MMAP_TEMPORARY` mode but would fail in `MMAP_NONE` mode. On the downside, in my benchmarking on Linux, there were hints that the `MMAP_TEMPORARY` branch might be a *tiny* bit slower than the `MMAP_OK` branch when operating on a known-sorted `packed-refs` file. But the speed difference was barely detectable (and might be illusory). And anyway, the speed difference on Linux is moot, since on Linux `MMAP_OK` mode will usually be used. It *would* be interesting to know if there is a significant speed difference on Windows. Dscho? Michael
Re: [RFC v2] refs: strip out not allowed flags from ref_transaction_update
On 09/13/2017 12:59 AM, Thomas Gummerer wrote: > Callers are only allowed to pass certain flags into > ref_transaction_update, other flags are internal to it. To prevent > mistakes from the callers, strip the internal only flags out before > continuing. > > This was noticed because of a compiler warning gcc 7.1.1 issued about > passing a NULL parameter as second parameter to memcpy (through > hashcpy): > > In file included from refs.c:5:0: > refs.c: In function ‘ref_transaction_verify’: > cache.h:948:2: error: argument 2 null where non-null expected > [-Werror=nonnull] > memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ); > ^~~~ > In file included from git-compat-util.h:165:0, > from cache.h:4, > from refs.c:5: > /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared > here > extern void *memcpy (void *__restrict __dest, const void *__restrict __src, > ^~ > > The call to hascpy in ref_transaction_add_update is protected by the s/hascpy/hashcpy/ > passed in flags, but as we only add flags there, gcc notices > REF_HAVE_NEW or REF_HAVE_OLD flags could be passed in from the outside, > which would potentially result in passing in NULL as second parameter to > memcpy. > > Fix both the compiler warning, and make the interface safer for its > users by stripping the internal flags out. > > Suggested-by: Michael Haggerty > Signed-off-by: Thomas Gummerer > --- > >> This might be a nice change to have anyway, to isolate >> `ref_transaction_update()` from mistakes by its callers. For that >> matter, one might want to be even more selective about what bits are >> allowed in the `flags` argument to `ref_transaction_update()`'s >> callers: >> >>> flags &= REF_ALLOWED_FLAGS; /* value would need to be determined */ > > Here's my attempt at doing this. > > The odd flag out as the flag that's advertised as internal but can't > stripped out is REF_ISPRUNING. REF_ISPRUNING is passed in as argument > to 'ref_transaction_delete()' in 'prune_ref()'. > > Maybe this flag should be public, or maybe I'm missing something here? > Having only this internal flags as part of the allowed flags feels a > bit ugly, but I'm also unfamiliar with the refs code, hence the RFC. > If someone has more suggestions they would be very welcome :) I wouldn't worry too much about this anomaly. `REF_ISPRUNING` is an ugly internal kludge, but allowing it in the mask doesn't make anything worse. > refs.c | 2 ++ > refs.h | 8 > 2 files changed, 10 insertions(+) > > diff --git a/refs.c b/refs.c > index ba22f4acef..fad61be1da 100644 > --- a/refs.c > +++ b/refs.c > @@ -921,6 +921,8 @@ int ref_transaction_update(struct ref_transaction > *transaction, > return -1; > } > > + flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS; > + I would advocate considering it a bug if the caller passes in options that we are going to ignore anyway: if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS) BUG("illegal flags %x in ref_transaction_update", flags); Would this also squelch the compiler warning? > flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0); > > ref_transaction_add_update(transaction, refname, flags, > diff --git a/refs.h b/refs.h > index 6daa78eb50..4d75c207e1 100644 > --- a/refs.h > +++ b/refs.h > @@ -354,6 +354,14 @@ int refs_pack_refs(struct ref_store *refs, unsigned int > flags); > #define REF_NODEREF 0x01 > #define REF_FORCE_CREATE_REFLOG 0x40 > > +/* > + * Flags that can be passed in to ref_transaction_update > + */ > +#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \ > + REF_ISPRUNING | \ > + REF_FORCE_CREATE_REFLOG |\ > + REF_NODEREF > + > /* > * Setup reflog before using. Fill in err and return -1 on failure. > */ > Thanks for working on this. Michael
Re: [PATCH] fast-export: do not copy from modified file
Thank you, Jonathan! On Thu, Sep 21, 2017 at 1:55 AM, Jonathan Tan wrote: > When run with the "-C" option, fast-export writes 'C' commands in its > output whenever the internal diff mechanism detects a file copy, > indicating that fast-import should copy the given existing file to the > given new filename. However, the diff mechanism works against the > prior version of the file, whereas fast-import uses whatever is current. > This causes issues when a commit both modifies a file and uses it as the > source for a copy. > > Therefore, teach fast-export to refrain from writing 'C' when it has > already written a modification command for a file. > > An existing test in t9350-fast-export is also fixed in this patch. The > existing line "C file6 file7" copies the wrong version of file6, but it > has coincidentally worked because file7 was subsequently overridden. > > Reported-by: Juraj Oršulić > Signed-off-by: Jonathan Tan > --- > I tested this with the reproduction commands given by Juraj Oršulić and > it works. > > I also have a version that prints the redundant 'C' (and does not > require the change to t9350), omitting it only if it would cause a wrong > result. That seems imprecise to me, but I can send that out if the > redundant 'C' is preferred. > --- > builtin/fast-export.c | 46 -- > t/t9350-fast-export.sh | 20 +++- > 2 files changed, 51 insertions(+), 15 deletions(-) > > diff --git a/builtin/fast-export.c b/builtin/fast-export.c > index d412c0a8f..da42ee5e6 100644 > --- a/builtin/fast-export.c > +++ b/builtin/fast-export.c > @@ -344,6 +344,7 @@ static void show_filemodify(struct diff_queue_struct *q, > struct diff_options *options, void *data) > { > int i; > + struct string_list *changed = data; > > /* > * Handle files below a directory first, in case they are all deleted > @@ -359,20 +360,31 @@ static void show_filemodify(struct diff_queue_struct *q, > case DIFF_STATUS_DELETED: > printf("D "); > print_path(spec->path); > + string_list_insert(changed, spec->path); > putchar('\n'); > break; > > case DIFF_STATUS_COPIED: > case DIFF_STATUS_RENAMED: > - printf("%c ", q->queue[i]->status); > - print_path(ospec->path); > - putchar(' '); > - print_path(spec->path); > - putchar('\n'); > - > - if (!oidcmp(&ospec->oid, &spec->oid) && > - ospec->mode == spec->mode) > - break; > + /* > +* If a change in the file corresponding to > ospec->path > +* has been observed, we cannot trust its contents > +* because the diff is calculated based on the prior > +* contents, not the current contents. So, declare a > +* copy or rename only if there was no change > observed. > +*/ > + if (!string_list_has_string(changed, ospec->path)) { > + printf("%c ", q->queue[i]->status); > + print_path(ospec->path); > + putchar(' '); > + print_path(spec->path); > + string_list_insert(changed, spec->path); > + putchar('\n'); > + > + if (!oidcmp(&ospec->oid, &spec->oid) && > + ospec->mode == spec->mode) > + break; > + } > /* fallthrough */ > > case DIFF_STATUS_TYPE_CHANGED: > @@ -393,6 +405,7 @@ static void show_filemodify(struct diff_queue_struct *q, >get_object_mark(object)); > } > print_path(spec->path); > + string_list_insert(changed, spec->path); > putchar('\n'); > break; > > @@ -528,7 +541,8 @@ static void anonymize_ident_line(const char **beg, const > char **end) > *end = out->buf + out->len; > } > > -static void handle_commit(struct commit *commit, struct rev_info *rev) > +static void handle_commit(struct commit *commit, struct rev_info *rev, > + struct string_list *paths_of_changed_objects) > { > int saved_output_format = rev->diffopt.output_format; > const char *commit_buffer; > @@ -615,6 +629,7 @@ static void handle_commit(struct commit *commit, struct > rev_info *rev) > if (full_tree) > printf("deleteal
Re: [PATCH 2/3] merge-base: return fork-point outside reflog
Junio C Hamano venit, vidit, dixit 21.09.2017 08:27: > Junio C Hamano writes: > >> ... I agree that there is a value in what your patch 2/3 >> wants to do when the current one that is more strict would say >> "there is no known fork-point"---we would gain a way to say "... but >> this is the best guess based on available data that may be better >> than getting no answer." which we lack. >> >> Having said all that, I do not agree with your last sentence in the >> paragraph I quoted above. It is a mere implementation detail to >> consult the reflog to find out the set of "historical tips of the >> Branch"; the current tip by definition is among the commits in that >> set, even when the reflog of Branch is missing. What 4f21454b55 did >> was a reasonable "fix" that is still in line with the definition of >> "--fork-point" from that point of view. >> >> Whether we add a "looser" version of "--fork-point" to the system or >> not, the more strict version should still use the current tip as one >> of the historical tips (i.e. those that we would take from the >> reflog if the reflog were not empty) in the more "strict" mode. The >> looser version may also want to do so as well. > > So, should I mark this in What's cooking report as "expecting a > reroll", anticipating that a new option would be added to trigger > the new & looser behaviour? > I dunno. Some participants in this thread considered my patch to be a fix rather than alternative behaviour. So I hoped for more responses to your response. (Re-adding dscho on cc - our thread graph forked...) Also, I'm undecided about about your reflog argument above - if we leave "--fork-point" to be the current behaviour including Jeff's fix then the documentation would need an even bigger overhaul, because it's neither "reflog also" (as claimed in the doc) nor "reflog only" (as in the original implementation) but "historical tips as inferred from the current value and the reflog". In any case, for two modes we need two names for the options. Maybe --fork-point and --fork-base because in the loose mode, you may get a "base of a strict fork point"? Michael
[PATCH 0/2] Two small comment fixes.
Han-Wen Nienhuys (2): Fix typo in submodule.h Document the string_list structure string-list.h | 6 ++ submodule.h | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) -- 2.14.1.821.g8fa685d3b7-goog
[PATCH 2/2] Document the string_list structure
Signed-off-by: Han-Wen Nienhuys --- string-list.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/string-list.h b/string-list.h index 29bfb7ae4..08b534166 100644 --- a/string-list.h +++ b/string-list.h @@ -8,6 +8,12 @@ struct string_list_item { typedef int (*compare_strings_fn)(const char *, const char *); +/* A resizable array of strings. The strings are owned if + * 'strdup_strings' is set. It can be used as a sorted array, and a + * custom comparison may be given in 'cmp'. The field 'items[i].util' + * may be used to implement an array of pairs. In that case, the + * caller is responsible for managing memory pointed to by 'util'. + */ struct string_list { struct string_list_item *items; unsigned int nr, alloc; -- 2.14.1.821.g8fa685d3b7-goog
[PATCH 1/2] Fix typo in submodule.h
Signed-off-by: Han-Wen Nienhuys --- submodule.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/submodule.h b/submodule.h index 6b52133c8..f0da0277a 100644 --- a/submodule.h +++ b/submodule.h @@ -120,7 +120,7 @@ extern int submodule_move_head(const char *path, /* * Prepare the "env_array" parameter of a "struct child_process" for executing - * a submodule by clearing any repo-specific envirionment variables, but + * a submodule by clearing any repo-specific environment variables, but * retaining any config in the environment. */ extern void prepare_submodule_repo_env(struct argv_array *out); -- 2.14.1.821.g8fa685d3b7-goog
Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
On 9/20/2017 10:24 PM, Ben Peart wrote: On 9/20/2017 10:00 PM, Junio C Hamano wrote: Ben Peart writes: Pretty much the same places you would also use CE_MATCH_IGNORE_VALID and CE_MATCH_IGNORE_SKIP_WORKTREE which serve the same role for those features. That is generally when you are about to overwrite data so want to be *really* sure you have what you think you have. Now that makes me worried gravely. IGNORE_VALID is ignored in these places because we have been burned by end-users lying to us. IGNORE_SKIP_WORKTREE must be ignored because we know that the working tree state does not match the "reality" the index wants to have. The fact that the code treats the status reported and kept up to date by fsmonitor the same way as these two implies that it is merely advisory and cannot be trusted? Is that the reason why we tell the codepath with IGNORE_FSMONITOR to ignore the state fsmonitor reported and check the state ourselves? Sorry for causing unnecessary worry. The fsmonitor data can be trusted (as much as you can trust that Watchman or your file system monitor is not buggy). I wasn't 100% sure *why* these places passed the various IGNORE_VALID and IGNORE_SKIP_WORKTREE flags. When I looked at them, that lack of trust seemed to be the reason. Adding IGNORE_FSMONITOR in those same places was simply an abundance of caution on my part. The only down side of passing the flag for fsmonitor is that we will end up calling lstat() on a file where we technically didn't need too. That seemed safer than potentially missing a change if I had misunderstood the code. I'd much rather return correct results (and fall back to the old performance) than potentially be incorrect. I followed that same principal in the entire design of fsmonitor - if anything doesn't look right, fall back to the old code path just in case... I spent some time with git blame/show trying to figure out the *why* for all the places CE_MATCH_IGNORE_* are passed without gaining a lot of additional understanding. Based on your description above of why these exist, I believe there are very few places we actually need to pass CE_MATCH_IGNORE_FSMONITOR and that I was being overly cautious. Here is a patch that removes the unnecessary CE_MATCH_IGNORE_FSMONITOR instances. While the test suite passes with this change, I'm not 100% confident that we actually have test cases that would have detected all the places that we needed the CE_MATCH_IGNORE_* flags. If this seems like a reasonable additional optimization to make, I can roll it into the next iteration of the patch series as I have some spelling, documentation changes and other tweaks as a result of all the feedback. From 6ff7ed0467fd736dca73efe62391bb3ee9b4e771 Mon Sep 17 00:00:00 2001 From: Ben Peart Date: Thu, 21 Sep 2017 09:09:42 -0400 Subject: [PATCH] fsmonitor: remove unnecessary uses of CE_MATCH_IGNORE_FSMONITOR With a better understanding of *why* the CE_MATCH_IGNORE_* flags are used, it is now more clear they are not required in most cases where CE_MATCH_IGNORE_FSMONITOR was being passed out of an abundance of caution. Since the fsmonitor data can be trusted and is kept in sync with the working directory, the only remaining valid uses are those locations where we don't want to trigger an unneeded refresh_fsmonitor() call. One is where preload_index() is doing a fast precompute of state for the bulk of the index entries but is not required for correctness as refresh_cache_ent() will ensure any "missed" by preload_index() are up-to-date if/when they are needed. The second is in is_staging_gitmodules_ok() where we don't want to trigger a complete refresh just to check the .gitignore file. The net result of this change will be that there are more cases where we will be able to use the cached index state and avoid unnecessary lstat() calls. Signed-off-by: Ben Peart --- apply.c| 2 +- entry.c| 2 +- read-cache.c | 4 ++-- unpack-trees.c | 6 +++--- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/apply.c b/apply.c index 9061cc5f15..71cbbd141c 100644 --- a/apply.c +++ b/apply.c @@ -3399,7 +3399,7 @@ static int verify_index_match(const struct cache_entry *ce, struct stat *st) return -1; return 0; } - return ce_match_stat(ce, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR); + return ce_match_stat(ce, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE); } #define SUBMODULE_PATCH_WITHOUT_INDEX 1 diff --git a/entry.c b/entry.c index 5e6794f9fc..3a7b667373 100644 --- a/entry.c +++ b/entry.c @@ -404,7 +404,7 @@ int checkout_entry(struct cache_entry *ce, if (!check_path(path.buf, path.len, &st, state->base_dir_len)) { const struct submodule *sub; - unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR); + unsigned changed =
[GSoC][PATCH v4 1/4] submodule--helper: introduce get_submodule_displaypath()
LGTM with nits. +static char *get_submodule_displaypath(const char *path, const char *prefix) this could do with a comment /* the result should be freed by the caller. */ + } else if (super_prefix) { + int len = strlen(super_prefix); + const char *format = is_dir_sep(super_prefix[len - 1]) ? "%s%s" : "%s/%s"; what if len == 0? The handling of '/' looks like a change from the original.
[GSoC][PATCH v4 3/4] submodule: port set_name_rev() from shell to C
LGTM with nits commit message: "revision name, and later handles its formating and printing." typo: formatting + if (!capture_command(&cp, &sb, 0) && sb.len) { + strbuf_strip_suffix(&sb, "\n"); + return strbuf_detach(&sb, NULL); + } you discard all output if these commands fail, so if the argument is a not a submodule, or the other is not a sha1, it will just print nothing without error message. Maybe that is OK, though? I don't see documentation for these commands, so maybe this is not meant to be usable?
Re: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list
On Sep 21 2017, Junio C Hamano wrote: > Now you make me curious. How would that variant be different from > option C. in Jonathan's message? Only in the parity of the condition. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
[GSoC][PATCH v4 4/4] submodule: port submodule subcommand 'status' from shell to C
+ const char *const git_submodule_helper_usage[] = { + N_("git submodule status [--quiet] [--cached] [--recursive] []"), + NULL the manpage over here says git submodule [--quiet] status [--cached] [--recursive] [--] [...] ie. multiple path arguments. Should this usage string be tweaked? +static void print_status(struct status_cb *info, char state, const char *path, +const struct object_id *oid, const char *displaypath) +{ could do with a comment. What are the options for the `state` char? + if (state == ' ' || state == '+') { + struct argv_array get_rev_args = ARGV_ARRAY_INIT; + + argv_array_pushl(&get_rev_args, "get-rev-name", +path, oid_to_hex(oid), NULL); + get_rev_name(get_rev_args.argc, get_rev_args.argv, +info->prefix); since you're not really subprocessing, can't you simply have a do_print_rev_name(char *path, char *sha) { .. printf("\n"); } and call that directly? Or call compute_rev_name directly. Then you don't have to do argv setup here. Also, the name get_rev_name() is a little unfortunate, since it doesn't return a name, but rather prints it. Maybe the functions implementing helper commands could be named like: command_get_rev_name or similar.
Re: [PATCH 2/2] Document the string_list structure
On Thu, Sep 21, 2017 at 02:43:38PM +0200, Han-Wen Nienhuys wrote: > Signed-off-by: Han-Wen Nienhuys > --- > string-list.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/string-list.h b/string-list.h > index 29bfb7ae4..08b534166 100644 > --- a/string-list.h > +++ b/string-list.h > @@ -8,6 +8,12 @@ struct string_list_item { > > typedef int (*compare_strings_fn)(const char *, const char *); > > +/* A resizable array of strings. The strings are owned if > + * 'strdup_strings' is set. It can be used as a sorted array, and a > + * custom comparison may be given in 'cmp'. The field 'items[i].util' > + * may be used to implement an array of pairs. In that case, the > + * caller is responsible for managing memory pointed to by 'util'. > + */ > struct string_list { > struct string_list_item *items; > unsigned int nr, alloc; There's a considerable amount of documentation for string-list in Documentation/technical/api-string-list.txt. Perhaps we should look at migrating it into string-list.h, where it's more likely to be found (and kept up to date). We did something similar for strbuf.h a while back, and I think the result is much better. See the commits in: git log bdfdaa4978^..d468fa2721 for an example. -Peff
[PATCH 0/4] avoid some -Wsign-compare warnings
This series removes 687 '-Wsign-compare' warnings when applied to the current master branch (@ commit 59c0ea183). Using gcc v5.4.0 and adding '-Wextra' to the compilation flags, we can summarize the warnings as follows: $ grep warning out1 | sed -e 's/.*\[/\[/' | sort | uniq -c | sort -rn 1437 [-Wunused-parameter] 1410 [-Wsign-compare] 670 [-Wmissing-field-initializers] 7 [-Wempty-body] $ After applying this series, we see: $ grep warning out2 | sed -e 's/.*\[/\[/' | sort | uniq -c | sort -rn 1437 [-Wunused-parameter] 723 [-Wsign-compare] 670 [-Wmissing-field-initializers] 7 [-Wempty-body] $ The number of -Wunused-parameter warnings is not as bad as it seems; for example, we can get rid of 690 such warning with the following patch: $ git diff diff --git a/git-compat-util.h b/git-compat-util.h index cedad4d58..8e7388082 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -118,6 +118,9 @@ /* Approximation of the length of the decimal representation of this type. */ #define decimal_length(x)((int)(sizeof(x) * 2.56 + 0.5) + 1) +/* suppress 'unused parameter' warnings */ +#define UNUSED(x) ((void)(x)) + #if defined(__sun__) /* * On Solaris, when _XOPEN_EXTENDED is set, its header file @@ -341,6 +344,7 @@ typedef uintmax_t timestamp_t; #ifndef has_dos_drive_prefix static inline int git_has_dos_drive_prefix(const char *path) { + UNUSED(path); return 0; } #define has_dos_drive_prefix git_has_dos_drive_prefix @@ -349,6 +353,7 @@ static inline int git_has_dos_drive_prefix(const char *path) #ifndef skip_dos_drive_prefix static inline int git_skip_dos_drive_prefix(char **path) { + UNUSED(path); return 0; } #define skip_dos_drive_prefix git_skip_dos_drive_prefix $ $ grep warning out3 | sed -e 's/.*\[/\[/' | sort | uniq -c | sort -rn 747 [-Wunused-parameter] 723 [-Wsign-compare] 670 [-Wmissing-field-initializers] 7 [-Wempty-body] $ The original version of the UNUSED macro used the gcc __unused__ attribute, but that caused some issues with msvc IIRC, so this version has been simplified. I have been meaning to check that this does not cause any bloat in the git executable (with all optimisation levels), so I haven't submitted it before. What do you think? These patches reduce the error count quite a bit, without touching too many files, but additional patches may have to be batched up and submitted over several releases, viz: $ grep warning out2 | cut -d: -f1 | sort | uniq | wc -l 264 $ Ramsay Jones (4): git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings commit-slab.h: avoid -Wsign-compare warnings cache.h: hex2chr() - avoid -Wsign-compare warnings ALLOC_GROW: avoid -Wsign-compare warnings builtin/pack-objects.c | 4 ++-- cache.h| 4 ++-- commit-slab.h | 6 +++--- config.c | 2 +- diff.c | 2 +- git-compat-util.h | 6 -- line-log.c | 18 +- line-log.h | 2 +- revision.c | 2 +- tree-walk.c| 3 +-- 10 files changed, 25 insertions(+), 24 deletions(-) -- 2.14.0
[PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones --- git-compat-util.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index 9bc15b036..cedad4d58 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str) static inline size_t xsize_t(off_t len) { - if (len > (size_t) len) + size_t size = (size_t) len; + + if (len != (off_t) size) die("Cannot handle files this big"); - return (size_t)len; + return size; } __attribute__((format (printf, 3, 4))) -- 2.14.0
[PATCH 2/4] commit-slab.h: avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones --- commit-slab.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/commit-slab.h b/commit-slab.h index 333d81e37..dcaab8ca0 100644 --- a/commit-slab.h +++ b/commit-slab.h @@ -78,7 +78,7 @@ static MAYBE_UNUSED void init_ ##slabname(struct slabname *s) \ \ static MAYBE_UNUSED void clear_ ##slabname(struct slabname *s) \ { \ - int i; \ + unsigned int i; \ for (i = 0; i < s->slab_count; i++) \ free(s->slab[i]); \ s->slab_count = 0; \ @@ -89,13 +89,13 @@ static MAYBE_UNUSED elemtype *slabname## _at_peek(struct slabname *s, \ const struct commit *c, \ int add_if_missing) \ { \ - int nth_slab, nth_slot; \ + unsigned int nth_slab, nth_slot;\ \ nth_slab = c->index / s->slab_size; \ nth_slot = c->index % s->slab_size; \ \ if (s->slab_count <= nth_slab) {\ - int i; \ + unsigned int i; \ if (!add_if_missing)\ return NULL;\ REALLOC_ARRAY(s->slab, nth_slab + 1); \ -- 2.14.0
[PATCH 3/4] cache.h: hex2chr() - avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones --- cache.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cache.h b/cache.h index a916bc79e..a0e3e362c 100644 --- a/cache.h +++ b/cache.h @@ -1243,8 +1243,8 @@ static inline unsigned int hexval(unsigned char c) */ static inline int hex2chr(const char *s) { - int val = hexval(s[0]); - return (val < 0) ? val : (val << 4) | hexval(s[1]); + unsigned int val = hexval(s[0]); + return (val & ~0xf) ? val : (val << 4) | hexval(s[1]); } /* Convert to/from hex/sha1 representation */ -- 2.14.0
[PATCH 4/4] ALLOC_GROW: avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones --- builtin/pack-objects.c | 4 ++-- config.c | 2 +- diff.c | 2 +- line-log.c | 18 +- line-log.h | 2 +- revision.c | 2 +- tree-walk.c| 3 +-- 7 files changed, 16 insertions(+), 17 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a57b4f058..a6ee653bf 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2563,8 +2563,8 @@ struct in_pack_object { }; struct in_pack { - int alloc; - int nr; + unsigned int alloc; + unsigned int nr; struct in_pack_object *array; }; diff --git a/config.c b/config.c index cd5a69e63..aeab02c06 100644 --- a/config.c +++ b/config.c @@ -2200,7 +2200,7 @@ static struct { size_t *offset; unsigned int offset_alloc; enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state; - int seen; + unsigned int seen; } store; static int matches(const char *key, const char *value) diff --git a/diff.c b/diff.c index ea7e5978b..be94ad4f4 100644 --- a/diff.c +++ b/diff.c @@ -1541,7 +1541,7 @@ static void emit_rewrite_diff(const char *name_a, struct diff_words_buffer { mmfile_t text; - long alloc; + unsigned long alloc; struct diff_words_orig { const char *begin, *end; } *orig; diff --git a/line-log.c b/line-log.c index ab0709f9a..545ad0f28 100644 --- a/line-log.c +++ b/line-log.c @@ -90,7 +90,7 @@ static int range_cmp(const void *_r, const void *_s) */ static void range_set_check_invariants(struct range_set *rs) { - int i; + unsigned int i; if (!rs) return; @@ -110,8 +110,8 @@ static void range_set_check_invariants(struct range_set *rs) */ void sort_and_merge_range_set(struct range_set *rs) { - int i; - int o = 0; /* output cursor */ + unsigned int i; + unsigned int o = 0; /* output cursor */ QSORT(rs->ranges, rs->nr, range_cmp); @@ -144,7 +144,7 @@ void sort_and_merge_range_set(struct range_set *rs) static void range_set_union(struct range_set *out, struct range_set *a, struct range_set *b) { - int i = 0, j = 0; + unsigned int i = 0, j = 0; struct range *ra = a->ranges; struct range *rb = b->ranges; /* cannot make an alias of out->ranges: it may change during grow */ @@ -186,7 +186,7 @@ static void range_set_union(struct range_set *out, static void range_set_difference(struct range_set *out, struct range_set *a, struct range_set *b) { - int i, j = 0; + unsigned int i, j = 0; for (i = 0; i < a->nr; i++) { long start = a->ranges[i].start; long end = a->ranges[i].end; @@ -397,7 +397,7 @@ static void diff_ranges_filter_touched(struct diff_ranges *out, struct diff_ranges *diff, struct range_set *rs) { - int i, j = 0; + unsigned int i, j = 0; assert(out->target.nr == 0); @@ -426,7 +426,7 @@ static void range_set_shift_diff(struct range_set *out, struct range_set *rs, struct diff_ranges *diff) { - int i, j = 0; + unsigned int i, j = 0; long offset = 0; struct range *src = rs->ranges; struct range *target = diff->target.ranges; @@ -873,7 +873,7 @@ static char *output_prefix(struct diff_options *opt) static void dump_diff_hacky_one(struct rev_info *rev, struct line_log_data *range) { - int i, j = 0; + unsigned int i, j = 0; long p_lines, t_lines; unsigned long *p_ends = NULL, *t_ends = NULL; struct diff_filepair *pair = range->pair; @@ -906,7 +906,7 @@ static void dump_diff_hacky_one(struct rev_info *rev, struct line_log_data *rang long t_start = range->ranges.ranges[i].start; long t_end = range->ranges.ranges[i].end; long t_cur = t_start; - int j_last; + unsigned int j_last; while (j < diff->target.nr && diff->target.ranges[j].end < t_start) j++; diff --git a/line-log.h b/line-log.h index 7a5c24e2d..e2a5ee7c6 100644 --- a/line-log.h +++ b/line-log.h @@ -14,7 +14,7 @@ struct range { /* A set of ranges. The ranges must always be disjoint and sorted. */ struct range_set { - int alloc, nr; + unsigned int alloc, nr; struct range *ranges; }; diff --git a/revision.c b/revision.c index f9a90d71d..c8c9cb32c 100644 --- a/revision.c +++ b/revision.c @@ -1105,7 +1105,7 @@ static void add_rev_cmdline(struct rev_info *revs, unsigned flags) { struct rev_cmdline_info *info = &revs->cmdline; - int nr = info->nr; + unsigned int nr = info->nr
Re: [PATCH v2 02/21] prefix_ref_iterator: break when we leave the prefix
On Wed, Sep 20, 2017 at 9:59 PM, Jeff King wrote: >> But this compare function is not to order by the natural encoding order, >> but it's used to detect the '0' at the end of prefix, which orders >> before *any* unsigned char. > > It's not just detecting the "0". We care about the ordering overall (so > that "refs/foo" comes after "refs/bar", and we know that "refs/bar/baz" > cannot come after "refs/foo", and we can stop iterating). refs/{foo,bar,bar/baz} is all ASCII, such that the ordering by byte value and byte position (2nd order) orders 'correctly'. correct in the sense as Git expects. But if you use other encodings, the natural encoding may differ from the arbitrary order that we have here. (Example utf-8 with BOM and smileys) However these different orders do not matter (according to my initial conclusion), because we need (a) find out about '\0' and (b) only need only 'arbitrary' order (which is referred to by the " ordered" trait). >> Essentially compare_prefix wants to provide the same return >> value as `strncmp(refname, prefix, min(strlen(refname), strlen(prefix)));` >> except that it is optimized as we do not have to walk over a string >> multiple times (to determine length and then pass it to compare). > > Hmm, yeah, I think that would be an equivalent. I didn't think of that, > but as you say it would be less efficient. I was just too lazy to check if we had the results of strlen around already, such that the comparison would be equally cheap, but more readable. Also IIRC later patches enhance this function, so we shall not make use of the strncmp function here. > The patch is credited to me, but I actually screwed up the ordering by > failing to do the unsigned cast. Michael fixed that part before posting > it. :) Thanks, Stefan
Re: [PATCH 2/2] Document the string_list structure
On 09/21, Han-Wen Nienhuys wrote: > Signed-off-by: Han-Wen Nienhuys > --- > string-list.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/string-list.h b/string-list.h > index 29bfb7ae4..08b534166 100644 > --- a/string-list.h > +++ b/string-list.h > @@ -8,6 +8,12 @@ struct string_list_item { > > typedef int (*compare_strings_fn)(const char *, const char *); > > +/* A resizable array of strings. The strings are owned if nit: the start of the comment block should be on its own line: /* * A resizable ... > + * 'strdup_strings' is set. It can be used as a sorted array, and a > + * custom comparison may be given in 'cmp'. The field 'items[i].util' > + * may be used to implement an array of pairs. In that case, the > + * caller is responsible for managing memory pointed to by 'util'. > + */ The util field can be freed if 'free_util' is set when calling string_list_clear() but yeah, in general the caller is managing that memory especially if a complex free function is required (in the case of allocating a struct with dynamically allocated fields to be stored in the util feild). Just pointing that out, and I'm happy with this and the previous patch. > struct string_list { > struct string_list_item *items; > unsigned int nr, alloc; > -- > 2.14.1.821.g8fa685d3b7-goog > -- Brandon Williams
Re: RFC: Design and code of partial clones (now, missing commits and trees OK)
There's a lot in this patch series. I'm still studying it, but here are some notes and questions. I'll start with direct responses to the RFC here and follow up in a second email with specific questions and comments to keep this from being too long). On 9/15/2017 4:43 PM, Jonathan Tan wrote: For those interested in partial clones and/or missing objects in repos, I've updated my original partialclone patches to not require an explicit list of promises. Fetch/clone still only permits exclusion of blobs, but the infrastructure is there for a local repo to support missing trees and commits as well. They can be found here: https://github.com/jonathantanmy/git/tree/partialclone2 To make the number of patches more manageable, I have omitted support for a custom fetching hook (but it can be readded in fetch-object.c), and only support promisor packfiles for now (but I don't have any objection to supporting promisor loose objects in the future). Let me know what you think of the overall approach. In particular, I'm still wondering if there is a better approach than to toggle "fetch_if_missing" whenever we need lazy fetching (or need to suppress it). Also, if there any patches that you think might be useful to others, let me know and I'll send them to this mailing list for review. A demo and an overview of the design (also available from that repository's README): Demo Obtain a repository. $ make prefix=$HOME/local install $ cd $HOME/tmp $ git clone https://github.com/git/git Make it advertise the new feature and allow requests for arbitrary blobs. $ git -C git config uploadpack.advertiseblobmaxbytes 1 $ git -C git config uploadpack.allowanysha1inwant 1 Perform the partial clone and check that it is indeed smaller. Specify "file://" in order to test the partial clone mechanism. (If not, Git will perform a local clone, which unselectively copies every object.) $ git clone --blob-max-bytes=10 "file://$(pwd)/git" git2 $ git clone "file://$(pwd)/git" git3 $ du -sh git2 git3 116M git2 129M git3 Observe that the new repo is automatically configured to fetch missing objects from the original repo. Subsequent fetches will also be partial. $ cat git2/.git/config [core] repositoryformatversion = 1 filemode = true bare = false logallrefupdates = true [remote "origin"] url = [snip] fetch = +refs/heads/*:refs/remotes/origin/* blobmaxbytes = 10 [extensions] partialclone = origin [branch "master"] remote = origin merge = refs/heads/master I like that git-clone saves the partial clone settings in the .git/config. This should make it easier for subsequent commands to default to the right settings. Do we allow a partial-fetch following a regular clone (such that git-fetch would create these config file settings)? This seems like a reasonable upgrade path for a user with an existing repo to take advantage of partial fetches going forward. Or do we require that git-fetch only be allowed to do partial-fetches after a partial-clone (so that only clone creates these settings) and fetch always does partial-fetches thereafter? It might be useful to allow fetch to do a full fetch on top of a partial-clone, but there are probably thin-pack issues to work out first. Also, there is an assumption here that the user will want to keep using the same filtering settings on subsequent fetches. That's probably fine for now and until we get a chance to try it out for real. Unlike in an older version of this code (see the `partialclone` branch), this also works with the HTTP/HTTPS protocols. Design == Local repository layout --- A repository declares its dependence on a *promisor remote* (a remote that declares that it can serve certain objects when requested) by a repository extension "partialclone". `extensions.partialclone` must be set to the name of the remote ("origin" in the demo above). Do we allow EXACTLY ONE promisor-remote? That is, can we later add another remote as a promisor-remote? And then do partial-fetches from either? Do we need to disallow removing or altering a remote that is listed as a promisor-remote? I think for now, one promisor-remote is fine. Just asking. Changing a remote's URL might be fine, but deleting the promisor-remote would put the user in a weird state. We don't need to worry about it now though. A packfile can be annotated as originating from the promisor remote by the existence of a "(packfile name).promisor" file with arbitrary contents (similar to the ".keep" file). Whenever a promisor remote sends an object, it declares that it can serve every object directly or indirectly referenced by the sent object. A promisor packfile is a packfile annotated with the ".promisor" file. A promisor object is an object in a promisor packfile. A promised object is an object dire
commit-msg hook with verbose commit flag
When using the --verbose flag on git-commit with a commit-msg hook the contents of the diff are passed to the commit-msg hook. I would have expected the diff to have been removed since it's below the scissor line. Since the scissor line is removed and not visible to commit-msg there doesn't seem to be a great way to ignore the diff contents either.
Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3)
(part 2) Additional overall comments on: https://github.com/jonathantanmy/git/commits/partialclone2 {} I think it would help to split the blob-max-bytes filtering and the promisor/promised concepts and discuss them independently. {} Then we can talk about about the promisor/promised functionality independent of any kind of filter. The net-net is that the client has missing objects and it doesn't matter what filter criteria or mechanism caused that to happened. {} blob-max-bytes is but one such filter we should have. This might be sufficient if the goal is replace LFS (where you rarely ever need any given very very large object) and dynamically loading them as needed is sufficient and the network round-trip isn't too much of a perf penalty. {} But if we want to do things like a "sparse-enlistments" where the client only needs a small part of the tree using sparse-checkout. For example, only populating 50,000 files from a tree of 3.5M files at HEAD, then we need a more general filtering. {} And as I said above, how we chose to filter should be independent of how the client handles promisor/promised objects. {} Also, if we rely strictly on dynamic object fetching to fetch missing objects, we are effectively tethered to the server during operations (such as checkout) that the user might not think about as requiring a network connection. And we are forced to keep the same limitations of LFS in that you can't prefetch and go offline (without actually checking out to your worktree first). And we can't bulk or parallel fetch objects. {} I think it would also help to move the blob-max-bytes calculation out of pack-objects.c : add_object_entry() [1]. The current code isolates the computation there so that only pack-objects can do the filtering. Instead, put it in list-objects.c and traverse_commit_list() so that pack-objects and rev-list can share it (as Peff suggested [2] in response to my first patch series in March). For example, this would let the client have a pre-checkout hook, use rev-list to compute the set of missing objects needed for that commit, and pipe that to a command to BULK fetch them from the server BEFORE starting the actual checkout. This would allow the savy user to manually run a prefetch before going offline. [1] https://github.com/jonathantanmy/git/commit/68e529484169f4800115c5a32e0904c25ad14bd8#diff-a8d2c9cf879e775d748056cfed48440cR1110 [2] https://public-inbox.org/git/20170309073117.g3br5btsfwntc...@sigill.intra.peff.net/ {} This also locks us into size-only filtering and makes it more difficult to add other filters. In that the add_object_entry() code gets called on an object after the traversal has decided what to do with it. It would be difficult to add tree-trimming at this level, for example. {} An early draft of this type of filtering is here [3]. I hope to push up a revised draft of this shortly. [3] https://public-inbox.org/git/20170713173459.3559-1-...@jeffhostetler.com/ Thanks, Jeff
Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)
(part 3) Additional overall comments on: https://github.com/jonathantanmy/git/commits/partialclone2 {} WRT the code in is_promised() [1] [1] https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960 {} it looked like it was adding ALL promisor- and promised-objects to the "promised" OIDSET, rather than just promised-objects, but I could be mistaken. {} Is this iterating over ALL promisor-packfiles? {} It looked like this was being used by fsck and rev-list. I have concerns about how big this OIDSET will get and how it will scale, since if we start with a partial-clone all packfiles will be promisor-packfiles. {} When iterating thru a tree object, you add everything that it references (everything in that folder). This adds all of the child OIDs -- without regard to whether they are new to this version of the tree object. (Granted, this is hard to compute.) My concern is that this will add too many objects to the OIDSET. That is, a new tree object (because of a recent change to something in that folder) will also have the OIDs of the other *unchanged* files which may be present in an earlier non-provisor packfile from an earlier commit. I worry that this will grow the OIDSET to essentially include everything. And possibly defeating its own purpose. I could be wrong here, but that's my concern. {} I'm not opposed to the .promisor file concept, but I have concerns that in practice all packfiles after a partial clone will be promisor-packfiles and therefore short-cut during fsck, so fsck still won't gain anything. It would help if there are also non-promisor packfiles present, but only for objects referenced by non-promisor packfiles. But then I also have to wonder whether we can even support non-promisor packfiles after starting with a partial clone -- because of the properties of received thin-packs on a non-partial fetch. Thanks, Jeff
Re: [PATCH] git: add --no-optional-locks option
Am 21.09.2017 um 06:32 schrieb Jeff King: diff --git a/Documentation/git.txt b/Documentation/git.txt index 6e3a6767e5..8dd3ae05ae 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -159,6 +159,10 @@ foo.bar= ...`) sets `foo.bar` to the empty string which ` git config Add "icase" magic to all pathspec. This is equivalent to setting the `GIT_ICASE_PATHSPECS` environment variable to `1`. +--no-optional-locks:: + Do not perform optional operations that require locks. This is + equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`. + GIT COMMANDS @@ -697,6 +701,15 @@ of clones and fetches. which feed potentially-untrusted URLS to git commands. See linkgit:git-config[1] for more details. +`GIT_OPTIONAL_LOCKS`:: + If set to `0`, Git will avoid performing any operations which + require taking a lock and which are not required to complete the + requested operation. For example, this will prevent `git status` + from refreshing the index as a side effect. This is useful for + processes running in the background which do not want to cause + lock contention with other operations on the repository. + Defaults to `1`. I don't think we should pass this environment variable to remote repositories. It should be listed in local_repo_env[] in environment.c. -- Hannes
Re: [PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
On 2017-09-21 18:46, Ramsay Jones wrote: > > Signed-off-by: Ramsay Jones > --- > git-compat-util.h | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/git-compat-util.h b/git-compat-util.h > index 9bc15b036..cedad4d58 100644 > --- a/git-compat-util.h > +++ b/git-compat-util.h > @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str) > > static inline size_t xsize_t(off_t len) > { > - if (len > (size_t) len) > + size_t size = (size_t) len; > + > + if (len != (off_t) size) > die("Cannot handle files this big"); > - return (size_t)len; > + return size; Hm, can someone help me out ? Why is the cast not needed ? > } > > __attribute__((format (printf, 3, 4))) >
Re: [PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
On Thu, Sep 21, 2017 at 11:49 AM, Torsten Bögershausen wrote: > On 2017-09-21 18:46, Ramsay Jones wrote: >> >> Signed-off-by: Ramsay Jones >> --- >> git-compat-util.h | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/git-compat-util.h b/git-compat-util.h >> index 9bc15b036..cedad4d58 100644 >> --- a/git-compat-util.h >> +++ b/git-compat-util.h >> @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str) >> >> static inline size_t xsize_t(off_t len) >> { >> - if (len > (size_t) len) >> + size_t size = (size_t) len; >> + >> + if (len != (off_t) size) >> die("Cannot handle files this big"); >> - return (size_t)len; >> + return size; > > Hm, can someone help me out ? > Why is the cast not needed ? Because 'size' is already size_t, (cast was done in first line of the function: size_t size = (size_t) len; ), previously we cast len multiple times, now we cast it once and use size thereafter. > >> } >> >> __attribute__((format (printf, 3, 4))) >> >
Re: [PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
On 21/09/17 19:49, Torsten Bögershausen wrote: > On 2017-09-21 18:46, Ramsay Jones wrote: >> >> Signed-off-by: Ramsay Jones >> --- >> git-compat-util.h | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/git-compat-util.h b/git-compat-util.h >> index 9bc15b036..cedad4d58 100644 >> --- a/git-compat-util.h >> +++ b/git-compat-util.h >> @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str) >> >> static inline size_t xsize_t(off_t len) >> { >> -if (len > (size_t) len) >> +size_t size = (size_t) len; >> + >> +if (len != (off_t) size) >> die("Cannot handle files this big"); >> -return (size_t)len; >> +return size; > > Hm, can someone help me out ? > Why is the cast not needed ? The cast in the return statement? If so, because the 'size' variable has been declared with the size_t type. (The truncation, if any, would happen when 'size' is initialised in the declaration). Hope that helps. ATB, Ramsay Jones
[PATCH] Documentation/githooks: mention merge in commit-msg hook
The commit-msg hook is invoked by both commit and merge now. Reported-by: Kaartic Sivaraam Signed-off-by: Stefan Beller --- Documentation/githooks.txt | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index 623ed1a138..f4e75b9c90 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -127,11 +127,10 @@ help message found in the commented portion of the commit template. commit-msg ~~ -This hook is invoked by 'git commit', and can be bypassed -with the `--no-verify` option. It takes a single parameter, the -name of the file that holds the proposed commit log message. -Exiting with a non-zero status causes the 'git commit' to -abort. +This hook is invoked by 'git commit' and 'git merge', and can be +bypassed with the `--no-verify` option. It takes a single parameter, +the name of the file that holds the proposed commit log message. +Exiting with a non-zero status causes the command to abort. The hook is allowed to edit the message file in place, and can be used to normalize the message into some project standard format. It -- 2.14.0.rc0.3.g6c2e499285
[PATCH] connect: in ref advertisement, shallows are last
Currently, get_remote_heads() parses the ref advertisement in one loop, allowing refs and shallow lines to intersperse, despite this not being allowed by the specification. Refactor get_remote_heads() to use two loops instead, enforcing that refs come first, and then shallows. This also makes it easier to teach get_remote_heads() to interpret other lines in the ref advertisement, which will be done in a subsequent patch. Signed-off-by: Jonathan Tan --- It seems that some people are concerned about looseness in interpreting the ref advertisement, so here is a patch to tighten it instead. This is a replacement for Brandon's PATCH 1.5. I think this is what Jonathan Nieder meant by his instruction flow idea. I've checked that Brandon's other patches apply cleanly on this patch, except for "connect: teach client to recognize v1 server response" which has to be modified to the following: @@ -149,6 +150,26 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len, *list = NULL; len = read_remote_ref(in, &src_buf, &src_len, &responded); + + switch (determine_protocol_version_client(packet_buffer)) { + case protocol_v1: + /* +* First pkt-line contained the version string. +* Continue on to process the ref advertisement. +*/ + len = read_remote_ref(in, &src_buf, &src_len, &responded); + break; + case protocol_v0: + /* +* Server is speaking protocol v0 and sent a +* ref so we need to process it. +*/ + break; + default: + die("server is speaking an unknown protocol"); + break; + } + connect.c | 112 ++ 1 file changed, 61 insertions(+), 51 deletions(-) diff --git a/connect.c b/connect.c index 49b28b83b..9bf97adf6 100644 --- a/connect.c +++ b/connect.c @@ -107,6 +107,26 @@ static void annotate_refs_with_symref_info(struct ref *ref) string_list_clear(&symref, 0); } +/* + * Read one line of a server's ref advertisement into packet_buffer. + */ +int read_remote_ref(int in, char **src_buf, size_t *src_len, int *responded) +{ + int len = packet_read(in, src_buf, src_len, + packet_buffer, sizeof(packet_buffer), + PACKET_READ_GENTLE_ON_EOF | + PACKET_READ_CHOMP_NEWLINE); + const char *arg; + if (len < 0) + die_initial_contact(*responded); + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg)) + die("remote error: %s", arg); + + *responded = 1; + + return len; +} + /* * Read all the refs from the other end */ @@ -123,46 +143,23 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len, * willing to talk to us. A hang-up before seeing any * response does not necessarily mean an ACL problem, though. */ - int saw_response; - int got_dummy_ref_with_capabilities_declaration = 0; + int responded = 0; + int len; *list = NULL; - for (saw_response = 0; ; saw_response = 1) { - struct ref *ref; + + len = read_remote_ref(in, &src_buf, &src_len, &responded); + do { struct object_id old_oid; char *name; - int len, name_len; - char *buffer = packet_buffer; - const char *arg; + int name_len; - len = packet_read(in, &src_buf, &src_len, - packet_buffer, sizeof(packet_buffer), - PACKET_READ_GENTLE_ON_EOF | - PACKET_READ_CHOMP_NEWLINE); - if (len < 0) - die_initial_contact(saw_response); - - if (!len) + if (len < GIT_SHA1_HEXSZ + 2 || + get_oid_hex(packet_buffer, &old_oid) || + packet_buffer[GIT_SHA1_HEXSZ] != ' ') break; - if (len > 4 && skip_prefix(buffer, "ERR ", &arg)) - die("remote error: %s", arg); - - if (len == GIT_SHA1_HEXSZ + strlen("shallow ") && - skip_prefix(buffer, "shallow ", &arg)) { - if (get_oid_hex(arg, &old_oid)) - die("protocol error: expected shallow sha-1, got '%s'", arg); - if (!shallow_points) - die("repository on the other end cannot be shallow"); - oid_array_append(shallow_points, &old_oid); - continue; - } - - if (len < GIT_SHA1_HEXSZ + 2 || get_oid_hex(buffer, &old_oid) || - buffer[GIT_SHA1_HEXSZ] != ' ') -
[PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison
From: Lars Schneider 09f5e97 ("travis-ci: skip a branch build if equal tag is present", 2017-09-17) introduced the "skip_branch_tip_with_tag" function with a broken string comparison. Fix it! Reported-by: SZEDER Gábor Signed-off-by: Lars Schneider --- Hi, previous discussion: https://public-inbox.org/git/3b175d35-5b1c-43cd-a7e9-85693335b...@gmail.com/ Sorry that this trivial fix took so long. Cheers, Lars Notes: Base Commit: a81423d7cf (a81423d7cfdc57238783f053941064c99165) Diff on Web: https://github.com/larsxschneider/git/commit/6b532a42f0 Checkout:git fetch https://github.com/larsxschneider/git travisci/fix-skip-branch-v1 && git checkout 6b532a42f0 ci/lib-travisci.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ci/lib-travisci.sh b/ci/lib-travisci.sh index 9c4ae9bdd0..c3b46f4a7d 100755 --- a/ci/lib-travisci.sh +++ b/ci/lib-travisci.sh @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () { # of a tag. if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) && - $TAG != $TRAVIS_BRANCH + [ "$TAG" != "$TRAVIS_BRANCH" ] then echo "Tip of $TRAVIS_BRANCH is exactly at $TAG" exit 0 -- 2.14.1
Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)
> > * sd/branch-copy (2017-06-18) 3 commits > - branch: add a --copy (-c) option to go with --move (-m) > - branch: add test for -m renaming multiple config sections > - config: create a function to format section headers > > "git branch" learned "-c/-C" to create and switch to a new branch > by copying an existing one. > > I personally do not think "branch --copy master backup" while on > "master" that switches to "backup" is a good UI, and I *will* say > "I told you so" when users complain after we merge this down to > 'master'. Junio, what's up with this one? It's been a long time for this being cooked in next. Do we have a plan/idea going forward or is it just waiting for more feedback?
Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison
larsxschnei...@gmail.com wrote: > 09f5e97 ("travis-ci: skip a branch build if equal tag is present", > 2017-09-17) introduced the "skip_branch_tip_with_tag" function with > a broken string comparison. Fix it! > > Reported-by: SZEDER Gábor > Signed-off-by: Lars Schneider > --- Thanks for the fix. 09f5e97 appears to be for the ls/travis-scriptify branch, which is already part of "next" (if it weren't, I'd suggest just squashing your patch into that commit). > --- a/ci/lib-travisci.sh > +++ b/ci/lib-travisci.sh > @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () { > # of a tag. > > if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) && > - $TAG != $TRAVIS_BRANCH > + [ "$TAG" != "$TRAVIS_BRANCH" ] Git style is to use 'test' instead of '[' for this. See https://public-inbox.org/git/2f3cdc85-f051-c0ae-b9db-fd13cac78...@gmail.com/ for more on that subject. Could you squash in the following? Thanks, Jonathan diff --git i/ci/lib-travisci.sh w/ci/lib-travisci.sh index c3b46f4a7d..b3ed0a0dda 100755 --- i/ci/lib-travisci.sh +++ w/ci/lib-travisci.sh @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () { # of a tag. if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) && - [ "$TAG" != "$TRAVIS_BRANCH" ] + test "$TAG" != "$TRAVIS_BRANCH" then echo "Tip of $TRAVIS_BRANCH is exactly at $TAG" exit 0
Re: [PATCH 3/8] daemon: recognize hidden request arguments
On 09/20, Jonathan Tan wrote: > On Wed, 20 Sep 2017 17:24:43 -0700 > Jonathan Tan wrote: > > > On Wed, 13 Sep 2017 14:54:43 -0700 > > Brandon Williams wrote: > > > > > A normal request to git-daemon is structured as > > > "command path/to/repo\0host=..\0" and due to a bug in an old version of > > > git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the > > > command, 2009-06-04) we aren't able to place any extra args (separated > > > by NULs) besides the host. > > > > > > In order to get around this limitation teach git-daemon to recognize > > > additional request arguments hidden behind a second NUL byte. Requests > > > can then be structured like: > > > "command path/to/repo\0host=..\0\0version=1\0key=value\0". git-daemon > > > can then parse out the extra arguments and set 'GIT_PROTOCOL' > > > accordingly. > > > > A test in this patch (if possible) would be nice, but it is probably > > clearer to test this when one of the commands (e.g. upload-pack) is > > done. Could a test be added to the next patch to verify (using > > GIT_TRACE_PACKET, maybe) that the expected strings are sent? Then > > mention in this commit message that this will be tested in the next > > patch too. > > Ah, I see that it is tested in 6/8. You can ignore this comment. Yeah I felt it would have been difficult to test any earlier without both the client and server sides done. -- Brandon Williams
Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison
> On 21 Sep 2017, at 23:28, Jonathan Nieder wrote: > > larsxschnei...@gmail.com wrote: > >> 09f5e97 ("travis-ci: skip a branch build if equal tag is present", >> 2017-09-17) introduced the "skip_branch_tip_with_tag" function with >> a broken string comparison. Fix it! >> >> Reported-by: SZEDER Gábor >> Signed-off-by: Lars Schneider >> --- > > Thanks for the fix. > > 09f5e97 appears to be for the ls/travis-scriptify branch, which is > already part of "next" (if it weren't, I'd suggest just squashing your > patch into that commit). > >> --- a/ci/lib-travisci.sh >> +++ b/ci/lib-travisci.sh >> @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () { >> # of a tag. >> >> if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) && >> -$TAG != $TRAVIS_BRANCH >> +[ "$TAG" != "$TRAVIS_BRANCH" ] > > Git style is to use 'test' instead of '[' for this. See > https://public-inbox.org/git/2f3cdc85-f051-c0ae-b9db-fd13cac78...@gmail.com/ > for more on that subject. Oh, you're right! > Could you squash in the following? @Junio: Can you squash it when you apply the patch? Thank you, Lars > > Thanks, > Jonathan > > diff --git i/ci/lib-travisci.sh w/ci/lib-travisci.sh > index c3b46f4a7d..b3ed0a0dda 100755 > --- i/ci/lib-travisci.sh > +++ w/ci/lib-travisci.sh > @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () { > # of a tag. > > if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) && > - [ "$TAG" != "$TRAVIS_BRANCH" ] > + test "$TAG" != "$TRAVIS_BRANCH" > then > echo "Tip of $TRAVIS_BRANCH is exactly at $TAG" > exit 0
Re: RFC: Design and code of partial clones (now, missing commits and trees OK)
On Thu, 21 Sep 2017 13:57:30 -0400 Jeff Hostetler wrote: > There's a lot in this patch series. I'm still studying it, but here > are some notes and questions. I'll start with direct responses to > the RFC here and follow up in a second email with specific questions > and comments to keep this from being too long). OK - thanks for your detailed comments. > I like that git-clone saves the partial clone settings in the > .git/config. This should make it easier for subsequent commands to > default to the right settings. > > Do we allow a partial-fetch following a regular clone (such that > git-fetch would create these config file settings)? This seems like > a reasonable upgrade path for a user with an existing repo to take > advantage of partial fetches going forward. A "git-fetch ...options... --save-options" does not sound unreasonable, although I would think that (i) partial fetches/clones are useful on very large repositories, and (ii) the fact that you had already cloned this large repository shows that you can handle the disk and network load, so partial fetch after non-partial clone doesn't seem very useful. But if there is a use case, I think it could work. Although note that the GC in my patch set stops at promisor objects, so if an object originally cloned cannot be reached through a walk (with non-promisor intermediate objects only), it might end up GC-ed (which might be fine). > Or do we require that git-fetch only be allowed to do partial-fetches > after a partial-clone (so that only clone creates these settings) and > fetch always does partial-fetches thereafter? It might be useful to > allow fetch to do a full fetch on top of a partial-clone, but there > are probably thin-pack issues to work out first. About the thin-pack issue, I wonder if it is sufficient to turn on fetch_if_missing while index-pack is trying to fix the thin pack. If the thin-pack issues are worked out, then non-partial fetch after partial clone seems doable (all commits from our tip to their tip are fetched, as well as all new trees and all new blobs; any trees and blobs still missing are already promised). Thanks for these questions - I am concentrating on repos in which both clone and fetch are partial, but it is good to discuss what happens if the user does something else. > Also, there is an assumption here that the user will want to keep > using the same filtering settings on subsequent fetches. That's > probably fine for now and until we get a chance to try it out for > real. Yes. Having said that, the fetching of missing objects does not take into account the filter at all, so the filter can be easily changed. > Do we allow EXACTLY ONE promisor-remote? That is, can we later add > another remote as a promisor-remote? And then do partial-fetches from > either? Yes, exactly one (because we need to know which remote to fetch missing objects from, and which remote to allow partial fetches from). But the promisor remote can be switched to another. > Do we need to disallow removing or altering a remote that is > listed as a promisor-remote? Perhaps, although I think that right now configs are checked when we run the command using the config, not when we run "git config". > I think for now, one promisor-remote is fine. Just asking. > > Changing a remote's URL might be fine, but deleting the > promisor-remote would put the user in a weird state. We don't need > to worry about it now though. Agreed. > I struggled with the terms here a little when looking at the source. > () A remote responding to a partial-clone is termed a > "promisor-remote". () Packfiles received from a promisor-remote are > marked with ".promisor" like ".keep" names. > () An object actually contained in such packfiles is called a > "promisor-object". () An object not-present but referenced by one of > the above promisor-objects is called a "promised-object" (aka a > "missing-object"). > > I think the similarity of the words "promisOR" and "promisED" threw > me here and as I was looking at the code. The code in is_promised() > [1] looked like it was adding all promisor- and promised-objects to > the "promised" OIDSET, but I could be mistaken. > > [1] > https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960 I was struggling a bit with the terminology, true. Right now I'm thinking of: - promisor remote (as you defined) - promisor packfile (as you defined) - promisor object is an object known to belong to the promisor (whether because we have it in a promisor packfile or because it is referenced by an object in a promisor packfile) This might eliminate "promise(d)", and thus eliminate the confusion between "promised" and "promisor", but I haven't done an exhaustive search. > The contents of the ".promisor" file are described as arbitrary? > Should we write the name of the remote (or some other structured data) > into this file so that later fe
Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3)
On Thu, 21 Sep 2017 13:59:43 -0400 Jeff Hostetler wrote: > (part 2) > > Additional overall comments on: > https://github.com/jonathantanmy/git/commits/partialclone2 > > {} I think it would help to split the blob-max-bytes filtering and the > promisor/promised concepts and discuss them independently. > > {} Then we can talk about about the promisor/promised > functionality independent of any kind of filter. The net-net is that > the client has missing objects and it doesn't matter what filter > criteria or mechanism caused that to happened. > > {} blob-max-bytes is but one such filter we should have. This > might be sufficient if the goal is replace LFS (where you rarely ever >need any given very very large object) and dynamically loading >them as needed is sufficient and the network round-trip isn't >too much of a perf penalty. > > {} But if we want to do things like a "sparse-enlistments" where > the client only needs a small part of the tree using sparse-checkout. >For example, only populating 50,000 files from a tree of 3.5M > files at HEAD, then we need a more general filtering. > > {} And as I said above, how we chose to filter should be > independent of how the client handles promisor/promised objects. I agree that they are independent. (I put everything together so that people could see how they work together, but they can be changed independently of each other.) > {} Also, if we rely strictly on dynamic object fetching to fetch > missing objects, we are effectively tethered to the server during > operations (such as checkout) that the user might not think about as > requiring a network connection. And we are forced to keep the same > limitations of LFS in that you can't prefetch and go offline (without > actually checking out to your worktree first). And we can't bulk or > parallel fetch objects. I don't think dynamic object fetching precludes any other more optimized way of fetching or prefetching - I implemented dynamic object fetching first so that we would have a fallback, but the others definitely can be implemented too. > {} I think it would also help to move the blob-max-bytes calculation > out of pack-objects.c : add_object_entry() [1]. The current code > isolates the computation there so that only pack-objects can do the > filtering. > > Instead, put it in list-objects.c and traverse_commit_list() so > that pack-objects and rev-list can share it (as Peff suggested [2] in > response to my first patch series in March). > > For example, this would let the client have a pre-checkout hook, > use rev-list to compute the set of missing objects needed for that > commit, and pipe that to a command to BULK fetch them from the server > BEFORE starting the actual checkout. This would allow the savy user > to manually run a prefetch before going offline. > > [1] > https://github.com/jonathantanmy/git/commit/68e529484169f4800115c5a32e0904c25ad14bd8#diff-a8d2c9cf879e775d748056cfed48440cR1110 > > [2] > https://public-inbox.org/git/20170309073117.g3br5btsfwntc...@sigill.intra.peff.net/ In your specific example, how would rev-list know, on the client, to include (or exclude) a large blob in its output if it does not have it, and thus does not know its size? My reason for including it in pack-objects.c is because I only needed it there and it is much simpler, but I agree that if it can be used elsewhere, we can put it in a more general place. > {} This also locks us into size-only filtering and makes it more > difficult to add other filters. In that the add_object_entry() > code gets called on an object after the traversal has decided > what to do with it. It would be difficult to add tree-trimming > at this level, for example. That is true. > {} An early draft of this type of filtering is here [3]. I hope to > push up a revised draft of this shortly. > > [3] > https://public-inbox.org/git/20170713173459.3559-1-...@jeffhostetler.com/ OK - I'll take a look when that is done (I think I commented on an earlier version on that).
Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)
On Thu, 21 Sep 2017 14:00:40 -0400 Jeff Hostetler wrote: > (part 3) > > Additional overall comments on: > https://github.com/jonathantanmy/git/commits/partialclone2 > > {} WRT the code in is_promised() [1] > > [1] > https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960 > > {} it looked like it was adding ALL promisor- and > promised-objects to the "promised" OIDSET, rather than just > promised-objects, but I could be mistaken. As far as I can tell, it is just adding the promised objects (some of which may also be promisor objects). If you're saying that you expected it to add the promisor objects as well, that might be a reasonable expectation...I'm thinking of doing that. > {} Is this iterating over ALL promisor-packfiles? Yes. > {} It looked like this was being used by fsck and rev-list. I > have concerns about how big this OIDSET will get and how it will > scale, since if we start with a partial-clone all packfiles will be >promisor-packfiles. It's true that scaling is an issue. I'm not sure if omitting the oidset will solve anything, though - as it is, Git maintains an object hash and adds to it quite liberally. One thing that might help is some sort of flushing of objects in promisor packfiles from the local repository - that way, the oidset won't be so large. > > {} When iterating thru a tree object, you add everything that it >references (everything in that folder). This adds all of the >child OIDs -- without regard to whether they are new to this >version of the tree object. (Granted, this is hard to compute.) The oidset will deduplicate OIDs. >My concern is that this will add too many objects to the > OIDSET. That is, a new tree object (because of a recent change to > something in that folder) will also have the OIDs of the other > *unchanged* files which may be present in an earlier non-provisor > packfile from an earlier commit. > >I worry that this will grow the OIDSET to essentially include >everything. And possibly defeating its own purpose. I could > be wrong here, but that's my concern. Same answer as above (about flushing of objects in promisor packfiles). > {} I'm not opposed to the .promisor file concept, but I have concerns > that in practice all packfiles after a partial clone will be > promisor-packfiles and therefore short-cut during fsck, so fsck > still won't gain anything. > > It would help if there are also non-promisor packfiles present, > but only for objects referenced by non-promisor packfiles. > > But then I also have to wonder whether we can even support > non-promisor packfiles after starting with a partial clone -- because > of the properties of received thin-packs on a non-partial fetch. Same reply as to your other e-mail - locally created objects are not in promisor packfiles. (Or were you thinking of a situation where locally created objects are immediately uploaded to the promisor remote, thus making them promisor objects too?)
[PATCH v2] connect: in ref advertisement, shallows are last
Currently, get_remote_heads() parses the ref advertisement in one loop, allowing refs and shallow lines to intersperse, despite this not being allowed by the specification. Refactor get_remote_heads() to use two loops instead, enforcing that refs come first, and then shallows. This also makes it easier to teach get_remote_heads() to interpret other lines in the ref advertisement, which will be done in a subsequent patch. Signed-off-by: Jonathan Tan --- In some in-office discussion, I was informed that my original patch relaxed the ordering of ".keep" lines. Here is an update. I'm also using a switch statement now, which avoids having multiple lines of read_remote_ref(). connect.c | 159 -- 1 file changed, 93 insertions(+), 66 deletions(-) diff --git a/connect.c b/connect.c index 49b28b83b..e0821dbff 100644 --- a/connect.c +++ b/connect.c @@ -107,6 +107,84 @@ static void annotate_refs_with_symref_info(struct ref *ref) string_list_clear(&symref, 0); } +/* + * Read one line of a server's ref advertisement into packet_buffer. + */ +static int read_remote_ref(int in, char **src_buf, size_t *src_len, + int *responded) +{ + int len = packet_read(in, src_buf, src_len, + packet_buffer, sizeof(packet_buffer), + PACKET_READ_GENTLE_ON_EOF | + PACKET_READ_CHOMP_NEWLINE); + const char *arg; + if (len < 0) + die_initial_contact(*responded); + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg)) + die("remote error: %s", arg); + + *responded = 1; + + return len; +} + +#define EXPECTING_REF 0 +#define EXPECTING_SHALLOW 1 + +static int process_ref(int *state, int len, struct ref ***list, + unsigned int flags, struct oid_array *extra_have) +{ + struct object_id old_oid; + char *name; + int name_len; + + if (len < GIT_SHA1_HEXSZ + 2 || + get_oid_hex(packet_buffer, &old_oid) || + packet_buffer[GIT_SHA1_HEXSZ] != ' ') { + (*state)++; + return 0; + } + + name = packet_buffer + GIT_SHA1_HEXSZ + 1; + name_len = strlen(name); + if (len != name_len + GIT_SHA1_HEXSZ + 1) { + free(server_capabilities); + server_capabilities = xstrdup(name + name_len + 1); + } + + if (extra_have && !strcmp(name, ".have")) { + oid_array_append(extra_have, &old_oid); + } else if (!strcmp(name, "capabilities^{}")) { + if (**list) + /* cannot coexist with other refs */ + die("protocol error: unexpected capabilities^{}"); + /* There should be no more refs; proceed to the next state. */ + (*state)++; + } else if (check_ref(name, flags)) { + struct ref *ref = alloc_ref(name); + oidcpy(&ref->old_oid, &old_oid); + **list = ref; + *list = &ref->next; + } + return 1; +} + +static int process_shallow(int *state, struct oid_array *shallow_points) +{ + const char *arg; + struct object_id old_oid; + + if (!skip_prefix(packet_buffer, "shallow ", &arg)) + return 0; + + if (get_oid_hex(arg, &old_oid)) + die("protocol error: expected shallow sha-1, got '%s'", arg); + if (!shallow_points) + die("repository on the other end cannot be shallow"); + oid_array_append(shallow_points, &old_oid); + return 1; +} + /* * Read all the refs from the other end */ @@ -123,76 +201,25 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len, * willing to talk to us. A hang-up before seeing any * response does not necessarily mean an ACL problem, though. */ - int saw_response; - int got_dummy_ref_with_capabilities_declaration = 0; + int responded = 0; + int len; + int state = EXPECTING_REF; *list = NULL; - for (saw_response = 0; ; saw_response = 1) { - struct ref *ref; - struct object_id old_oid; - char *name; - int len, name_len; - char *buffer = packet_buffer; - const char *arg; - - len = packet_read(in, &src_buf, &src_len, - packet_buffer, sizeof(packet_buffer), - PACKET_READ_GENTLE_ON_EOF | - PACKET_READ_CHOMP_NEWLINE); - if (len < 0) - die_initial_contact(saw_response); - - if (!len) - break; - - if (len > 4 && skip_prefix(buffer, "ERR ", &arg)) - die("remote error: %s", arg); - - if (len == GIT_SHA1_HEXSZ + strlen("sha
Re: [PATCH v2] connect: in ref advertisement, shallows are last
On 09/21, Jonathan Tan wrote: > Currently, get_remote_heads() parses the ref advertisement in one loop, > allowing refs and shallow lines to intersperse, despite this not being > allowed by the specification. Refactor get_remote_heads() to use two > loops instead, enforcing that refs come first, and then shallows. > > This also makes it easier to teach get_remote_heads() to interpret other > lines in the ref advertisement, which will be done in a subsequent > patch. > > Signed-off-by: Jonathan Tan > --- > In some in-office discussion, I was informed that my original patch > relaxed the ordering of ".keep" lines. Here is an update. > > I'm also using a switch statement now, which avoids having multiple > lines of read_remote_ref(). Looks cleaner than the last patch. > > connect.c | 159 > -- > 1 file changed, 93 insertions(+), 66 deletions(-) > > diff --git a/connect.c b/connect.c > index 49b28b83b..e0821dbff 100644 > --- a/connect.c > +++ b/connect.c > @@ -107,6 +107,84 @@ static void annotate_refs_with_symref_info(struct ref > *ref) > string_list_clear(&symref, 0); > } > > +/* > + * Read one line of a server's ref advertisement into packet_buffer. > + */ > +static int read_remote_ref(int in, char **src_buf, size_t *src_len, > +int *responded) > +{ > + int len = packet_read(in, src_buf, src_len, > + packet_buffer, sizeof(packet_buffer), > + PACKET_READ_GENTLE_ON_EOF | > + PACKET_READ_CHOMP_NEWLINE); > + const char *arg; > + if (len < 0) > + die_initial_contact(*responded); > + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg)) > + die("remote error: %s", arg); > + > + *responded = 1; > + > + return len; > +} > + > +#define EXPECTING_REF 0 > +#define EXPECTING_SHALLOW 1 > + > +static int process_ref(int *state, int len, struct ref ***list, > +unsigned int flags, struct oid_array *extra_have) > +{ > + struct object_id old_oid; > + char *name; > + int name_len; > + > + if (len < GIT_SHA1_HEXSZ + 2 || > + get_oid_hex(packet_buffer, &old_oid) || > + packet_buffer[GIT_SHA1_HEXSZ] != ' ') { > + (*state)++; I think it may be cleaner if the state variable is updated outside of this function based on a return value from this function. > + return 0; > + } > + > + name = packet_buffer + GIT_SHA1_HEXSZ + 1; > + name_len = strlen(name); > + if (len != name_len + GIT_SHA1_HEXSZ + 1) { > + free(server_capabilities); > + server_capabilities = xstrdup(name + name_len + 1); > + } > + > + if (extra_have && !strcmp(name, ".have")) { > + oid_array_append(extra_have, &old_oid); > + } else if (!strcmp(name, "capabilities^{}")) { > + if (**list) > + /* cannot coexist with other refs */ > + die("protocol error: unexpected capabilities^{}"); > + /* There should be no more refs; proceed to the next state. */ > + (*state)++; > + } else if (check_ref(name, flags)) { > + struct ref *ref = alloc_ref(name); > + oidcpy(&ref->old_oid, &old_oid); > + **list = ref; > + *list = &ref->next; > + } > + return 1; > +} > + > +static int process_shallow(int *state, struct oid_array *shallow_points) state isn't needed here and could be dropped from the parameter list. > +{ > + const char *arg; > + struct object_id old_oid; > + > + if (!skip_prefix(packet_buffer, "shallow ", &arg)) > + return 0; > + > + if (get_oid_hex(arg, &old_oid)) > + die("protocol error: expected shallow sha-1, got '%s'", arg); > + if (!shallow_points) > + die("repository on the other end cannot be shallow"); > + oid_array_append(shallow_points, &old_oid); > + return 1; > +} > + > /* > * Read all the refs from the other end > */ > @@ -123,76 +201,25 @@ struct ref **get_remote_heads(int in, char *src_buf, > size_t src_len, >* willing to talk to us. A hang-up before seeing any >* response does not necessarily mean an ACL problem, though. >*/ > - int saw_response; > - int got_dummy_ref_with_capabilities_declaration = 0; > + int responded = 0; > + int len; > + int state = EXPECTING_REF; > > *list = NULL; > - for (saw_response = 0; ; saw_response = 1) { > - struct ref *ref; > - struct object_id old_oid; > - char *name; > - int len, name_len; > - char *buffer = packet_buffer; > - const char *arg; > - > - len = packet_read(in, &src_buf, &src_len, > - packet_buffer, sizeof(packet_buffer), > - PACKET_READ_GENTLE_ON_
[PATCH v3] connect: in ref advertisement, shallows are last
Currently, get_remote_heads() parses the ref advertisement in one loop, allowing refs and shallow lines to intersperse, despite this not being allowed by the specification. Refactor get_remote_heads() to use two loops instead, enforcing that refs come first, and then shallows. This also makes it easier to teach get_remote_heads() to interpret other lines in the ref advertisement, which will be done in a subsequent patch. Signed-off-by: Jonathan Tan --- I sent the wrong version of this patch :-( This should be the correct one. A bit less clean because I introduced a 3rd state, however. connect.c | 167 +- 1 file changed, 101 insertions(+), 66 deletions(-) diff --git a/connect.c b/connect.c index 49b28b83b..ef6358cfc 100644 --- a/connect.c +++ b/connect.c @@ -107,6 +107,91 @@ static void annotate_refs_with_symref_info(struct ref *ref) string_list_clear(&symref, 0); } +/* + * Read one line of a server's ref advertisement into packet_buffer. + */ +static int read_remote_ref(int in, char **src_buf, size_t *src_len, + int *responded) +{ + int len = packet_read(in, src_buf, src_len, + packet_buffer, sizeof(packet_buffer), + PACKET_READ_GENTLE_ON_EOF | + PACKET_READ_CHOMP_NEWLINE); + const char *arg; + if (len < 0) + die_initial_contact(*responded); + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg)) + die("remote error: %s", arg); + + *responded = 1; + + return len; +} + +#define EXPECTING_REF_WITH_CAPABILITIES 0 +#define EXPECTING_REF 1 +#define EXPECTING_SHALLOW 2 + +static int process_ref(int *state, int len, struct ref ***list, + unsigned int flags, struct oid_array *extra_have) +{ + struct object_id old_oid; + char *name; + int name_len; + + if (len < GIT_SHA1_HEXSZ + 2 || + get_oid_hex(packet_buffer, &old_oid) || + packet_buffer[GIT_SHA1_HEXSZ] != ' ') { + *state = EXPECTING_SHALLOW; + return 0; + } + + name = packet_buffer + GIT_SHA1_HEXSZ + 1; + name_len = strlen(name); + if (*state == EXPECTING_REF_WITH_CAPABILITIES && + len != name_len + GIT_SHA1_HEXSZ + 1) { + free(server_capabilities); + server_capabilities = xstrdup(name + name_len + 1); + } else if (*state == EXPECTING_REF) { + if (len != name_len + GIT_SHA1_HEXSZ + 1) + die("unexpected capabilities after ref name"); + } + + if (extra_have && !strcmp(name, ".have")) { + oid_array_append(extra_have, &old_oid); + } else if (!strcmp(name, "capabilities^{}")) { + if (**list) + /* cannot coexist with other refs */ + die("protocol error: unexpected capabilities^{}"); + /* There should be no more refs; proceed to the next state. */ + *state = EXPECTING_SHALLOW; + return 1; + } else if (check_ref(name, flags)) { + struct ref *ref = alloc_ref(name); + oidcpy(&ref->old_oid, &old_oid); + **list = ref; + *list = &ref->next; + } + *state = EXPECTING_REF; + return 1; +} + +static int process_shallow(struct oid_array *shallow_points) +{ + const char *arg; + struct object_id old_oid; + + if (!skip_prefix(packet_buffer, "shallow ", &arg)) + return 0; + + if (get_oid_hex(arg, &old_oid)) + die("protocol error: expected shallow sha-1, got '%s'", arg); + if (!shallow_points) + die("repository on the other end cannot be shallow"); + oid_array_append(shallow_points, &old_oid); + return 1; +} + /* * Read all the refs from the other end */ @@ -123,76 +208,26 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len, * willing to talk to us. A hang-up before seeing any * response does not necessarily mean an ACL problem, though. */ - int saw_response; - int got_dummy_ref_with_capabilities_declaration = 0; + int responded = 0; + int len; + int state = EXPECTING_REF_WITH_CAPABILITIES; *list = NULL; - for (saw_response = 0; ; saw_response = 1) { - struct ref *ref; - struct object_id old_oid; - char *name; - int len, name_len; - char *buffer = packet_buffer; - const char *arg; - - len = packet_read(in, &src_buf, &src_len, - packet_buffer, sizeof(packet_buffer), - PACKET_READ_GENTLE_ON_EOF | - PACKET_READ_CHOMP_NEWLINE); - if (len < 0) -
Re: [PATCH 1/4] cat-file: handle NULL object_context.path
Jeff King wrote: > Commit dc944b65f1 (get_sha1_with_context: dynamically > allocate oc->path, 2017-05-19) changed the rules that > callers must follow for seeing if we parsed a path in the > object name. The rules switched from "check if the oc.path > buffer is empty" to "check if the oc.path pointer is NULL". > But that commit forgot to update some sites in > cat_one_file(), meaning we might dereference a NULL pointer. > > You can see this by making a path-aware request like > --textconv without specifying --path, and giving an object > name that doesn't have a path in it. Like: > > git cat-file --textconv HEAD > > which will reliably segfault. > > Signed-off-by: Jeff King > --- > builtin/cat-file.c | 4 ++-- > t/t8010-cat-file-filters.sh | 5 + > 2 files changed, 7 insertions(+), 2 deletions(-) Yikes. Commit dc944b65f1 even touched this function, so we reviewers have no excuse for not having found it. Technically this changes the behavior of cat-file --path='', but I don't think that matters. Do other GET_SHA1_RECORD_PATH callers need similar treatment? * builtin/grep.c appears to do the right thing (it stores NULL in list, so it passes NULL to grep_object, which calls grep_oid, which calls grep_source_init, which stores NULL for the grep machinery that is able to cope with a NULL). * builtin/log.c is correctly updated as part of the patch. Those are the only other callers. So we're safe. *phew* Reviewed-by: Jonathan Nieder
Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)
Sahil Dua writes: >> >> * sd/branch-copy (2017-06-18) 3 commits >> - branch: add a --copy (-c) option to go with --move (-m) >> - branch: add test for -m renaming multiple config sections >> - config: create a function to format section headers >> >> "git branch" learned "-c/-C" to create and switch to a new branch >> by copying an existing one. >> >> I personally do not think "branch --copy master backup" while on >> "master" that switches to "backup" is a good UI, and I *will* say >> "I told you so" when users complain after we merge this down to >> 'master'. > > Junio, what's up with this one? It's been a long time for this being > cooked in next. Do we have a plan/idea going forward or is it just > waiting for more feedback? Thanks for pinging. I was (in a strange way) hoping that it was just me who felt that a "copy" operation of the current branch that moves me to a new branch is a design mistake, and I was planning to start merging this down to 'next' based on that assumption, but IIRC it turned out that it wasn't just me. During the renewed discussion I somehow got an impression that the concensus was for "copy" to just copy without ever changing what HEAD says, and if an operation that switches to a new branch based on the current branch is needed, the right place to do so is to enhance/extend "checkout -b". My understanding of the next step was for those who are interested in moving this topic forward to update these patches in that direction.
Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
Ben Peart writes: > Since the fsmonitor data can be trusted and is kept in sync with the > working directory, the only remaining valid uses are those locations > where we don't want to trigger an unneeded refresh_fsmonitor() call. Now that is a lot more assuring ;-) And the exceptions below also make sense to me. Thanks for thinking this through.
Re: [PATCH v1 1/1] test-lint: echo -e (or -E) is not portable
Torsten Bögershausen writes: > Junio, if you wouldn't mind to squash that in, > another fix is needed as well(trailing '-' after '-E') : > > s/'-n', '-e' or '-E-'/'-n', '-e' or '-E' Yup. Thanks all.
Re: [PATCH v3] connect: in ref advertisement, shallows are last
Jonathan Tan writes: > Currently, get_remote_heads() parses the ref advertisement in one loop, > allowing refs and shallow lines to intersperse, despite this not being > allowed by the specification. Refactor get_remote_heads() to use two > loops instead, enforcing that refs come first, and then shallows. > > This also makes it easier to teach get_remote_heads() to interpret other > lines in the ref advertisement, which will be done in a subsequent > patch. Sounds sensible. This still replaces the earlier 1.5? > > Signed-off-by: Jonathan Tan > --- > I sent the wrong version of this patch :-( > > This should be the correct one. A bit less clean because I introduced a > 3rd state, however. > > connect.c | 167 > +- > 1 file changed, 101 insertions(+), 66 deletions(-) > > diff --git a/connect.c b/connect.c > index 49b28b83b..ef6358cfc 100644 > --- a/connect.c > +++ b/connect.c > @@ -107,6 +107,91 @@ static void annotate_refs_with_symref_info(struct ref > *ref) > string_list_clear(&symref, 0); > } > > +/* > + * Read one line of a server's ref advertisement into packet_buffer. > + */ > +static int read_remote_ref(int in, char **src_buf, size_t *src_len, > +int *responded) > +{ > + int len = packet_read(in, src_buf, src_len, > + packet_buffer, sizeof(packet_buffer), > + PACKET_READ_GENTLE_ON_EOF | > + PACKET_READ_CHOMP_NEWLINE); > + const char *arg; > + if (len < 0) > + die_initial_contact(*responded); > + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg)) > + die("remote error: %s", arg); > + > + *responded = 1; > + > + return len; > +} > + > +#define EXPECTING_REF_WITH_CAPABILITIES 0 > +#define EXPECTING_REF 1 > +#define EXPECTING_SHALLOW 2 > + > +static int process_ref(int *state, int len, struct ref ***list, > +unsigned int flags, struct oid_array *extra_have) > +{ > + struct object_id old_oid; > + char *name; > + int name_len; > + > + if (len < GIT_SHA1_HEXSZ + 2 || > + get_oid_hex(packet_buffer, &old_oid) || > + packet_buffer[GIT_SHA1_HEXSZ] != ' ') { > + *state = EXPECTING_SHALLOW; > + return 0; > + } > + > + name = packet_buffer + GIT_SHA1_HEXSZ + 1; > + name_len = strlen(name); > + if (*state == EXPECTING_REF_WITH_CAPABILITIES && > + len != name_len + GIT_SHA1_HEXSZ + 1) { > + free(server_capabilities); > + server_capabilities = xstrdup(name + name_len + 1); > + } else if (*state == EXPECTING_REF) { > + if (len != name_len + GIT_SHA1_HEXSZ + 1) > + die("unexpected capabilities after ref name"); > + } > + > + if (extra_have && !strcmp(name, ".have")) { > + oid_array_append(extra_have, &old_oid); > + } else if (!strcmp(name, "capabilities^{}")) { > + if (**list) > + /* cannot coexist with other refs */ > + die("protocol error: unexpected capabilities^{}"); > + /* There should be no more refs; proceed to the next state. */ > + *state = EXPECTING_SHALLOW; > + return 1; > + } else if (check_ref(name, flags)) { > + struct ref *ref = alloc_ref(name); > + oidcpy(&ref->old_oid, &old_oid); > + **list = ref; > + *list = &ref->next; > + } > + *state = EXPECTING_REF; > + return 1; > +} > + > +static int process_shallow(struct oid_array *shallow_points) > +{ > + const char *arg; > + struct object_id old_oid; > + > + if (!skip_prefix(packet_buffer, "shallow ", &arg)) > + return 0; > + > + if (get_oid_hex(arg, &old_oid)) > + die("protocol error: expected shallow sha-1, got '%s'", arg); > + if (!shallow_points) > + die("repository on the other end cannot be shallow"); > + oid_array_append(shallow_points, &old_oid); > + return 1; > +} > + > /* > * Read all the refs from the other end > */ > @@ -123,76 +208,26 @@ struct ref **get_remote_heads(int in, char *src_buf, > size_t src_len, >* willing to talk to us. A hang-up before seeing any >* response does not necessarily mean an ACL problem, though. >*/ > - int saw_response; > - int got_dummy_ref_with_capabilities_declaration = 0; > + int responded = 0; > + int len; > + int state = EXPECTING_REF_WITH_CAPABILITIES; > > *list = NULL; > - for (saw_response = 0; ; saw_response = 1) { > - struct ref *ref; > - struct object_id old_oid; > - char *name; > - int len, name_len; > - char *buffer = packet_buffer; > - const char *arg; > - > - len = packet_read(in, &src_buf, &src_len, > - pack
Re: [PATCH v3] connect: in ref advertisement, shallows are last
Junio C Hamano writes: > Jonathan Tan writes: > >> Currently, get_remote_heads() parses the ref advertisement in one loop, >> allowing refs and shallow lines to intersperse, despite this not being >> allowed by the specification. Refactor get_remote_heads() to use two >> loops instead, enforcing that refs come first, and then shallows. >> >> This also makes it easier to teach get_remote_heads() to interpret other >> lines in the ref advertisement, which will be done in a subsequent >> patch. > > Sounds sensible. This still replaces the earlier 1.5? Well, it does, but it also invalidates how the new "pick the version offered and used" feature is integrated to this callchain. I guess we'd need a new "we are now expecting the version info" state in a patch to replace "connect: teach client to recognize v1 server response". >> +static int process_ref(int *state, int len, struct ref ***list, >> + unsigned int flags, struct oid_array *extra_have) >> +{ >> +struct object_id old_oid; >> +char *name; >> +int name_len; >> + >> +if (len < GIT_SHA1_HEXSZ + 2 || >> +get_oid_hex(packet_buffer, &old_oid) || >> +packet_buffer[GIT_SHA1_HEXSZ] != ' ') { >> +*state = EXPECTING_SHALLOW; >> +return 0; >> +} >> + >> +name = packet_buffer + GIT_SHA1_HEXSZ + 1; >> +name_len = strlen(name); >> +if (*state == EXPECTING_REF_WITH_CAPABILITIES && >> +len != name_len + GIT_SHA1_HEXSZ + 1) { >> +free(server_capabilities); Is this free() still needed? After hitting this block, you'd set *state to EXPECTING_REF before you return, so nobody would set server_capabilities by hitting this block twice, and an attempt to do so will hit the die("unexpected cap") below, no? Or it may be a signal that this patch tightens it too much and breaks older or third-party implementations of the other side that can emit more than one refs with capability advertisement? >> +server_capabilities = xstrdup(name + name_len + 1); >> +} else if (*state == EXPECTING_REF) { >> +if (len != name_len + GIT_SHA1_HEXSZ + 1) >> +die("unexpected capabilities after ref name"); >> +} >> +... >> +} >> +*state = EXPECTING_REF; >> +return 1; >> +} >> @@ -123,76 +208,26 @@ struct ref **get_remote_heads(int in, char *src_buf, >> size_t src_len, >> * willing to talk to us. A hang-up before seeing any >> * response does not necessarily mean an ACL problem, though. >> */ >> -int saw_response; >> -int got_dummy_ref_with_capabilities_declaration = 0; >> +int responded = 0; >> +int len; >> +int state = EXPECTING_REF_WITH_CAPABILITIES; >> >> *list = NULL; >> +while ((len = read_remote_ref(in, &src_buf, &src_len, &responded))) { >> +switch (state) { >> +case EXPECTING_REF_WITH_CAPABILITIES: >> +case EXPECTING_REF: >> +if (process_ref(&state, len, &list, flags, extra_have)) >> +break; >> +/* fallthrough */ OK. This fallthrough is because expecting-ref is really expecting ref or shallow and once we see a shallow, we no longer expect ref and expect only shallow. So from that point of view, an assignment to set state to EXPECTING_SHALLOW could happen here, not inside process_ref. I mention this because in general, passing state around and let it be updated in helper functions would make the state transition harder to follow, not easier, even though refactoring the processing needed in different stages into helper functions like this patch does ought to make it easier to see by shrinking the outer loop (i.e. this one) that controls the whole process. I think if we split process_ref() further into two, then we no longer need to pass &state to that function? We start this loop with "expecting the dummy ref (or other)" state, have a new process_dummy_ref() function check if we got "capabilities^{}" thing and do its thing if that is the case (otherwise we fall through to the call to process_ref(), just like the above code falls through to call process_shallow() when it realizes what it got is not a ref), and after the first call to process_dummy_ref() we'd be in the "expecting ref (or other)" state---and the state transition can happen in this caller, not in process_dummy_ref() or process_ref(). Inside process_dummy_ref() and process_ref(), there would be a call to the same helper that notices and extracts the server capability and stores it (or barfs against the second line that advertises the capability, by noticing that server_capabilities is not NULL). Wouldn't that make the presentation of the state machine cleaner?
Re: [PATCH 2/3] merge-base: return fork-point outside reflog
Michael J Gruber writes: > Also, I'm undecided about about your reflog argument above - if we leave > "--fork-point" to be the current behaviour including Jeff's fix then the > documentation would need an even bigger overhaul, because it's neither > "reflog also" (as claimed in the doc) nor "reflog only" (as in the > original implementation) but "historical tips as inferred from the > current value and the reflog". Even though things like "reflog only", "reflog also", may be something implementors may care about, they are irrelevant implementation details to the intended audience. "The bases that are not in reflogs are ignored" _does_ matter, as it affects the outcome, and that may be a bit too strict a filter (which this series takes us in a good direction to fix) but what the readers need to know is the real-world implications of the choices made at the implementation detail level, and more importantly, what the implementation is trying to compute. It is a documentation bug (with or without these patches) if the current text gives an impression that the code is trying to do anything but "guessing the fork point using historical tips". > In any case, for two modes we need two names for the options. Maybe > --fork-point and --fork-base because in the loose mode, you may get a > "base of a strict fork point"? Perhaps.
Re: [PATCH] Documentation/githooks: mention merge in commit-msg hook
Stefan Beller writes: > The commit-msg hook is invoked by both commit and merge now. > > Reported-by: Kaartic Sivaraam > Signed-off-by: Stefan Beller > --- Thanks for tying the loose end. Very much appreciated. > Documentation/githooks.txt | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt > index 623ed1a138..f4e75b9c90 100644 > --- a/Documentation/githooks.txt > +++ b/Documentation/githooks.txt > @@ -127,11 +127,10 @@ help message found in the commented portion of the > commit template. > commit-msg > ~~ > > -This hook is invoked by 'git commit', and can be bypassed > -with the `--no-verify` option. It takes a single parameter, the > -name of the file that holds the proposed commit log message. > -Exiting with a non-zero status causes the 'git commit' to > -abort. > +This hook is invoked by 'git commit' and 'git merge', and can be > +bypassed with the `--no-verify` option. It takes a single parameter, > +the name of the file that holds the proposed commit log message. > +Exiting with a non-zero status causes the command to abort. > > The hook is allowed to edit the message file in place, and can be used > to normalize the message into some project standard format. It
Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison
Lars Schneider writes: > Oh, you're right! > > >> Could you squash in the following? > > @Junio: Can you squash it when you apply the patch? I do not mind and I already did. The patches in the series this patch is fixing up were solely about splitting these scripts out of the YAML file as-is, and I think it was correct to carry these style differences over without adjusting them. But resulting scripts in ci/ are riddled with styleguide deviations, which may want to be cleaned up later. There also are some bash-isms marked with "#!env bash" in them, but I think they are OK because we know we are running them only at a very specific place and the need to make them portable is very small. Thanks.
Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)
Junio C Hamano writes: > My understanding of the next step was for those who are interested > in moving this topic forward to update these patches in that > direction. Well, I am one of those who are interested in moving this topic forward, not because I'm likely to use it, but because the fewer number of topics I have to keep in flight, the easier my work gets. So here is such an update. As the topic is not in 'next' yet, it could also be implemented by replacing patch(es) in the series, but doing it as a follow-up fix made it easier to see what got changed (both in the code and in the tests), so that is how I decided to do this patch. -- >8 -- Subject: [PATCH] branch: fix "copy" to never touch HEAD Probably because "git branch -c A B" piggybacked its implementation on "git branch -m A B", when creating a new branch B by copying the branch A that happens to be the current branch, it also updated HEAD to point at the new branch. This does not match the usual expectation. If I were sitting on a blue chair, and somebody comes and repaints it to red, I would accept ending up sitting on a red chair, but if somebody creates a new red chair, modelling it after the blue chair I am sitting on, I do not expect to be booted off of the blue chair and ending up on sitting on the red one. Let's fix this strange behaviour before it hits 'next'. Those who want to create a new branch and switch to it can do "git checkout B" after creating it by copying the current branch, or if that is so useful to deserve a short-hand way to do so, perhaps extend "git checkout -b B" to copy configurations while creating the new branch B. A "copy" should remain to be "copy", not "copy and sometimes checkout". Signed-off-by: Junio C Hamano --- builtin/branch.c | 9 +++-- t/t3200-branch.sh | 10 +- 2 files changed, 8 insertions(+), 11 deletions(-) diff --git a/builtin/branch.c b/builtin/branch.c index 89f64f4123..e2e3692838 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -506,12 +506,9 @@ static void copy_or_rename_branch(const char *oldname, const char *newname, int oldref.buf + 11); } - if (replace_each_worktree_head_symref(oldref.buf, newref.buf, logmsg.buf)) { - if (copy) - die(_("Branch copied to %s, but HEAD is not updated!"), newname); - else - die(_("Branch renamed to %s, but HEAD is not updated!"), newname); - } + if (!copy && + replace_each_worktree_head_symref(oldref.buf, newref.buf, logmsg.buf)) + die(_("Branch renamed to %s, but HEAD is not updated!"), newname); strbuf_release(&logmsg); diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh index 5d03ad16f6..be9b3784c6 100755 --- a/t/t3200-branch.sh +++ b/t/t3200-branch.sh @@ -422,7 +422,7 @@ test_expect_success 'git branch --copy is a synonym for -c' ' test_cmp expect actual ' -test_expect_success 'git branch -c ee ef should copy and checkout branch ef' ' +test_expect_success 'git branch -c ee ef should copy to create branch ef' ' git checkout -b ee && git reflog exists refs/heads/ee && git config branch.ee.dummy Hello && @@ -431,7 +431,7 @@ test_expect_success 'git branch -c ee ef should copy and checkout branch ef' ' git reflog exists refs/heads/ef && test $(git config branch.ee.dummy) = Hello && test $(git config branch.ef.dummy) = Hello && - test $(git rev-parse --abbrev-ref HEAD) = ef + test $(git rev-parse --abbrev-ref HEAD) = ee ' test_expect_success 'git branch -c f/f g/g should work' ' @@ -494,12 +494,12 @@ test_expect_success 'git branch -C c1 c2 should succeed when c1 is checked out' git checkout -b c1 && git branch c2 && git branch -C c1 c2 && - test $(git rev-parse --abbrev-ref HEAD) = c2 + test $(git rev-parse --abbrev-ref HEAD) = c1 ' -test_expect_success 'git branch -C c1 c2 should add entries to .git/logs/HEAD' ' +test_expect_success 'git branch -C c1 c2 should never touch HEAD' ' msg="Branch: copied refs/heads/c1 to refs/heads/c2" && - grep "$msg$" .git/logs/HEAD + ! grep "$msg$" .git/logs/HEAD ' test_expect_success 'git branch -C master should work when master is checked out' ' -- 2.14.1-907-g5aa63875cf
Re: [PATCH 2/4] test-line-buffer: simplify command parsing
Jeff King writes: > However, each case arm of the switch falls through to the > one below it. This is pointless (we know that a command > starting with 'b' does not need to check any of the commands > in the 'c' block), and it makes gcc's -Wimplicit-fallthrough > complain. Wow, this is an embarassment. The code tries to cleverly optimize it, and then fails miserably. The updated one reads good. Thanks.
KDevelop developers obviously don't use git :(
https://bugs.kde.org/show_bug.cgi?id=379219 $ git push minor-rant-warning KDevelop has a despicable habit of performing `git status --porcelain` whenever it damn well pleases ... or rather when it sees an alteration. This this breaks git rebase because it grabs the $*()&@#$ lock file, often leading to catastrophic heart failure and sometimes really forcing you to git rebase --abort and start over (at least, that's been my solution when git rebase --continue complained that there was nothing to commit and maybe I had forgotten to add something). I have argued that they should at least verify that there isn't a .git/rebase-merge directory present before doing that, but if I want to prevent a heart attack, it looks like I'm going to have to fix this one myself. (Sorry for the rant!) $ git reset minor-rant-warning~1 So first I need to inquire if there is a current mechanism to say "git status, but please fail if you're busy with a non-atomic operation". If there is not one, then it would seem that there should be and how do I go about requesting it? Thanks, Daniel
Re: [PATCH 4/4] ALLOC_GROW: avoid -Wsign-compare warnings
Ramsay Jones writes: > Signed-off-by: Ramsay Jones > --- > builtin/pack-objects.c | 4 ++-- > config.c | 2 +- > diff.c | 2 +- > line-log.c | 18 +- > line-log.h | 2 +- > revision.c | 2 +- > tree-walk.c| 3 +-- > 7 files changed, 16 insertions(+), 17 deletions(-) Thanks. I did not spot any questionable conversion (e.g. "something that used to be signed was because it wanted to store -1 as a sentinel" would be broken if we just change that to unsigned) by going over the places these variables and fields are actually used. A review of a patch like this involves reading through 10x more lines than we see in the above diffstat, and producing it would most likely have taken the same amount of effort, at least. Very much appreciated. > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index a57b4f058..a6ee653bf 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -2563,8 +2563,8 @@ struct in_pack_object { > }; > > struct in_pack { > - int alloc; > - int nr; > + unsigned int alloc; > + unsigned int nr; This is a bit questionable ;-) but it is something I can locally tweak easily.
Re: KDevelop developers obviously don't use git :(
On Thu, Sep 21, 2017 at 11:03:53PM -0500, Daniel Santos wrote: > So first I need to inquire if there is a current mechanism to say "git > status, but please fail if you're busy with a non-atomic operation". If > there is not one, then it would seem that there should be and how do I > go about requesting it? You might be interested in this patch from yesterday: https://public-inbox.org/git/20170921043214.pyhdsrpy4omy5...@sigill.intra.peff.net/ -Peff
Re: [RFC v2] refs: strip out not allowed flags from ref_transaction_update
Michael Haggerty writes: > I wouldn't worry too much about this anomaly. `REF_ISPRUNING` is an ugly > internal kludge, but allowing it in the mask doesn't make anything worse. > >> refs.c | 2 ++ >> refs.h | 8 >> 2 files changed, 10 insertions(+) >> >> diff --git a/refs.c b/refs.c >> index ba22f4acef..fad61be1da 100644 >> --- a/refs.c >> +++ b/refs.c >> @@ -921,6 +921,8 @@ int ref_transaction_update(struct ref_transaction >> *transaction, >> return -1; >> } >> >> +flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS; >> + > > I would advocate considering it a bug if the caller passes in options > that we are going to ignore anyway: > > if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS) > BUG("illegal flags %x in ref_transaction_update", flags); It sounds like a sensible thing to do. Thanks.
Re: [PATCH] git: add --no-optional-locks option
On Thu, Sep 21, 2017 at 08:25:50PM +0200, Johannes Sixt wrote: > > +`GIT_OPTIONAL_LOCKS`:: > > + If set to `0`, Git will avoid performing any operations which > > + require taking a lock and which are not required to complete the > > + requested operation. For example, this will prevent `git status` > > + from refreshing the index as a side effect. This is useful for > > + processes running in the background which do not want to cause > > + lock contention with other operations on the repository. > > + Defaults to `1`. > > I don't think we should pass this environment variable to remote > repositories. It should be listed in local_repo_env[] in environment.c. I'm not sure I agree. This is really about the context in which the command is executing, not anything about the particular repository you're operating on. For fetch/push operations that touch a remote, I doubt it would matter either way (and anyway, those often cross network boundaries that don't propagate environment variables anyway). But imagine that "git status" learns to recurse into submodules and run "git status" inside them. Surely we would want the submodule repos to also avoid taking any unnecessary locks? -Peff
Re: KDevelop developers obviously don't use git :(
Daniel Santos writes: > So first I need to inquire if there is a current mechanism to say "git > status, but please fail if you're busy with a non-atomic operation". Reading this list sometimes makes me wonder if there is some higher intelligence telling independent souls to come up with the same wish at the same time. https://public-inbox.org/git/20170921050835.mrbgx2zryy3ju...@sigill.intra.peff.net/
Re: [PATCH v3 0/4] filter-branch: support for incremental update + fix for ancient tag format
Ian Campbell writes: > This is the third version of my patches to add incremental support to > git-filter-branch. Since the last time I have replaced `git mktag -- > allow-missing-tagger` with `git hash-object -t tag -w --stdin`. > > I've force pushed to [1] (Travis is still running) and have set off the > process of re-rewriting the devicetree tree from scratch (a multi-day > affair) to validate (it's looking good). Thanks.
Re: KDevelop developers obviously don't use git :(
On 09/21/2017 11:38 PM, Junio C Hamano wrote: > Daniel Santos writes: > >> So first I need to inquire if there is a current mechanism to say "git >> status, but please fail if you're busy with a non-atomic operation". > Reading this list sometimes makes me wonder if there is some higher > intelligence telling independent souls to come up with the same wish > at the same time. > > https://public-inbox.org/git/20170921050835.mrbgx2zryy3ju...@sigill.intra.peff.net/ LMAO! This has been irritating me for some time, but I had to abort a somewhat complex interactive rebase earlier and it finally drove me to action. I'm juggling so many projects right now that I really don't want to fix this myself, but. grrr! and raaarh!, etc. Daniel
[PATCH 4/3] branch: fix "copy" to never touch HEAD
When creating a new branch B by copying the branch A that happens to be the current branch, it also updates HEAD to point at the new branch. It probably was made this way because "git branch -c A B" piggybacked its implementation on "git branch -m A B", This does not match the usual expectation. If I were sitting on a blue chair, and somebody comes and repaints it to red, I would accept ending up sitting on a chair that is now red (I am also OK to stand, instead, as there no longer is my favourite blue chair). But if somebody creates a new red chair, modelling it after the blue chair I am sitting on, I do not expect to be booted off of the blue chair and ending up on sitting on the new red one. Let's fix this before it hits 'next'. Those who want to create a new branch and switch to it can do "git checkout B" after doing a "git branch -c B", and if that operation is so useful and deserves a short-hand way to do so, perhaps extend "git checkout -b B" to copy configurations while creating the new branch B. Signed-off-by: Junio C Hamano --- * Let's send an updated one as a follow-up to the discussion thread that it is a follow-up to. The patch in this message is the same as the one I previously sent; the proposed log message has been updated somewhat. builtin/branch.c | 9 +++-- t/t3200-branch.sh | 10 +- 2 files changed, 8 insertions(+), 11 deletions(-) diff --git a/builtin/branch.c b/builtin/branch.c index 89f64f4123..e2e3692838 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -506,12 +506,9 @@ static void copy_or_rename_branch(const char *oldname, const char *newname, int oldref.buf + 11); } - if (replace_each_worktree_head_symref(oldref.buf, newref.buf, logmsg.buf)) { - if (copy) - die(_("Branch copied to %s, but HEAD is not updated!"), newname); - else - die(_("Branch renamed to %s, but HEAD is not updated!"), newname); - } + if (!copy && + replace_each_worktree_head_symref(oldref.buf, newref.buf, logmsg.buf)) + die(_("Branch renamed to %s, but HEAD is not updated!"), newname); strbuf_release(&logmsg); diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh index 5d03ad16f6..be9b3784c6 100755 --- a/t/t3200-branch.sh +++ b/t/t3200-branch.sh @@ -422,7 +422,7 @@ test_expect_success 'git branch --copy is a synonym for -c' ' test_cmp expect actual ' -test_expect_success 'git branch -c ee ef should copy and checkout branch ef' ' +test_expect_success 'git branch -c ee ef should copy to create branch ef' ' git checkout -b ee && git reflog exists refs/heads/ee && git config branch.ee.dummy Hello && @@ -431,7 +431,7 @@ test_expect_success 'git branch -c ee ef should copy and checkout branch ef' ' git reflog exists refs/heads/ef && test $(git config branch.ee.dummy) = Hello && test $(git config branch.ef.dummy) = Hello && - test $(git rev-parse --abbrev-ref HEAD) = ef + test $(git rev-parse --abbrev-ref HEAD) = ee ' test_expect_success 'git branch -c f/f g/g should work' ' @@ -494,12 +494,12 @@ test_expect_success 'git branch -C c1 c2 should succeed when c1 is checked out' git checkout -b c1 && git branch c2 && git branch -C c1 c2 && - test $(git rev-parse --abbrev-ref HEAD) = c2 + test $(git rev-parse --abbrev-ref HEAD) = c1 ' -test_expect_success 'git branch -C c1 c2 should add entries to .git/logs/HEAD' ' +test_expect_success 'git branch -C c1 c2 should never touch HEAD' ' msg="Branch: copied refs/heads/c1 to refs/heads/c2" && - grep "$msg$" .git/logs/HEAD + ! grep "$msg$" .git/logs/HEAD ' test_expect_success 'git branch -C master should work when master is checked out' ' -- 2.14.1-929-g25eae544e9
Re: [PATCH 2/4] commit-slab.h: avoid -Wsign-compare warnings
On Thu, Sep 21, 2017 at 05:47:36PM +0100, Ramsay Jones wrote: > diff --git a/commit-slab.h b/commit-slab.h > index 333d81e37..dcaab8ca0 100644 > --- a/commit-slab.h > +++ b/commit-slab.h > @@ -78,7 +78,7 @@ static MAYBE_UNUSED void init_ ##slabname(struct slabname > *s) \ > \ > static MAYBE_UNUSED void clear_ ##slabname(struct slabname *s) > \ > {\ > - int i; \ > + unsigned int i; \ > for (i = 0; i < s->slab_count; i++) \ > free(s->slab[i]); \ > s->slab_count = 0; \ > @@ -89,13 +89,13 @@ static MAYBE_UNUSED elemtype *slabname## _at_peek(struct > slabname *s, \ > const struct commit *c, \ > int add_if_missing) \ > {\ > - int nth_slab, nth_slot; \ > + unsigned int nth_slab, nth_slot;\ I have a feeling that in the long run these should all be size_t, but it's probably pretty unlikely to overflow in practice. At any rate, the slab index itself is an unsigned, so it probably makes sense to match that for now. -Peff
Re: [PATCH 3/4] cache.h: hex2chr() - avoid -Wsign-compare warnings
On Thu, Sep 21, 2017 at 05:48:38PM +0100, Ramsay Jones wrote: > diff --git a/cache.h b/cache.h > index a916bc79e..a0e3e362c 100644 > --- a/cache.h > +++ b/cache.h > @@ -1243,8 +1243,8 @@ static inline unsigned int hexval(unsigned char c) > */ > static inline int hex2chr(const char *s) > { > - int val = hexval(s[0]); > - return (val < 0) ? val : (val << 4) | hexval(s[1]); > + unsigned int val = hexval(s[0]); > + return (val & ~0xf) ? val : (val << 4) | hexval(s[1]); > } Ironically, the unsigned return from hexval() comes from internally converting the signed char in hexval_table. And then we again return it as a signed int from hex2chr(). Would it make sense to return a signed int from hexval()? That would make hex2chr just work as it tries to above. I admit that shifting signed values is a little funny, but it should be fine here since we know they're no larger than 8 bits in the first place. As an aside, I also see some uses of hexval() that don't appear to be quite as rigorous in checking for invalid characters. A few unconditionally shift the first nibble and assume that there will still be high bits set. I think that's generally true for twos-complement negative numbers, but isn't shifting off the left side of a signed integer undefined behavior? And mailinfo's decode_q_segment() does not seem to check for errors at all. Handling those is getting far off your original patch, but I'm having trouble figuring out if it's saner for us to consistently stick to all-signed or all-unsigned here. -Peff
Re: [PATCH] git: add --no-optional-locks option
On 09/20/2017 11:32 PM, Jeff King wrote: > Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not* > to lock the index and update it, 2016-08-12). Folks working on GitHub > Desktop complained to me that it's only available on Windows. :) > > I expanded the scope a bit to let us give the same treatment to more > commands in the long run. I'd also be OK with just cherry-picking your > patch to non-Windows Git if you don't find my reasoning below > compelling. But I think we need _something_ like this, as the other > solutions I could come up with don't seem very promising. > > -Peff > > -- >8 -- > Some tools like IDEs or fancy editors may periodically run > commands like "git status" in the background to keep track > of the state of the repository. Some of these commands may > refresh the index and write out the result in an > opportunistic way: if they can get the index lock, then they > update the on-disk index with any updates they find. And if > not, then their in-core refresh is lost and just has to be > recomputed by the next caller. > > But taking the index lock may conflict with other operations > in the repository. Especially ones that the user is doing > themselves, which _aren't_ opportunistic. In other words, > "git status" knows how to back off when somebody else is > holding the lock, but other commands don't know that status > would be happy to drop the lock if somebody else wanted it. Interestingly, this usually slaps me when performing an _interactive_ rebase. It occurred to me that if I'm performing an interaction operation, it doesn't seem unreasonable for git wait up to 125ms or so for the lock and then prompting the user to ask if they want to continue waiting for the lock. > There are a couple possible solutions: > > 1. Have some kind of "pseudo-lock" that allows other > commands to tell status that they want the lock. > > This is likely to be complicated and error-prone to > implement (and maybe even impossible with just > dotlocks to work from, as it requires some > inter-process communication). > > 2. Avoid background runs of commands like "git status" > that want to do opportunistic updates, preferring > instead plumbing like diff-files, etc. > > This is awkward for a couple of reasons. One is that > "status --porcelain" reports a lot more about the > repository state than is available from individual > plumbing commands. And two is that we actually _do_ > want to see the refreshed index. We just don't want to > take a lock or write out the result. Whereas commands > like diff-files expect us to refresh the index > separately and write it to disk so that they can depend > on the result. But that write is exactly what we're > trying to avoid. > > 3. Ask "status" not to lock or write the index. > > This is easy to implement. The big downside is that any > work done in refreshing the index for such a call is > lost when the process exits. So a background process > may end up re-hashing a changed file multiple times > until the user runs a command that does an index > refresh themselves. That is not necessarily the case. I don't actually know git on the inside, but I would ask you to consider a read-write lock and a hybrid of one and three. I don't know what dotlocks are, but I'm certain that you can implement a rw lock using lock files and no other IPC, although it does increase the complexity. The way this works is that `git status' acquires a read lock and does its thing. If it has real changes, instead of discarding them it attempts to upgrade to a write lock. If that fails, you throw it away, otherwise you write them and release. In order to implement rw locks with only lock files, "off the cuff" I say you have a single "lock list" file that should never be deleted and a "lock lock" file that is held in order to read or modify the list. The format of the lock list would have a pair of 32-bit wrapping modification counts (or versions) at the top -- one for modifications to the lock list its self and another for modifications to the underlying data (i.e., the number of times a write lock has been acquired). This header is followed by entries something like this: 'r' if waiting for a read lock 'R' if actively reading 'w' if waiting for write lock 'W' if actively writing The pid If active, the version of the data at the time lock acquired or zero. Time began waiting or time lock acquired An operation of 'r' or 'w' means that you are waiting and upper case means that you are active. is the version of the data at the time the lock became active and writers increment it when they acquire a lock. You wait with file alteration notification on the lock list (if there is any doubt based upon timestamp precision then you can examine the lock list version). When you want to read o