[PATCH v6 7/7] gpg-interface t: extend the existing GPG tests with GPGSM

2018-07-20 Thread Henning Schild
Add test cases to cover the new X509/gpgsm support. Most of them
resemble existing ones. They just switch the format to x509 and set the
signingkey when creating signatures. Validation of signatures does not
need any configuration of git, but it does need gpgsm to be configured
to trust the key(-chain).
Several of the testcases build on top of existing gpg testcases.
The commit ships a self-signed key for commit...@example.com and
configures gpgsm to trust it.

Signed-off-by: Henning Schild 
---
 t/lib-gpg.sh   |  28 +++-
 t/lib-gpg/gpgsm-gen-key.in |   8 ++
 t/lib-gpg/gpgsm_cert.p12   | Bin 0 -> 2652 bytes
 t/t4202-log.sh |  37 ++
 t/t5534-push-signed.sh |  63 ++---
 t/t7004-tag.sh |  13 ++
 t/t7030-verify-tag.sh  |  34 
 7 files changed, 178 insertions(+), 5 deletions(-)
 create mode 100644 t/lib-gpg/gpgsm-gen-key.in
 create mode 100644 t/lib-gpg/gpgsm_cert.p12

diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
index a5d3b2cba..3fe02876c 100755
--- a/t/lib-gpg.sh
+++ b/t/lib-gpg.sh
@@ -38,7 +38,33 @@ then
"$TEST_DIRECTORY"/lib-gpg/ownertrust &&
gpg --homedir "${GNUPGHOME}" </dev/null >/dev/null 2>&1 \
--sign -u commit...@example.com &&
-   test_set_prereq GPG
+   test_set_prereq GPG &&
+   # Available key info:
+   # * see t/lib-gpg/gpgsm-gen-key.in
+   # To generate new certificate:
+   #  * no passphrase
+   #   gpgsm --homedir /tmp/gpghome/ \
+   #   -o /tmp/gpgsm.crt.user \
+   #   --generate-key \
+   #   --batch t/lib-gpg/gpgsm-gen-key.in
+   # To import certificate:
+   #   gpgsm --homedir /tmp/gpghome/ \
+   #   --import /tmp/gpgsm.crt.user
+   # To export into a .p12 we can later import:
+   #   gpgsm --homedir /tmp/gpghome/ \
+   #   -o t/lib-gpg/gpgsm_cert.p12 \
+   #   --export-secret-key-p12 "commit...@example.com"
+   echo | gpgsm --homedir "${GNUPGHOME}" 2>/dev/null \
+   --passphrase-fd 0 --pinentry-mode loopback \
+   --import "$TEST_DIRECTORY"/lib-gpg/gpgsm_cert.p12 &&
+   gpgsm --homedir "${GNUPGHOME}" 2>/dev/null -K \
+   | grep fingerprint: | cut -d" " -f4 | tr -d '\n' > \
+   ${GNUPGHOME}/trustlist.txt &&
+   echo " S relax" >> ${GNUPGHOME}/trustlist.txt &&
+   (gpgconf --kill gpg-agent >/dev/null 2>&1 || : ) &&
+   echo hello | gpgsm --homedir "${GNUPGHOME}" >/dev/null \
+   -u commit...@example.com -o /dev/null --sign - 2>&1 &&
+   test_set_prereq GPGSM
;;
esac
 fi
diff --git a/t/lib-gpg/gpgsm-gen-key.in b/t/lib-gpg/gpgsm-gen-key.in
new file mode 100644
index 000000000..a7fd87c06
--- /dev/null
+++ b/t/lib-gpg/gpgsm-gen-key.in
@@ -0,0 +1,8 @@
+Key-Type: RSA
+Key-Length: 2048
+Key-Usage: sign
+Serial: random
+Name-DN: CN=C O Mitter, O=Example, SN=C O, GN=Mitter
+Name-Email: commit...@example.com
+Not-Before: 1970-01-01 00:00:00
+Not-After: 3000-01-01 00:00:00
diff --git a/t/lib-gpg/gpgsm_cert.p12 b/t/lib-gpg/gpgsm_cert.p12
new file mode 100644
index 0000000000000000000000000000000000000000..94ffad0d31a3b6c4e849b29d960762e485ba7ea8
GIT binary patch
literal 2652
zcmY+Ec{mh|8pQ{*m?2^;S+ivrVaAfiQnHSnA|b+zofwQYAI3haA=!75ZX!!|$-Y*$
zWS5XVvW}3h?sM<`?)~F^&U?;zp7ZAqMS|U-rJ+NSVEkYxG8!9AJx2qf$s@s-fg~8i
zSqwpufKGo`;5-uW&RJwiO9MC)gTEUZ6fYR|?*&F0Fp3FCzs?3aZK`58prxe;gpq&(
zm?nam#=;<-Y>ihd?+EMByF}ef4%bKLwbO{e4U=0LG4jP<8x$`N#5#&@0as@!cSSK>
zh}oaTUi^XPMV(3;kKAjFB5Yq)zn_vnExr>`s&?M}i{A>>$j-{u*Jo)riK{fW!LXet
z)hkG3xgvIpDClY8=o4q?_bD%htwXG!_=;NxVjU=3#{TU0Q@=ZMc%6wes*Uy&CAPf=JB(a
zb<9VW4e)@R+qGEgfnewABCPf}mtJOYxL>*jtu9yKk?Fu3UWr~zJ#<*I(Ey>fhA6~_gUvg(8n<*TM
zdXo$0jMn2G$GcM484J8MUx*x=@XWq^uvCnI1jp>#Y_2Y<+T_*yX!uTA$vm7Ugs56O
z@!~>Y?E@FA@{5@ZDHD`TkEeHQ{M2n_Slp%By{89uVia3%>w?IZdvBBX$K*3UzCM7`
zxGZ-dbh^KlF#?RZaS4_^R|%@F5xX0j)vdE&#M1Ch^W+WzQjVH^m|vu9jE@aRtlbcP
z^NxHpt!%n59KJ!@-IHzjWw9l;)N}J^LLwLT#Xz!Tq4vWkHTO5#j_1_nvA>#u
zBX5jy&r_5h7PKFH26B8-1BMryQYu8HiW#k_Nq!qB4-5#YH|RDUL_&Ug
z+Y4DwCgmPTpSqu5lU#dxgcf>N8-aXsM}TDNd_~tW$92abZG)0q=Xzr)t;M39D`u~9
zdbi)dKxTXdE3OHN@sCmOGfdNEKC*BgS&ku$LNb1M>x6#MZ_37F<5gz)d_&6)%S0zP
z5_lFp_A5n(wMlQ+_2n%!z1tomK+~=DQ9MC|QDxZ1?2?A{(Dqi(*TE;q4g+&HH;vvX
zI%M?`5*^D+i5eC~tth)c%;)Q^n`$=Nt8=_jLdLg)*d^-BKJ2siLCaLDMJwSxfF|uO
zc}bc4oW*xw>!m!o6BG%Q_CG+$BZ1<8Bv8~@9Da5oV21zT1x7=A#-YtK0ImHWb?E+3
z$NK7MLMBq{?jPy^Nx&Yxu7#tyt@t3@=F07Bf;BHh5A|9FYRL9bIo$1a44s9^Wk41>
zq-0_p!?=F1wYc$wC8sgycQ~IHO0x4@BODYL?(uFu5qaXSm+YP$_!_RrSrzJ#*f(fy
zF)|i|!Ac}}R(rT~U3@SDll^wcX#c1XS5VcvP3@>e?Qfbr5MU*oxy@`fDnT0wr
zw!P*fjd#ErDfYvcz210jWuOc?HIcQE!NG$>ZY5Qu3

Re: [PATCH] t/t5534: do not unset GIT_COMMITTER_EMAIL for other tests

2018-07-20 Thread Henning Schild
Am Thu, 19 Jul 2018 15:27:56 -0700
schrieb Junio C Hamano :

> Henning Schild  writes:
> 
> > Looking at "what is cooking" I assume I should not add/fold this
> > into the series anymore. So it comes as a separate patch on top.
> 
> Thanks.  I only said:
> 
>  I think this round is mostly ready, except for a minor nit in the
>  last step.  I do not mind merging this to 'next' and leave fixing
>  of the test to a later clean-up.
> 
> meaning that I _do not mind_ merging the slightly broken one to
> 'next' and fixing it with a follow-up (i.e. this patch), but fixing
> with a replacement patch so that we pretend we never made a mistake
> that needs fixing from the beginning is of course preferable ;-)
> 
> Let's squash this patch into the last one in the series.

Ok, just sent a v6 of 7/7.

Henning


Re: [PATCH 2/2] repack -ad: prune the list of shallow commits

2018-07-20 Thread Junio C Hamano
Junio C Hamano  writes:

> Sorry, but I do not think I can relay that as an explanation why it
> won't cause problems to a third person.

OK, ignore this. I was being stupid.

> The entries in shallow file says that history behind them may not
> exist in the repository due to its shallowness but history after
> them are supposed to be traversable (otherwise we have a repository
> corruption).  It is true that an entry that itself no longer exists
> in this repository should not be in shallow file, as the presence of
> that entry breaks that promise the file is making---that commit
> ought to exist and it is safe to traverse down to it, so keeping the
> entry in the file is absolutely a wrong thing to do.
>
> But that does not automatically mean that just simply removing it
> makes the resulting repository good, does it?  Wouldn't the solution
> for that corruption be to set a new entry to stop history traversal
> before reaching that (now-missing) commit?

The above is overly pessimistic and worried about an impossible
situation, I would think.  The reason why a commit that used to be
in the shallow file is being pruned during a "repack" is because it
has become unreachable.  By definition, no future history traversal
that enumerates reachable commits needs this entry in the shallow list
to stop it from discovering that commits older than the one being
pruned are missing.  If there were a ref or a reflog entry from which
such a problematic traversal could start, we wouldn't be pruning this
commit in the first place, because the commit would not have become
unreachable yet.

So a repository does not become corrupt by pruning the commit *and*
removing it from the shallow file at the same time.
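
For readers unfamiliar with the shallow file, here is a minimal sketch of what it records (the temporary paths and the toy three-commit history are illustrative, not from this thread): a depth-limited clone writes the boundary commit, whose parents are deliberately absent, into .git/shallow.

```shell
# Build a 3-commit history, then clone it with --depth 1; the clone's
# .git/shallow lists the boundary commit whose parents are missing.
set -e
dir=$(mktemp -d)
git init -q "$dir/parent"
(
	cd "$dir/parent" &&
	git config user.email you@example.com &&
	git config user.name "A U Thor" &&
	for i in 1 2 3
	do
		echo "$i" >file &&
		git add file &&
		git commit -qm "c$i"
	done
)
# A file:// URL (not a plain path) forces the real transport, so
# --depth is honored instead of being ignored as in local clones.
git clone -q --depth 1 "file://$dir/parent" "$dir/shallow"
cat "$dir/shallow/.git/shallow"
```

Pruning such an entry only makes sense together with pruning the commit itself, which is exactly the pairing described above.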




Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Junio C Hamano
Jeff King  writes:

>> Is it a downside that it is cumbersome to override?  This is not a
>> rhetorical question.  I am not convinced there will not be a legit
>> circumstance that calling strcpy (or whatever we are going to ban)
>> is the best solution and it is safe.  By "best", what I mean is "you
>> could instead use memcpy/strncpy/whatever" can legitimately be
>> argued with "but still using memcpy/strncpy/whatever is inferior
>> to using strcpy in this case for such and such reasons".
>
> In my opinion, no, this is not a problem.
>
> My plan is to only add functions which are truly worthless.

OK.

> Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> but I think it's silly for it to be. I would never add it to this list.

A tangent, but is that because they want you to use memmove()
instead so that you do not have to worry about overlapping copies,
perhaps?





Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Derrick Stolee

On 7/19/2018 4:39 PM, Jeff King wrote:

There are a few standard C functions (like strcpy) which are
easy to misuse. We generally discourage these in reviews,
but we haven't put advice in CodingGuidelines, nor provided
any automated enforcement. The latter is especially
important because it's more consistent, and it can often
save a round-trip of review.

Let's start by banning strcpy() and sprintf(). It's not
impossible to use these correctly, but it's easy to do so
incorrectly, and there's always a better option.

For enforcement, we can rely on the compiler and just
introduce code which breaks compilation when it sees these
functions. This has a few advantages:

   1. We know it's run as part of a build cycle, so it's
  hard to ignore. Whereas an external linter is an extra
  step the developer needs to remember to do.

   2. Likewise, it's basically free since the compiler is
  parsing the code anyway.

   3. We know it's robust against false positives (unlike a
  grep-based linter).


I know I'm late to the party, but I just wanted to chime in that I think
this is a great idea. The more checks we run as part of the development
cycle, the fewer problems have to be caught manually in review (or can
be missed in review).


Thanks,

-Stolee



Re: [RFC] push: add documentation on push v2

2018-07-20 Thread Jeff Hostetler




On 7/18/2018 1:15 PM, Brandon Williams wrote:

On 07/18, Stefan Beller wrote:

On Wed, Jul 18, 2018 at 6:31 AM Derrick Stolee  wrote:


On 7/17/2018 7:25 PM, Stefan Beller wrote:

On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams  wrote:

Signed-off-by: Brandon Williams 
---

Since introducing protocol v2 and enabling fetch I've been thinking
about what its inverse 'push' would look like.  After talking with a
number of people I have a longish list of things that could be done to
improve push and I think I've been able to distill the core features we
want in push v2.

It would be nice to know which things you want to improve.


Hopefully we can also get others to chime in with things they don't like
about the existing protocol. What pain points exist, and what can we do
to improve at the transport layer before considering new functionality?


Another thing that I realized last night was the possibility of chunking requests.
The web of today is driven by lots of small http(s) requests. I know our server
team fights with the internal tools all the time because the communication
involved in git-fetch is usually a large http request (large packfile).
So it would be nice to have the possibility of chunking the request.
But I think that can be added as a capability? (Not sure how)


Fetch and push requests/responses are already "chunked" when using the
http transport.  So I'm not sure what you mean by adding a capability
because the protocol doesn't care about which transport you're using.
This is of course unless you're talking about a different "chunking"
from what it means to chunk an http request/response.



Internally, we've talked about wanting to have resumable pushes and
fetches.  I realize this is difficult to do when the server is
replicated and the repeated request might be talking to a different
server instance.  And there's a problem with temp files littering the
server as it waits for the repeated attempt.  But still, the packfile
sent/received can be large and connections do get dropped.

That is, if we think about sending 1 large packfile and just using a
byte-range-like approach to resuming the transfer.

Suppose we allowed the request to send a series of packfiles, with each
"chunk" being self-contained and usable.  Then if a push connection was
dropped the server could apply the successfully received packfile(s)
(add the received objects and update the refs to the commits received so
far).  And ignore the interrupted and unreceived packfile(s) and let the
client retry later.  When/if the client retried the push, it would
renegotiate haves/wants and send a new series of packfile(s).  With the
assumption being that the server would have updated refs from the
earlier aborted push, so the packfile(s) computed for the second attempt
would not repeat the content successfully transmitted in the first
attempt.

This would require that the client build an ordered set of packfiles
from oldest to newest so that the server can apply them in-order and
the graph remain connected.  That may be outside your scope here.

Also, we might have to add a few messages to the protocol after the
negotiation, for the client to say that it is going to send the push
content in 'n' packfiles and send 'n' messages with the intermediate
ref values being updated in each packfile.

Just thinking out loud here.
Jeff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Theodore Y. Ts'o
On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:
> Ditto for sprintf, where you should _always_ be using at least xsnprintf
> (or some better tool, depending on the situation).  And for strncpy,
> strlcpy (or again, some better tool) is strictly an improvement.

Nitpick: this may be true for git, but it's not strictly true in all
cases.  I actually have a (non-git) case where strncpy *is* the right
tool.  And this is one where I'm copying a NUL-terminated string into
a fixed-length character array (in the ext4 superblock) which is *not*
NUL-terminated.  In that case, strncpy() works; but strlcpy() does not
do what I want.

So I used strncpy() advisedly, and I ignore people running Coccinelle
scripts and blindly sending patches to "fix" ext4.

But perhaps that's also a solution for git?  You don't have to
necessarily put them on a banned list; you could instead have some
handy, pre-set scripts that scan the entire code base looking for
"bad" functions with cleanups automatically suggested.  This could be
run at patch review time, and/or periodically (especially before a
release).

- Ted


ds/multi-pack-index (was Re: What's cooking in git.git (Jul 2018, #02; Wed, 18))

2018-07-20 Thread Derrick Stolee

On 7/18/2018 6:03 PM, Junio C Hamano wrote:

* ds/multi-pack-index (2018-07-12) 23 commits
  - midx: clear midx on repack
  - packfile: skip loading index if in multi-pack-index
  - midx: prevent duplicate packfile loads
  - midx: use midx in approximate_object_count
  - midx: use existing midx when writing new one
  - midx: use midx in abbreviation calculations
  - midx: read objects from multi-pack-index
  - config: create core.multiPackIndex setting
  - midx: write object offsets
  - midx: write object id fanout chunk
  - midx: write object ids in a chunk
  - midx: sort and deduplicate objects from packfiles
  - midx: read pack names into array
  - multi-pack-index: write pack names in chunk
  - multi-pack-index: read packfile list
  - packfile: generalize pack directory list
  - t5319: expand test data
  - multi-pack-index: load into memory
  - midx: write header information to lockfile
  - multi-pack-index: add 'write' verb
  - multi-pack-index: add builtin
  - multi-pack-index: add format details
  - multi-pack-index: add design document

  When there are too many packfiles in a repository (which is not
  recommended), looking up an object in these would require
  consulting many pack .idx files; a new mechanism to have a single
  file that consolidates all of these .idx files is introduced.
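
The consolidation can be sketched with the `git multi-pack-index write` verb this series proposes (available in git releases that later shipped it; the repo layout below is illustrative):

```shell
# Make a repo with two packs, each with its own .idx, then write one
# consolidated multi-pack-index over both.
set -e
dir=$(mktemp -d)
git init -q "$dir/repo"
cd "$dir/repo"
git config user.email you@example.com
git config user.name "A U Thor"
echo one >file && git add file && git commit -qm one
git repack -qd                       # pack 1
echo two >file && git add file && git commit -qm two
git repack -qd                       # pack 2 (just the new objects)
ls .git/objects/pack/*.idx | wc -l   # one .idx per pack
git multi-pack-index write           # one consolidated lookup file
ls .git/objects/pack/multi-pack-index
```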

  What's the doneness of this one?  I vaguely recall that there was
  an objection against the concept as a whole (i.e. there is a way
  with less damage to gain the same object-abbrev performance); has
  it (and if anything else, they) been resolved in satisfactory
  fashion?


I believe you're talking about Ævar's patch series [1] on unconditional 
abbreviation lengths. His patch gets similar speedups by completely 
eliminating the abbreviation computation in favor of a relative increase 
that is very likely to avoid collisions. While abbreviation speedups are 
the most dramatic measurable improvement from the multi-pack-index 
feature, they are not its only benefit.


Lookup speeds improve in a multi-pack environment. While only the 
largest of largest repos have trouble repacking into a single pack, 
there are many scenarios where users disable auto-gc and do not repack 
frequently. On-premise build machines are the ones I know about the 
most: these machines are run 24/7 to perform incremental fetches against 
a remote and kick off a build. Admins frequently turn off GC so the 
build times are not impacted. Eventually, their performance does degrade 
due to the number of packfiles. The answer we give to them is to set up 
scheduled maintenance to repack. These users don't need the space 
savings of a repack, but just need consistent performance and high 
up-time. The multi-pack-index could assist here (as long as we set up 
auto-computing the multi-pack-index after a fetch).


That's the best I can do to sell the feature as it stands now (plus the 
'fsck' integration that would follow after this series is accepted).


I have mentioned the potential for the multi-pack-index to do the following:

* Store metadata about the packfiles, possibly replacing the .keep and 
.promisor files, and allowing other extensions to inform repack algorithms.


* Store a stable object order, allowing the reachability bitmap to be 
computed at a different cadence from repacking the packfiles.


I'm interested in these applications, but I will admit that they are not 
on the top of my priority list at the moment. Right now, I'm focused on 
reaching feature parity with the version of the MIDX we have in our GVFS 
fork of Git, and then extending the feature to have incremental 
multi-pack-index files to solve the "big write" problem.


Thanks,

-Stolee

[1] 
https://public-inbox.org/git/20180608224136.20220-1-ava...@gmail.com/T/#u








Re: Git 2.18: RUNTIME_PREFIX... is it working?

2018-07-20 Thread Jeff Hostetler




On 7/19/2018 6:26 PM, brian m. carlson wrote:

On Wed, Jul 18, 2018 at 02:18:44PM +0200, Johannes Schindelin wrote:

On Sat, 14 Jul 2018, brian m. carlson wrote:

I will say that at cPanel, we have a configuration where end users can
end up inside a mount namespace without /proc (depending on the
preferences of the administrator).  However, it's easy enough for us to
simply build without RUNTIME_PREFIX if necessary.

If we turn it on by default, it would be nice if we documented (maybe in
the Makefile) that it requires /proc on Linux for the benefit of other
people who might be in a similar situation.


Is there *really* no other way on Linux to figure out the absolute path of
the current executable than to open a pseudo file in the `/proc` file
system?


Nope, not that I'm aware of.  You have to read the destination of
the /proc/PID/exe symlink.



Getting the full path of the current executable is a very Windows thing.
On most Unix-based systems it just isn't possible (even if Linux does
have the /proc thing).  Think about hard-links, for example.  There just
isn't a single canonical pathname for an inode.

Jeff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Duy Nguyen
On Thu, Jul 19, 2018 at 10:40 PM Jeff King  wrote:
>
> There are a few standard C functions (like strcpy) which are
> easy to misuse. We generally discourage these in reviews,
> but we haven't put advice in CodingGuidelines, nor provided
> any automated enforcement. The latter is especially
> important because it's more consistent, and it can often
> save a round-trip of review.
>
> Let's start by banning strcpy() and sprintf(). It's not
> impossible to use these correctly, but it's easy to do so
> incorrectly, and there's always a better option.

Is it possible to extend this to ban variables as well? I'm still
going to delete the_index from library code. Once that work is done I
will ban it from entering again (it's only allowed in builtin/ for
example). The next target after that would be the_repository.

Right now I will do this by not declaring the variable [2], which ends
up with a much less friendly compile warning. I did something similar
[1] in an earlier iteration but did not do extensive research on this
topic like you did.

[1] https://public-inbox.org/git/20180606073933.14755-1-pclo...@gmail.com/T/
[2] https://public-inbox.org/git/20180616054157.32433-16-pclo...@gmail.com/
-- 
Duy


[PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Nguyễn Thái Ngọc Duy
Let's start with some background about oe_delta_size() and
oe_set_delta_size(). If you already know, skip the next paragraph.

These two are added in 0aca34e826 (pack-objects: shrink delta_size
field in struct object_entry - 2018-04-14) to help reduce 'struct
object_entry' size. The delta size field in this struct is reduced to
only contain max 2MB. So if any new delta is produced and larger than
2MB, it's dropped because we can't really save such a large size
anywhere. A fallback is provided in case existing packfiles already have
large deltas, then we can retrieve it from the pack.

While this should help small machines repacking large repos (i.e. less
memory pressure), dropping large deltas during the delta selection
process could end up with worse pack files. And if existing packfiles
already have >2MB delta and pack-objects is instructed to not reuse
deltas, all of them will be dropped on the floor, and the resulting
pack would be definitely bigger.

There is also a regression in terms of CPU/IO if we have large on-disk
deltas, because the fallback code needs to parse the pack every time the
delta size is needed, and merely accessing the mmap'd pack data is
enough to cause extra page faults when memory is under pressure.

Both of these issues were reported on the mailing list. Here's some
numbers for comparison.

Version  Pack (MB)  MaxRSS(kB)  Time (s)
-------  ---------  ----------  --------
 2.17.0       5498    43513628   2494.85
 2.18.0      10531    40449596   4168.94

This patch provides a better fallback that is

- cheaper in terms of cpu and io because we won't have to read
  existing pack files as much

- better in terms of pack size because the packing heuristics are back
  to their 2.17.0 behavior: we do not drop large deltas at all

If we encounter any delta (on-disk or created during try_delta phase)
that is larger than the 2MB limit, we stop using the delta_size_ field
for it because the field can't hold such a size anyway. A new array of
delta sizes is dynamically allocated and can hold all the deltas that
2.17.0 can [1]. All current delta sizes are migrated over.

With this, we do not have to drop deltas in try_delta() anymore. Of
course the downside is we use slightly more memory, even compared to
2.17.0. But since this is considered an uncommon case, a bit more
memory consumption should not be a problem.

Delta size limit is also raised from 2MB to 32MB to better cover the
common case and avoid that extra memory consumption (99.999% of deltas in
this reported repo are under 12MB). Other fields are shuffled around
to keep this struct packed tight. We don't use more memory in common
case even with this limit update.

A note about thread synchronization. Since this code can be run in
parallel during delta searching phase, we need a mutex. The realloc
part in packlist_alloc() is not protected because it only happens
during the object counting phase, which is always single-threaded.

The locking in oe_delta_size() could also be dropped if we make sure
the pack->delta_size pointer assignment in oe_set_delta_size() is
atomic. But let's keep the lock for now to be on the safe side. Lock
contention should not be that high for this lock.

[1] With a small tweak. 2.17.0 on 64-bit linux can hold 2^64 byte
deltas, which is absolutely insane. But Windows builds, even the
64-bit version, only hold 2^32. So reducing it to 2^32 should be
quite safe.

Reported-by: Elijah Newren 
Helped-by: Elijah Newren 
Helped-by: Jeff King 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 I'm optimistic that the slowness on the linux repo is lock contention
 (because if it's not then I have no clue what is). So let's start
 doing proper patches.

 If we want a quick fix for 2.18.1. I suggest bumping up
 OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
 think it's safe to fast track this patch.

 builtin/pack-objects.c|  6 ++--
 ci/run-build-and-tests.sh |  1 +
 pack-objects.c| 21 
 pack-objects.h| 68 +++
 t/README  |  4 +++
 5 files changed, 82 insertions(+), 18 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebc8cefb53..3bc182b1b6 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2023,10 +2023,6 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, 
max_size);
if (!delta_buf)
return 0;
-   if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
-   free(delta_buf);
-   return 0;
-   }
 
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
@@ -2278,6 +2274,8 @@ static void init_threaded_search(void)
pthread_mutex_init(&cache_mutex, NULL);
pthread_mutex_init(&progress_mutex, NULL);
pthread_cond_init(&progress_cond, NULL);
+   pthread_mutex_init(&to_pack.lock, NULL);
+   to_pack.

Re: [PATCH v6 7/7] gpg-interface t: extend the existing GPG tests with GPGSM

2018-07-20 Thread Junio C Hamano
Thanks.





Re: ds/multi-pack-index (was Re: What's cooking in git.git (Jul 2018, #02; Wed, 18))

2018-07-20 Thread Junio C Hamano
Derrick Stolee  writes:

>>   What's the doneness of this one?  I vaguely recall that there was
>>   an objection against the concept as a whole (i.e. there is a way
>>   with less damage to gain the same object-abbrev performance); has
>>   it (and if anything else, they) been resolved in satisfactory
>>   fashion?
>
> I believe you're talking about Ævar's patch series [1] on
> unconditional abbreviation lengths.

Yes, this is a total tangent, but what happened to that one?  I did
not queue because I was led to expect v2 to follow soonish [*1*].

> Lookup speeds improve in a multi-pack environment.

True.  I recall that years ago there was a discussion, but nobody
came up with patches, to do the consolidated .idx for exactly that
reason (not the "abbrev" reason).

> That's the best I can do to sell the feature as it stands now (plus
> the 'fsck' integration that would follow after this series is
> accepted).

Heh, 'fsck' integration is not a 'feature' to sell anything, I
would think.  Nobody wants to run fsck for the sake of running
it---it is just that having one extra file that must not go corrupt
_requires_ a way to check its integrity, and fsck is the
logical place to do so X-<.

In any case, we've had this for about a week in 'pu' after 4
iterations, and review comments seem to have quieted down [*2*], so
let's consider merging it down to 'next'.  I think at least I need
to "commit --amend" (or something like that) 16/23.


[Footnotes]

*1* <87a7s4471y@evledraar.gmail.com>

*2* That does not indicate either of these two:

- nobody is interested in the topic
- the topic is now without any flaw

It only means that keeping it in 'pu' as a dormant topic would
not do anybody any good.


Re: ds/multi-pack-index (was Re: What's cooking in git.git (Jul 2018, #02; Wed, 18))

2018-07-20 Thread Derrick Stolee

On 7/20/2018 12:09 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


   What's the doneness of this one?  I vaguely recall that there was
   an objection against the concept as a whole (i.e. there is a way
   with less damage to gain the same object-abbrev performance); has
   it (and if anything else, they) been resolved in satisfactory
   fashion?

I believe you're talking about Ævar's patch series [1] on
unconditional abbreviation lengths.

Yes, this is a total tangent, but what happened to that one?  I did
not queue because I was led to expect v2 to follow soonish [*1*].


Lookup speeds improve in a multi-pack environment.

True.  I recall that years ago there was a discussion, but nobody
came up with patches, to do the consolidated .idx for exactly that
reason (not the "abbrev" reason).


That's the best I can do to sell the feature as it stands now (plus
the 'fsck' integration that would follow after this series is
accepted).

Heh, 'fsck' integration is not a 'feature' to sell anything, I
would think.  Nobody wants to run fsck for the sake of running
it---it is just that having one extra file that must not go corrupt
_requires_ a way to check its integrity, and fsck is the
logical place to do so X-<.


Yep. I didn't mean 'fsck' is a selling point, but that it is an 
important thing to build for anything that is going in the objects 
directory. I mention it only to say that I'm committed to providing that 
functionality.



In any case, we've had this for about a week in 'pu' after 4
iterations, and review comments seem to have quieted down [*2*], so
let's consider merging it down to 'next'.  I think at least I need
to "commit --amend" (or something like that) 16/23.


Right. There is a commit message error and some spaces to insert. See 
[2] if you need a reminder. Thanks!


[2] https://public-inbox.org/git/xmqqin5kupu3@gitster-ct.c.googlers.com/


[Footnotes]

*1* <87a7s4471y@evledraar.gmail.com>

*2* That does not indicate either of these two:

 - nobody is interested in the topic
 - the topic is now without any flaw

 It only means that keeping it in 'pu' as a dormant topic would
 not do anybody any good.


[PATCH v2 05/18] upload-pack: make reachable() more generic

2018-07-20 Thread Derrick Stolee
In anticipation of moving the reachable() method to commit-reach.c,
modify the prototype to be more generic to flags known outside of
upload-pack.c. Also rename 'want' to 'from' to make the statement
more clear outside of the context of haves/wants negotiation.

Signed-off-by: Derrick Stolee 
---
 upload-pack.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 4ca052d0b6..5a639cb47b 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -336,17 +336,18 @@ static int got_oid(const char *hex, struct object_id *oid)
return 0;
 }
 
-static int reachable(struct commit *want)
+static int reachable(struct commit *from, unsigned int with_flag,
+unsigned int assign_flag)
 {
struct prio_queue work = { compare_commits_by_commit_date };
 
-   prio_queue_put(&work, want);
+   prio_queue_put(&work, from);
while (work.nr) {
struct commit_list *list;
struct commit *commit = prio_queue_get(&work);
 
-   if (commit->object.flags & THEY_HAVE) {
-   want->object.flags |= COMMON_KNOWN;
+   if (commit->object.flags & with_flag) {
+   from->object.flags |= assign_flag;
break;
}
if (!commit->object.parsed)
@@ -362,10 +363,10 @@ static int reachable(struct commit *want)
prio_queue_put(&work, parent);
}
}
-   want->object.flags |= REACHABLE;
-   clear_commit_marks(want, REACHABLE);
+   from->object.flags |= REACHABLE;
+   clear_commit_marks(from, REACHABLE);
clear_prio_queue(&work);
-   return (want->object.flags & COMMON_KNOWN);
+   return (from->object.flags & assign_flag);
 }
 
 static int ok_to_give_up(void)
@@ -390,7 +391,7 @@ static int ok_to_give_up(void)
want_obj.objects[i].item->flags |= COMMON_KNOWN;
continue;
}
-   if (!reachable((struct commit *)want))
+   if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN))
return 0;
}
return 1;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 07/18] upload-pack: generalize commit date cutoff

2018-07-20 Thread Derrick Stolee
The ok_to_give_up() method uses the commit date as a cutoff to avoid
walking the entire reachable set of commits. Before moving the
reachable() method to commit-reach.c, replace the dependence on the
global constant 'oldest_have' with a 'min_commit_date' parameter.

Signed-off-by: Derrick Stolee 
---
 upload-pack.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 9fe19003c6..427de461d8 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -337,7 +337,7 @@ static int got_oid(const char *hex, struct object_id *oid)
 }
 
 static int reachable(struct commit *from, unsigned int with_flag,
-unsigned int assign_flag)
+unsigned int assign_flag, time_t min_commit_date)
 {
struct prio_queue work = { compare_commits_by_commit_date };
 
@@ -355,7 +355,7 @@ static int reachable(struct commit *from, unsigned int with_flag,
if (commit->object.flags & REACHABLE)
continue;
commit->object.flags |= REACHABLE;
-   if (commit->date < oldest_have)
+   if (commit->date < min_commit_date)
continue;
for (list = commit->parents; list; list = list->next) {
struct commit *parent = list->item;
@@ -372,11 +372,13 @@ static int reachable(struct commit *from, unsigned int with_flag,
 /*
  * Determine if every commit in 'from' can reach at least one commit
  * that is marked with 'with_flag'. As we traverse, use 'assign_flag'
- * as a marker for commits that are already visited.
+ * as a marker for commits that are already visited. Do not walk
+ * commits with date below 'min_commit_date'.
  */
 static int can_all_from_reach_with_flag(struct object_array *from,
unsigned int with_flag,
-   unsigned int assign_flag)
+   unsigned int assign_flag,
+   time_t min_commit_date)
 {
int i;
 
@@ -395,7 +397,8 @@ static int can_all_from_reach_with_flag(struct object_array *from,
from->objects[i].item->flags |= assign_flag;
continue;
}
-   if (!reachable((struct commit *)from_one, with_flag, assign_flag))
+   if (!reachable((struct commit *)from_one, with_flag, assign_flag,
+  min_commit_date))
return 0;
}
return 1;
@@ -406,7 +409,8 @@ static int ok_to_give_up(void)
if (!have_obj.nr)
return 0;
 
-   return can_all_from_reach_with_flag(&want_obj, THEY_HAVE, COMMON_KNOWN);
+   return can_all_from_reach_with_flag(&want_obj, THEY_HAVE,
+   COMMON_KNOWN, oldest_have);
 }
 
 static int get_common_commits(void)
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 02/18] commit.h: remove method declarations

2018-07-20 Thread Derrick Stolee
These methods are now declared in commit-reach.h. Remove them from
commit.h and add new include statements in all files that require these
declarations.

Signed-off-by: Derrick Stolee 
---
 bisect.c|  1 +
 builtin/branch.c|  1 +
 builtin/commit.c|  1 +
 builtin/fetch.c |  1 +
 builtin/fmt-merge-msg.c |  1 +
 builtin/log.c   |  1 +
 builtin/merge-base.c|  1 +
 builtin/merge.c |  1 +
 builtin/pull.c  |  1 +
 builtin/receive-pack.c  |  1 +
 builtin/rev-parse.c |  1 +
 commit.h| 29 -
 fast-import.c   |  1 +
 http-push.c |  2 +-
 merge-recursive.c   |  1 +
 notes-merge.c   |  1 +
 pack-bitmap-write.c |  1 +
 ref-filter.c|  1 +
 remote.c|  1 +
 revision.c  |  1 +
 sequencer.c |  1 +
 sha1-name.c |  1 +
 shallow.c   |  1 +
 submodule.c |  1 +
 24 files changed, 23 insertions(+), 30 deletions(-)

diff --git a/bisect.c b/bisect.c
index e1275ba79e..d023543c91 100644
--- a/bisect.c
+++ b/bisect.c
@@ -13,6 +13,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "commit-slab.h"
+#include "commit-reach.h"
 
 static struct oid_array good_revs;
 static struct oid_array skipped_revs;
diff --git a/builtin/branch.c b/builtin/branch.c
index a50632fb23..9a787447f4 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -23,6 +23,7 @@
 #include "ref-filter.h"
 #include "worktree.h"
 #include "help.h"
+#include "commit-reach.h"
 
 static const char * const builtin_branch_usage[] = {
N_("git branch [<options>] [-r | -a] [--merged | --no-merged]"),
diff --git a/builtin/commit.c b/builtin/commit.c
index 158e3f843a..b5c608458e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -33,6 +33,7 @@
 #include "sequencer.h"
 #include "mailmap.h"
 #include "help.h"
+#include "commit-reach.h"
 
 static const char * const builtin_commit_usage[] = {
N_("git commit [<options>] [--] <pathspec>..."),
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f5d960baec..7de234774b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -22,6 +22,7 @@
 #include "utf8.h"
 #include "packfile.h"
 #include "list-objects-filter-options.h"
+#include "commit-reach.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [<options>] [<repository> [<refspec>...]]"),
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index ff165c0fcd..7277d557b2 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -12,6 +12,7 @@
 #include "fmt-merge-msg.h"
 #include "gpg-interface.h"
 #include "repository.h"
+#include "commit-reach.h"
 
 static const char * const fmt_merge_msg_usage[] = {
N_("git fmt-merge-msg [-m <message>] [--log[=<n>] | --no-log] [--file <file>]"),
diff --git a/builtin/log.c b/builtin/log.c
index 55a6286d7f..333d97c692 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -31,6 +31,7 @@
 #include "progress.h"
 #include "commit-slab.h"
 #include "repository.h"
+#include "commit-reach.h"
 
 #define MAIL_DEFAULT_WRAP 72
 
diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index 08d91b1f0c..1c92099070 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -7,6 +7,7 @@
 #include "revision.h"
 #include "parse-options.h"
 #include "repository.h"
+#include "commit-reach.h"
 
 static int show_merge_base(struct commit **rev, int rev_nr, int show_all)
 {
diff --git a/builtin/merge.c b/builtin/merge.c
index d1b547d973..4c601c40a2 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -36,6 +36,7 @@
 #include "packfile.h"
 #include "tag.h"
 #include "alias.h"
+#include "commit-reach.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
diff --git a/builtin/pull.c b/builtin/pull.c
index 4e78935392..15ad010968 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -22,6 +22,7 @@
 #include "tempfile.h"
 #include "lockfile.h"
 #include "wt-status.h"
+#include "commit-reach.h"
 
 enum rebase_type {
REBASE_INVALID = -1,
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 400d31c18c..d8467f9734 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -27,6 +27,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "protocol.h"
+#include "commit-reach.h"
 
 static const char * const receive_pack_usage[] = {
N_("git receive-pack <git-dir>"),
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 0f09bbbf65..455f62246d 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -14,6 +14,7 @@
 #include "revision.h"
 #include "split-index.h"
 #include "submodule.h"
+#include "commit-reach.h"
 
#define DO_REVS		1
#define DO_NOREV	2
diff --git a/commit.h b/commit.h
index da0db36eba..e2c99d9b04 100644
--- a/commit.h
+++ b/commit.h
@@ -204,13 +204,6 @@ struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct repository *r, struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(struct repositor

[PATCH v2 04/18] commit-reach: move commit_contains from ref-filter

2018-07-20 Thread Derrick Stolee
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

All methods are direct moves, except we also make the commit_contains()
method public so its consumers in ref-filter.c can still call it. We can
also test this method in a test-tool in a later commit.

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 121 +
 commit-reach.h |  20 ++-
 ref-filter.c   | 145 +++--
 3 files changed, 147 insertions(+), 139 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index a6bc4781a6..01d796f011 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,8 +1,10 @@
 #include "cache.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "decorate.h"
 #include "prio-queue.h"
 #include "tree.h"
+#include "ref-filter.h"
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
@@ -411,3 +413,122 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
unmark_and_free(used, TMP_MARK);
return found;
 }
+
+/*
+ * Mimicking the real stack, this stack lives on the heap, avoiding stack
+ * overflows.
+ *
+ * At each recursion step, the stack items points to the commits whose
+ * ancestors are to be inspected.
+ */
+struct contains_stack {
+   int nr, alloc;
+   struct contains_stack_entry {
+   struct commit *commit;
+   struct commit_list *parents;
+   } *contains_stack;
+};
+
+static int in_commit_list(const struct commit_list *want, struct commit *c)
+{
+   for (; want; want = want->next)
+   if (!oidcmp(&want->item->object.oid, &c->object.oid))
+   return 1;
+   return 0;
+}
+
+/*
+ * Test whether the candidate is contained in the list.
+ * Do not recurse to find out, though, but return -1 if inconclusive.
+ */
+static enum contains_result contains_test(struct commit *candidate,
+ const struct commit_list *want,
+ struct contains_cache *cache,
+ uint32_t cutoff)
+{
+   enum contains_result *cached = contains_cache_at(cache, candidate);
+
+   /* If we already have the answer cached, return that. */
+   if (*cached)
+   return *cached;
+
+   /* or are we it? */
+   if (in_commit_list(want, candidate)) {
+   *cached = CONTAINS_YES;
+   return CONTAINS_YES;
+   }
+
+   /* Otherwise, we don't know; prepare to recurse */
+   parse_commit_or_die(candidate);
+
+   if (candidate->generation < cutoff)
+   return CONTAINS_NO;
+
+   return CONTAINS_UNKNOWN;
+}
+
+static void push_to_contains_stack(struct commit *candidate, struct contains_stack *contains_stack)
+{
+   ALLOC_GROW(contains_stack->contains_stack, contains_stack->nr + 1, contains_stack->alloc);
+   contains_stack->contains_stack[contains_stack->nr].commit = candidate;
+   contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
+}
+
+static enum contains_result contains_tag_algo(struct commit *candidate,
+ const struct commit_list *want,
+ struct contains_cache *cache)
+{
+   struct contains_stack contains_stack = { 0, 0, NULL };
+   enum contains_result result;
+   uint32_t cutoff = GENERATION_NUMBER_INFINITY;
+   const struct commit_list *p;
+
+   for (p = want; p; p = p->next) {
+   struct commit *c = p->item;
+   load_commit_graph_info(the_repository, c);
+   if (c->generation < cutoff)
+   cutoff = c->generation;
+   }
+
+   result = contains_test(candidate, want, cache, cutoff);
+   if (result != CONTAINS_UNKNOWN)
+   return result;
+
+   push_to_contains_stack(candidate, &contains_stack);
+   while (contains_stack.nr) {
+   struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
+   struct commit *commit = entry->commit;
+   struct commit_list *parents = entry->parents;
+
+   if (!parents) {
+   *contains_cache_at(cache, commit) = CONTAINS_NO;
+   contains_stack.nr--;
+   }
+   /*
+* If we just popped the stack, parents->item has been marked,
+* therefore contains_test will return a meaningful yes/no.
+*/
+   else switch (contains_test(parents->item, want, cache, cutoff)) {
+   case CONTAINS_YES:
+   *contains_cache_at(cache, commit) = CONTAINS_YES;
+   contains_stack.nr--;
+  

[PATCH v2 03/18] commit-reach: move ref_newer from remote.c

2018-07-20 Thread Derrick Stolee
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

The ref_newer() method is used by 'git push -f' to check if a force-push
is necessary. By making the method public, we make it possible to test
the method directly without setting up an environment where a 'git
push' call makes sense.

Signed-off-by: Derrick Stolee 
---
 builtin/remote.c |  1 +
 commit-reach.c   | 55 +++-
 commit-reach.h   |  2 ++
 remote.c | 49 --
 remote.h |  1 -
 5 files changed, 57 insertions(+), 51 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index c74ee88690..79b0326446 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -10,6 +10,7 @@
 #include "refspec.h"
 #include "object-store.h"
 #include "argv-array.h"
+#include "commit-reach.h"
 
 static const char * const builtin_remote_usage[] = {
N_("git remote [-v | --verbose]"),
diff --git a/commit-reach.c b/commit-reach.c
index 8ab6044414..a6bc4781a6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,6 +1,10 @@
 #include "cache.h"
-#include "prio-queue.h"
 #include "commit.h"
+#include "decorate.h"
+#include "prio-queue.h"
+#include "tree.h"
+#include "revision.h"
+#include "tag.h"
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
@@ -358,3 +362,52 @@ void reduce_heads_replace(struct commit_list **heads)
free_commit_list(*heads);
*heads = result;
 }
+
+static void unmark_and_free(struct commit_list *list, unsigned int mark)
+{
+   while (list) {
+   struct commit *commit = pop_commit(&list);
+   commit->object.flags &= ~mark;
+   }
+}
+
+int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
+{
+   struct object *o;
+   struct commit *old_commit, *new_commit;
+   struct commit_list *list, *used;
+   int found = 0;
+
+   /*
+* Both new_commit and old_commit must be commit-ish and new_commit is descendant of
+* old_commit.  Otherwise we require --force.
+*/
+   o = deref_tag(the_repository, parse_object(the_repository, old_oid),
+ NULL, 0);
+   if (!o || o->type != OBJ_COMMIT)
+   return 0;
+   old_commit = (struct commit *) o;
+
+   o = deref_tag(the_repository, parse_object(the_repository, new_oid),
+ NULL, 0);
+   if (!o || o->type != OBJ_COMMIT)
+   return 0;
+   new_commit = (struct commit *) o;
+
+   if (parse_commit(new_commit) < 0)
+   return 0;
+
+   used = list = NULL;
+   commit_list_insert(new_commit, &list);
+   while (list) {
+   new_commit = pop_most_recent_commit(&list, TMP_MARK);
+   commit_list_insert(new_commit, &used);
+   if (new_commit == old_commit) {
+   found = 1;
+   break;
+   }
+   }
+   unmark_and_free(list, TMP_MARK);
+   unmark_and_free(used, TMP_MARK);
+   return found;
+}
diff --git a/commit-reach.h b/commit-reach.h
index 1ea2696e40..f1cf9bfcd8 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -39,4 +39,6 @@ struct commit_list *reduce_heads(struct commit_list *heads);
  */
 void reduce_heads_replace(struct commit_list **heads);
 
+int ref_newer(const struct object_id *new_oid, const struct object_id 
*old_oid);
+
 #endif
diff --git a/remote.c b/remote.c
index 8e99b9888a..f0c23bae48 100644
--- a/remote.c
+++ b/remote.c
@@ -1784,55 +1784,6 @@ int resolve_remote_symref(struct ref *ref, struct ref *list)
return 1;
 }
 
-static void unmark_and_free(struct commit_list *list, unsigned int mark)
-{
-   while (list) {
-   struct commit *commit = pop_commit(&list);
-   commit->object.flags &= ~mark;
-   }
-}
-
-int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
-{
-   struct object *o;
-   struct commit *old_commit, *new_commit;
-   struct commit_list *list, *used;
-   int found = 0;
-
-   /*
-* Both new_commit and old_commit must be commit-ish and new_commit is descendant of
-* old_commit.  Otherwise we require --force.
-*/
-   o = deref_tag(the_repository, parse_object(the_repository, old_oid),
- NULL, 0);
-   if (!o || o->type != OBJ_COMMIT)
-   return 0;
-   old_commit = (struct commit *) o;
-
-   o = deref_tag(the_repository, parse_object(the_repository, new_oid),
- NULL, 0);
-   if (!o || o->type != OBJ_COMMIT)
-   return 0;
-   new_commit = (struct commit *) o;
-
-   if (parse_commit(new_commit) < 0)
-   return 0;
-
-   used = list = NULL;
-   commit_list_i

[PATCH v2 08/18] commit-reach: move can_all_from_reach_with_flags

2018-07-20 Thread Derrick Stolee
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

The can_all_from_reach_with_flags method is used in a stateful way by
upload-pack.c. The parameters are very flexible, so we will be able to
use its commit walking logic for many other callers.

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 63 +
 commit-reach.h | 14 ++
 object.h   |  4 +--
 upload-pack.c  | 70 +-
 4 files changed, 80 insertions(+), 71 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 01d796f011..d806291d5d 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -10,6 +10,7 @@
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
+#define REACHABLE   (1u<<15)
 #define PARENT1(1u<<16)
 #define PARENT2(1u<<17)
 #define STALE  (1u<<18)
@@ -532,3 +533,65 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
return is_descendant_of(commit, list);
 }
+
+int reachable(struct commit *from, unsigned int with_flag,
+ unsigned int assign_flag, time_t min_commit_date)
+{
+   struct prio_queue work = { compare_commits_by_commit_date };
+
+   prio_queue_put(&work, from);
+   while (work.nr) {
+   struct commit_list *list;
+   struct commit *commit = prio_queue_get(&work);
+
+   if (commit->object.flags & with_flag) {
+   from->object.flags |= assign_flag;
+   break;
+   }
+   if (!commit->object.parsed)
+   parse_object(the_repository, &commit->object.oid);
+   if (commit->object.flags & REACHABLE)
+   continue;
+   commit->object.flags |= REACHABLE;
+   if (commit->date < min_commit_date)
+   continue;
+   for (list = commit->parents; list; list = list->next) {
+   struct commit *parent = list->item;
+   if (!(parent->object.flags & REACHABLE))
+   prio_queue_put(&work, parent);
+   }
+   }
+   from->object.flags |= REACHABLE;
+   clear_commit_marks(from, REACHABLE);
+   clear_prio_queue(&work);
+   return (from->object.flags & assign_flag);
+}
+
+int can_all_from_reach_with_flag(struct object_array *from,
+unsigned int with_flag,
+unsigned int assign_flag,
+time_t min_commit_date)
+{
+   int i;
+
+   for (i = 0; i < from->nr; i++) {
+   struct object *from_one = from->objects[i].item;
+
+   if (from_one->flags & assign_flag)
+   continue;
+   from_one = deref_tag(the_repository, from_one, "a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
+   /* no way to tell if this is reachable by
+* looking at the ancestry chain alone, so
+* leave a note to ourselves not to worry about
+* this object anymore.
+*/
+   from->objects[i].item->flags |= assign_flag;
+   continue;
+   }
+   if (!reachable((struct commit *)from_one, with_flag, assign_flag,
+  min_commit_date))
+   return 0;
+   }
+   return 1;
+}
diff --git a/commit-reach.h b/commit-reach.h
index 13dec25cee..b28bc22fcd 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -59,4 +59,18 @@ define_commit_slab(contains_cache, enum contains_result);
 int commit_contains(struct ref_filter *filter, struct commit *commit,
struct commit_list *list, struct contains_cache *cache);
 
+int reachable(struct commit *from, unsigned int with_flag,
+ unsigned int assign_flag, time_t min_commit_date);
+
+/*
+ * Determine if every commit in 'from' can reach at least one commit
+ * that is marked with 'with_flag'. As we traverse, use 'assign_flag'
+ * as a marker for commits that are already visited. Do not walk
+ * commits with date below 'min_commit_date'.
+ */
+int can_all_from_reach_with_flag(struct object_array *from,
+unsigned int with_flag,
+unsigned int assign_flag,
+time_t min_commit_date);
+
 #endif
diff --git a/object.h b/object.h
index 18c2b073e3..b132944c51 100644
--- a/object.h
+++ b/object.h
@@ -60,12 +60,12 @@ struct object_array {
  * revision.h:   

[PATCH v2 06/18] upload-pack: refactor ok_to_give_up()

2018-07-20 Thread Derrick Stolee
In anticipation of consolidating all commit reachability algorithms,
refactor ok_to_give_up() in order to allow splitting its logic into
an external method.

Signed-off-by: Derrick Stolee 
---
 upload-pack.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 5a639cb47b..9fe19003c6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -369,34 +369,46 @@ static int reachable(struct commit *from, unsigned int with_flag,
return (from->object.flags & assign_flag);
 }
 
-static int ok_to_give_up(void)
+/*
+ * Determine if every commit in 'from' can reach at least one commit
+ * that is marked with 'with_flag'. As we traverse, use 'assign_flag'
+ * as a marker for commits that are already visited.
+ */
+static int can_all_from_reach_with_flag(struct object_array *from,
+   unsigned int with_flag,
+   unsigned int assign_flag)
 {
int i;
 
-   if (!have_obj.nr)
-   return 0;
-
-   for (i = 0; i < want_obj.nr; i++) {
-   struct object *want = want_obj.objects[i].item;
+   for (i = 0; i < from->nr; i++) {
+   struct object *from_one = from->objects[i].item;
 
-   if (want->flags & COMMON_KNOWN)
+   if (from_one->flags & assign_flag)
continue;
-   want = deref_tag(the_repository, want, "a want line", 0);
-   if (!want || want->type != OBJ_COMMIT) {
+   from_one = deref_tag(the_repository, from_one, "a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
/* no way to tell if this is reachable by
 * looking at the ancestry chain alone, so
 * leave a note to ourselves not to worry about
 * this object anymore.
 */
-   want_obj.objects[i].item->flags |= COMMON_KNOWN;
+   from->objects[i].item->flags |= assign_flag;
continue;
}
-   if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN))
-   if (!reachable((struct commit *)from_one, with_flag, assign_flag))
return 0;
}
return 1;
 }
 
+static int ok_to_give_up(void)
+{
+   if (!have_obj.nr)
+   return 0;
+
+   return can_all_from_reach_with_flag(&want_obj, THEY_HAVE, COMMON_KNOWN);
+}
+
 static int get_common_commits(void)
 {
struct object_id oid;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 15/18] test-reach: test commit_contains

2018-07-20 Thread Derrick Stolee
The commit_contains method has two modes which depend on the given
ref_filter struct. We have the "normal" algorithm (which is also the
typically-slow operation) and the "tag" algorithm. This difference is
essentially what changes performance for 'git branch --contains' versus
'git tag --contains'. It is thought that the data shapes typically seen
by these two applications justify the different implementations.

Create tests using 'test-tool reach commit_contains [--tag]' to cover
both methods.

Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c | 12 
 t/t6600-test-reach.sh | 34 ++
 2 files changed, 46 insertions(+)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index c79729cac0..eb21103998 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -4,6 +4,7 @@
 #include "commit-reach.h"
 #include "config.h"
 #include "parse-options.h"
+#include "ref-filter.h"
 #include "string-list.h"
 #include "tag.h"
 
@@ -112,6 +113,17 @@ int cmd__reach(int ac, const char **av)
print_sorted_commit_ids(list);
} else if (!strcmp(av[1], "can_all_from_reach")) {
printf("%s(X,Y):%d\n", av[1], can_all_from_reach(X, Y, 1));
+   } else if (!strcmp(av[1], "commit_contains")) {
+   struct ref_filter filter;
+   struct contains_cache cache;
+   init_contains_cache(&cache);
+
+   if (ac > 2 && !strcmp(av[2], "--tag"))
+   filter.with_commit_tag_algo = 1;
+   else
+   filter.with_commit_tag_algo = 0;
+
+   printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
}
 
exit(0);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index e41eb397a7..d139a00d1d 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -205,4 +205,38 @@ test_expect_success 'can_all_from_reach:miss' '
test_three_modes can_all_from_reach
 '
 
+test_expect_success 'commit_contains:hit' '
+   cat >input <<-\EOF &&
+   A:commit-7-7
+   X:commit-2-10
+   X:commit-3-9
+   X:commit-4-8
+   X:commit-5-7
+   X:commit-6-6
+   X:commit-7-5
+   X:commit-8-4
+   X:commit-9-3
+   EOF
+   echo "commit_contains(_,A,X,_):1" >expect &&
+   test_three_modes commit_contains &&
+   test_three_modes commit_contains --tag
+'
+
+test_expect_success 'commit_contains:miss' '
+   cat >input <<-\EOF &&
+   A:commit-6-5
+   X:commit-2-10
+   X:commit-3-9
+   X:commit-4-8
+   X:commit-5-7
+   X:commit-6-6
+   X:commit-7-5
+   X:commit-8-4
+   X:commit-9-3
+   EOF
+   echo "commit_contains(_,A,X,_):0" >expect &&
+   test_three_modes commit_contains &&
+   test_three_modes commit_contains --tag
+'
+
 test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 09/18] test-reach: create new test tool for ref_newer

2018-07-20 Thread Derrick Stolee
As we prepare to change the behavior of the algorithms in
commit-reach.c, create a new test-tool subcommand 'reach' to test these
methods on interesting commit-graph shapes.

To use the new test-tool, use 'test-tool reach <method>' and provide
input to stdin that describes the inputs to the method. Currently, we
only implement the ref_newer method, which requires two commits. Use
lines "A:<committish>" and "B:<committish>" for the two inputs. We will
expand this input later to accommodate methods that take lists of
commits.

The test t6600-test-reach.sh creates a repo whose commits form a
two-dimensional grid. This grid makes it easy for us to determine
reachability because commit-A-B can reach commit-X-Y if and only if A is
at least X and B is at least Y. This helps create interesting test cases
for each result of the methods in commit-reach.c.

We test all methods in three different states of the commit-graph file:
Non-existent (no generation numbers), fully computed, and mixed (some
commits have generation numbers and others do not).

Signed-off-by: Derrick Stolee 
---
 Makefile  |  1 +
 t/helper/test-reach.c | 63 +++
 t/helper/test-tool.c  |  1 +
 t/helper/test-tool.h  |  1 +
 t/t6600-test-reach.sh | 86 +++
 5 files changed, 152 insertions(+)
 create mode 100644 t/helper/test-reach.c
 create mode 100755 t/t6600-test-reach.sh

diff --git a/Makefile b/Makefile
index 59781f4bc3..d69f9d415d 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-mktemp.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
+TEST_BUILTINS_OBJS += test-reach.o
 TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
new file mode 100644
index 00..620bb46041
--- /dev/null
+++ b/t/helper/test-reach.c
@@ -0,0 +1,63 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "commit.h"
+#include "commit-reach.h"
+#include "config.h"
+#include "parse-options.h"
+#include "tag.h"
+
+int cmd__reach(int ac, const char **av)
+{
+   struct object_id oid_A, oid_B;
+   struct strbuf buf = STRBUF_INIT;
+   struct repository *r = the_repository;
+
+   setup_git_directory();
+
+   if (ac < 2)
+   exit(1);
+
+   while (strbuf_getline(&buf, stdin) != EOF) {
+   struct object_id oid;
+   struct object *o;
+   struct commit *c;
+   if (buf.len < 3)
+   continue;
+
+   if (get_oid_committish(buf.buf + 2, &oid))
+   die("failed to resolve %s", buf.buf + 2);
+
+   o = parse_object(r, &oid);
+   o = deref_tag_noverify(o);
+
+   if (!o)
+   die("failed to load commit for input %s resulting in oid %s\n",
+   buf.buf, oid_to_hex(&oid));
+
+   c = object_as_type(r, o, OBJ_COMMIT, 0);
+
+   if (!c)
+   die("failed to load commit for input %s resulting in oid %s\n",
+   buf.buf, oid_to_hex(&oid));
+
+   switch (buf.buf[0]) {
+   case 'A':
+   oidcpy(&oid_A, &oid);
+   break;
+
+   case 'B':
+   oidcpy(&oid_B, &oid);
+   break;
+
+   default:
+   die("unexpected start of line: %c", buf.buf[0]);
+   }
+   }
+   strbuf_release(&buf);
+
+   if (!strcmp(av[1], "ref_newer"))
+   printf("%s(A,B):%d\n", av[1], ref_newer(&oid_A, &oid_B));
+
+   exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index dafc91c240..582d02adfd 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -26,6 +26,7 @@ static struct test_cmd cmds[] = {
{ "online-cpus", cmd__online_cpus },
{ "path-utils", cmd__path_utils },
{ "prio-queue", cmd__prio_queue },
+   { "reach", cmd__reach },
{ "read-cache", cmd__read_cache },
{ "ref-store", cmd__ref_store },
{ "regex", cmd__regex },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 80cbcf0857..a7e53c420e 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -20,6 +20,7 @@ int cmd__mktemp(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
 int cmd__path_utils(int argc, const char **argv);
 int cmd__prio_queue(int argc, const char **argv);
+int cmd__reach(int argc, const char **argv);
 int cmd__read_cache(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
new file mode 100755
index 

[PATCH v2 11/18] test-reach: test is_descendant_of

2018-07-20 Thread Derrick Stolee
The is_descendant_of method takes a single commit as its first parameter
and a list of commits as its second parameter. Extend the input of the
'test-tool reach' command to take multiple lines of the form
"X:<committish>" to construct a list of commits. Pass these to
is_descendant_of and create tests that check each result.

Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c |  8 
 t/t6600-test-reach.sh | 22 ++
 2 files changed, 30 insertions(+)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index f93ad5084d..dccbd48178 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -10,6 +10,7 @@ int cmd__reach(int ac, const char **av)
 {
struct object_id oid_A, oid_B;
struct commit *A, *B;
+   struct commit_list *X;
struct strbuf buf = STRBUF_INIT;
struct repository *r = the_repository;
 
@@ -19,6 +20,7 @@ int cmd__reach(int ac, const char **av)
exit(1);
 
A = B = NULL;
+   X = NULL;
 
while (strbuf_getline(&buf, stdin) != EOF) {
struct object_id oid;
@@ -54,6 +56,10 @@ int cmd__reach(int ac, const char **av)
B = c;
break;
 
+   case 'X':
+   commit_list_insert(c, &X);
+   break;
+
default:
die("unexpected start of line: %c", buf.buf[0]);
}
@@ -64,6 +70,8 @@ int cmd__reach(int ac, const char **av)
printf("%s(A,B):%d\n", av[1], ref_newer(&oid_A, &oid_B));
else if (!strcmp(av[1], "in_merge_bases"))
printf("%s(A,B):%d\n", av[1], in_merge_bases(A, B));
+   else if (!strcmp(av[1], "is_descendant_of"))
+   printf("%s(A,X):%d\n", av[1], is_descendant_of(A, X));
 
exit(0);
 }
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 5cd6b14c69..98bcb17960 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -101,4 +101,26 @@ test_expect_success 'in_merge_bases:miss' '
test_three_modes in_merge_bases
 '
 
+test_expect_success 'is_descendant_of:hit' '
+   cat >input <<-\EOF &&
+   A:commit-5-7
+   X:commit-4-8
+   X:commit-6-6
+   X:commit-1-1
+   EOF
+   echo "is_descendant_of(A,X):1" >expect &&
+   test_three_modes is_descendant_of
+'
+
+test_expect_success 'is_descendant_of:miss' '
+   cat >input <<-\EOF &&
+   A:commit-6-8
+   X:commit-5-9
+   X:commit-4-10
+   X:commit-7-6
+   EOF
+   echo "is_descendant_of(A,X):0" >expect &&
+   test_three_modes is_descendant_of
+'
+
 test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 16/18] commit-reach: replace ref_newer logic

2018-07-20 Thread Derrick Stolee
The ref_newer method is used by 'git push' to check if a force-push is
required. This method does not use any kind of cutoff when walking, so
in the case of a force-push will walk all reachable commits.

The is_descendant_of method already uses paint_down_to_common along with
cutoffs. By translating the ref_newer arguments into the commit and
commit_list required by is_descendant_of, we can have one fewer commit
walk and also improve our performance!

For a copy of the Linux repository, 'test-tool reach ref_newer' presents
the following improvements with the specified input. In the case that
ref_newer returns 1, there is no improvement. The improvement is in the
second case where ref_newer returns 0.

Input:
A:v4.9
B:v3.19

Before: 0.09 s
 After: 0.09 s

To test the negative case, add a new commit with parent v3.19,
regenerate the commit-graph, and then run with B pointing at that
commit.

Before: 0.43 s
 After: 0.09 s

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 940fbf2e17..f5858944fd 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -366,20 +366,11 @@ void reduce_heads_replace(struct commit_list **heads)
*heads = result;
 }
 
-static void unmark_and_free(struct commit_list *list, unsigned int mark)
-{
-   while (list) {
-   struct commit *commit = pop_commit(&list);
-   commit->object.flags &= ~mark;
-   }
-}
-
 int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
 {
struct object *o;
struct commit *old_commit, *new_commit;
-   struct commit_list *list, *used;
-   int found = 0;
+   struct commit_list *old_commit_list = NULL;
 
/*
	 * Both new_commit and old_commit must be commit-ish and new_commit is descendant of
@@ -400,19 +391,8 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
if (parse_commit(new_commit) < 0)
return 0;
 
-   used = list = NULL;
-   commit_list_insert(new_commit, &list);
-   while (list) {
-   new_commit = pop_most_recent_commit(&list, TMP_MARK);
-   commit_list_insert(new_commit, &used);
-   if (new_commit == old_commit) {
-   found = 1;
-   break;
-   }
-   }
-   unmark_and_free(list, TMP_MARK);
-   unmark_and_free(used, TMP_MARK);
-   return found;
+   commit_list_insert(old_commit, &old_commit_list);
+   return is_descendant_of(new_commit, old_commit_list);
 }
 
 /*
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 10/18] test-reach: test in_merge_bases

2018-07-20 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c |  6 ++
 t/t6600-test-reach.sh | 18 ++
 2 files changed, 24 insertions(+)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 620bb46041..f93ad5084d 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -9,6 +9,7 @@
 int cmd__reach(int ac, const char **av)
 {
struct object_id oid_A, oid_B;
+   struct commit *A, *B;
struct strbuf buf = STRBUF_INIT;
struct repository *r = the_repository;
 
@@ -17,6 +18,7 @@ int cmd__reach(int ac, const char **av)
if (ac < 2)
exit(1);
 
+   A = B = NULL;
 
while (strbuf_getline(&buf, stdin) != EOF) {
struct object_id oid;
@@ -44,10 +46,12 @@ int cmd__reach(int ac, const char **av)
switch (buf.buf[0]) {
case 'A':
oidcpy(&oid_A, &oid);
+   A = c;
break;
 
case 'B':
oidcpy(&oid_B, &oid);
+   B = c;
break;
 
default:
@@ -58,6 +62,8 @@ int cmd__reach(int ac, const char **av)
 
if (!strcmp(av[1], "ref_newer"))
printf("%s(A,B):%d\n", av[1], ref_newer(&oid_A, &oid_B));
+   else if (!strcmp(av[1], "in_merge_bases"))
+   printf("%s(A,B):%d\n", av[1], in_merge_bases(A, B));
 
exit(0);
 }
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 966309c6cf..5cd6b14c69 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -83,4 +83,22 @@ test_expect_success 'ref_newer:hit' '
test_three_modes ref_newer
 '
 
+test_expect_success 'in_merge_bases:hit' '
+   cat >input <<-\EOF &&
+   A:commit-5-7
+   B:commit-8-8
+   EOF
+   echo "in_merge_bases(A,B):1" >expect &&
+   test_three_modes in_merge_bases
+'
+
+test_expect_success 'in_merge_bases:miss' '
+   cat >input <<-\EOF &&
+   A:commit-6-8
+   B:commit-5-9
+   EOF
+   echo "in_merge_bases(A,B):0" >expect &&
+   test_three_modes in_merge_bases
+'
+
 test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 14/18] test-reach: test can_all_from_reach_with_flags

2018-07-20 Thread Derrick Stolee
The can_all_from_reach_with_flags method is used by ok_to_give_up in
upload-pack.c to see if we have done enough negotiation during a fetch.
This method is intentionally created to preserve state between calls to
assist with stateful negotiation, such as over SSH.

To make this method testable, add a new can_all_from_reach method that
does the initial setup and final tear-down. We will later use this
method in production code. Call the method from 'test-tool reach' for
now.

Since this is a many-to-many reachability query, add a new type of input
to the 'test-tool reach' input format. Lines "Y:<committish>" create a
list of commits to be the reachability targets from the commits in the
'X' list. In the context of fetch negotiation, the 'X' commits are the
'want' commits and the 'Y' commits are the 'have' commits.
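
For illustration, a hedged sketch of such an input (the commit names follow the t6600 test grid; the test-tool invocation is a comment since it only exists inside the git test suite):

```shell
# Hypothetical input for 'test-tool reach can_all_from_reach':
# X lines play the role of 'want' commits, Y lines the 'have' commits.
cat >input <<-\EOF
X:commit-2-10
X:commit-3-9
Y:commit-1-9
Y:commit-2-8
EOF
# Inside the git test suite one would then run something like:
#   test-tool reach can_all_from_reach <input >actual
# which would print "can_all_from_reach(X,Y):1" if every X commit can
# reach at least one Y commit.
cat input
```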

Signed-off-by: Derrick Stolee 
---
 commit-reach.c| 47 +++
 commit-reach.h|  2 ++
 t/helper/test-reach.c | 10 +++--
 t/t6600-test-reach.sh | 45 +
 4 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index d806291d5d..940fbf2e17 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -595,3 +595,50 @@ int can_all_from_reach_with_flag(struct object_array *from,
}
return 1;
 }
+
+int can_all_from_reach(struct commit_list *from, struct commit_list *to,
+  int cutoff_by_min_date)
+{
+   struct object_array from_objs = OBJECT_ARRAY_INIT;
+   time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0;
+   struct commit_list *from_iter = from, *to_iter = to;
+   int result;
+
+   while (from_iter) {
+   add_object_array(&from_iter->item->object, NULL, &from_objs);
+
+   if (!parse_commit(from_iter->item)) {
+   if (from_iter->item->date < min_commit_date)
+   min_commit_date = from_iter->item->date;
+   }
+
+   from_iter = from_iter->next;
+   }
+
+   while (to_iter) {
+   if (!parse_commit(to_iter->item)) {
+   if (to_iter->item->date < min_commit_date)
+   min_commit_date = to_iter->item->date;
+   }
+
+   to_iter->item->object.flags |= PARENT2;
+
+   to_iter = to_iter->next;
+   }
+
+   result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1,
+ min_commit_date);
+
+   while (from) {
+   clear_commit_marks(from->item, PARENT1);
+   from = from->next;
+   }
+
+   while (to) {
+   clear_commit_marks(to->item, PARENT2);
+   to = to->next;
+   }
+
+   object_array_clear(&from_objs);
+   return result;
+}
diff --git a/commit-reach.h b/commit-reach.h
index b28bc22fcd..aa202c9703 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -72,5 +72,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 unsigned int with_flag,
 unsigned int assign_flag,
 time_t min_commit_date);
+int can_all_from_reach(struct commit_list *from, struct commit_list *to,
+  int commit_date_cutoff);
 
 #endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index e32e193b70..c79729cac0 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -29,7 +29,7 @@ int cmd__reach(int ac, const char **av)
 {
struct object_id oid_A, oid_B;
struct commit *A, *B;
-   struct commit_list *X;
+   struct commit_list *X, *Y;
struct commit **X_array;
int X_nr, X_alloc;
struct strbuf buf = STRBUF_INIT;
@@ -41,7 +41,7 @@ int cmd__reach(int ac, const char **av)
exit(1);
 
A = B = NULL;
-   X = NULL;
+   X = Y = NULL;
X_nr = 0;
X_alloc = 16;
ALLOC_ARRAY(X_array, X_alloc);
@@ -86,6 +86,10 @@ int cmd__reach(int ac, const char **av)
X_array[X_nr++] = c;
break;
 
+   case 'Y':
+   commit_list_insert(c, &Y);
+   break;
+
default:
die("unexpected start of line: %c", buf.buf[0]);
}
@@ -106,6 +110,8 @@ int cmd__reach(int ac, const char **av)
struct commit_list *list = reduce_heads(X);
printf("%s(X):\n", av[1]);
print_sorted_commit_ids(list);
+   } else if (!strcmp(av[1], "can_all_from_reach")) {
+   printf("%s(X,Y):%d\n", av[1], can_all_from_reach(X, Y, 1));
}
 
exit(0);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 17c6467988..e41eb397a7 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh

[PATCH v2 12/18] test-reach: test get_merge_bases_many

2018-07-20 Thread Derrick Stolee
The get_merge_bases_many method returns a list of merge bases for a
single commit (A) against a list of commits (X). Some care is needed in
constructing the expected behavior because the result is not the
expected merge-base for an octopus merge with those parents but instead
the set of maximal commits that are reachable from A and at least one of
the commits in X.

Add get_merge_bases_many to 'test-tool reach' and create a test that
demonstrates that this output returns multiple results. Specifically, we
select a list of three commits such that we output two commits that are
reachable from one of the first two, respectively, and none are
reachable from the third.

Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c | 31 +++
 t/t6600-test-reach.sh | 15 +++
 2 files changed, 46 insertions(+)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index dccbd48178..4df01187c9 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -4,13 +4,34 @@
 #include "commit-reach.h"
 #include "config.h"
 #include "parse-options.h"
+#include "string-list.h"
 #include "tag.h"
 
+static void print_sorted_commit_ids(struct commit_list *list)
+{
+   int i;
+   struct string_list s = STRING_LIST_INIT_DUP;
+
+   while (list) {
+   string_list_append(&s, oid_to_hex(&list->item->object.oid));
+   list = list->next;
+   }
+
+   string_list_sort(&s);
+
+   for (i = 0; i < s.nr; i++)
+   printf("%s\n", s.items[i].string);
+
+   string_list_clear(&s, 0);
+}
+
 int cmd__reach(int ac, const char **av)
 {
struct object_id oid_A, oid_B;
struct commit *A, *B;
struct commit_list *X;
+   struct commit **X_array;
+   int X_nr, X_alloc;
struct strbuf buf = STRBUF_INIT;
struct repository *r = the_repository;
 
@@ -21,6 +42,9 @@ int cmd__reach(int ac, const char **av)
 
A = B = NULL;
X = NULL;
+   X_nr = 0;
+   X_alloc = 16;
+   ALLOC_ARRAY(X_array, X_alloc);
 
while (strbuf_getline(&buf, stdin) != EOF) {
struct object_id oid;
@@ -58,6 +82,8 @@ int cmd__reach(int ac, const char **av)
 
case 'X':
commit_list_insert(c, &X);
+   ALLOC_GROW(X_array, X_nr + 1, X_alloc);
+   X_array[X_nr++] = c;
break;
 
default:
@@ -72,6 +98,11 @@ int cmd__reach(int ac, const char **av)
printf("%s(A,B):%d\n", av[1], in_merge_bases(A, B));
else if (!strcmp(av[1], "is_descendant_of"))
printf("%s(A,X):%d\n", av[1], is_descendant_of(A, X));
+   else if (!strcmp(av[1], "get_merge_bases_many")) {
+   struct commit_list *list = get_merge_bases_many(A, X_nr, X_array);
+   printf("%s(A,X):\n", av[1]);
+   print_sorted_commit_ids(list);
+   }
 
exit(0);
 }
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 98bcb17960..d43e1a61d5 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -123,4 +123,19 @@ test_expect_success 'is_descendant_of:miss' '
test_three_modes is_descendant_of
 '
 
+test_expect_success 'get_merge_bases_many' '
+   cat >input <<-\EOF &&
+   A:commit-5-7
+   X:commit-4-8
+   X:commit-6-6
+   X:commit-8-3
+   EOF
+   {
+   echo "get_merge_bases_many(A,X):" &&
+   git rev-parse commit-5-6 \
+ commit-4-7 | sort
+   } >expect &&
+   test_three_modes get_merge_bases_many
+'
+
 test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v2 17/18] commit-reach: make can_all_from_reach... linear

2018-07-20 Thread Derrick Stolee
The can_all_from_reach_with_flags() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of
this method: fetch negotiation (haves/wants).

The first step includes using a depth-first-search (DFS) from each
'from' commit, sorted by ascending generation number. We do not walk
beyond the minimum generation number or the minimum commit date. This
DFS is likely to be faster than the existing reachable() method because
we expect previous ref values to be along the first-parent history.

If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other 'from' commits.
We also mark the visited commits using 'assign_flag' to prevent re-
walking the same commits.

We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case:

Before: 1.52 s
 After: 0.26 s

Large Case:

Before: 3.50 s
 After: 0.27 s

Note how the time increases between the two cases in the two versions.
The new code increases relative to the number of commits that need to be
walked, but not directly relative to the number of 'from' commits.

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 122 ++---
 commit-reach.h |   9 ++--
 upload-pack.c  |   5 +-
 3 files changed, 83 insertions(+), 53 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index f5858944fd..bc522d6840 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -514,66 +514,87 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
return is_descendant_of(commit, list);
 }
 
-int reachable(struct commit *from, unsigned int with_flag,
- unsigned int assign_flag, time_t min_commit_date)
+static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-   struct prio_queue work = { compare_commits_by_commit_date };
+   const struct commit *a = (const struct commit *)_a;
+   const struct commit *b = (const struct commit *)_b;
 
-   prio_queue_put(&work, from);
-   while (work.nr) {
-   struct commit_list *list;
-   struct commit *commit = prio_queue_get(&work);
-
-   if (commit->object.flags & with_flag) {
-   from->object.flags |= assign_flag;
-   break;
-   }
-   if (!commit->object.parsed)
-   parse_object(the_repository, &commit->object.oid);
-   if (commit->object.flags & REACHABLE)
-   continue;
-   commit->object.flags |= REACHABLE;
-   if (commit->date < min_commit_date)
-   continue;
-   for (list = commit->parents; list; list = list->next) {
-   struct commit *parent = list->item;
-   if (!(parent->object.flags & REACHABLE))
-   prio_queue_put(&work, parent);
-   }
-   }
-   from->object.flags |= REACHABLE;
-   clear_commit_marks(from, REACHABLE);
-   clear_prio_queue(&work);
-   return (from->object.flags & assign_flag);
+   if (a->generation < b->generation)
+   return -1;
+   if (a->generation > b->generation)
+   return 1;
+   return 0;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
 unsigned int with_flag,
 unsigned int assign_flag,
-time_t min_commit_date)
+time_t min_commit_date,
+uint32_t min_generation)
 {
+   struct commit **list = NULL;
int i;
+   int result = 1;
 
+   ALLOC_ARRAY(list, from->nr);
for (i = 0; i < from->nr; i++) {
-   struct object *from_one = from->objects[i].item;
+   list[i] = (struct commit *)from->objects[i].item;
 
-   if (from_one->flags & assign_flag)
-   continue;
-   from_one = deref_tag(the_repository, from_one,

[PATCH v2 18/18] commit-reach: use can_all_from_reach

2018-07-20 Thread Derrick Stolee
The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:

1. The performance is quadratic in worst-case.

2. A single in_merge_bases() call requires walking beyond the target
   commit in order to find the full set of boundary commits that may be
   merge-bases.

The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.

Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
naturally found cutoffs to avoid walking the entire commit graph. Since
we want to always return the correct result, we cannot use the
min_commit_date cutoff in can_all_from_reach. We then rely on generation
numbers to provide the cutoff.

Since not all repos will have a commit-graph file, nor will we always
have generation numbers computed for a commit-graph file, create a new
method, generation_numbers_enabled(), that checks for a commit-graph
file and sees if the first commit in the file has a non-zero generation
number. In the case that we do not have generation numbers, use the old
logic for is_descendant_of().
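
As a rough illustration of the precondition (assuming a git with commit-graph support is on PATH; repository name and commit contents are throwaway), generation numbers only exist once a commit-graph file has been written:

```shell
# Throwaway repository; 'git commit-graph write' computes generation
# numbers and stores them in .git/objects/info/commit-graph, which is
# what generation_numbers_enabled() would then detect.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=test -c user.email=test@example.com \
	commit -q --allow-empty -m base
git -C "$repo" commit-graph write --reachable
ls "$repo/.git/objects/info"
```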

Performance was measured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command using this input:

A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X:v3.0

Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v3.0.

Before: 0.26 s
 After: 0.21 s

Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer' with this input:

A:v4.9
B:v3.19

Before: 0.10 s
 After: 0.08 s

By adding a new commit with parent v3.19, we test the non-reachable case
of ref_newer:

Before: 0.09 s
 After: 0.08 s

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 18 ++
 commit-graph.h |  6 ++
 commit-reach.c | 24 +---
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b0a55ad128..e9786fa864 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -233,6 +233,24 @@ static int prepare_commit_graph(struct repository *r)
return !!r->objects->commit_graph;
 }
 
+int generation_numbers_enabled(struct repository *r)
+{
+   uint32_t first_generation;
+   struct commit_graph *g;
+   if (!prepare_commit_graph(r))
+  return 0;
+
+   g = r->objects->commit_graph;
+
+   if (!g->num_commits)
+   return 0;
+
+   first_generation = get_be32(g->chunk_commit_data +
+   g->hash_len + 8) >> 2;
+
+   return !!first_generation;
+}
+
 static void close_commit_graph(void)
 {
free_commit_graph(the_repository->objects->commit_graph);
diff --git a/commit-graph.h b/commit-graph.h
index 76e098934a..0de8f88316 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -51,6 +51,12 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+/*
+ * Return 1 if and only if the repository has a commit-graph
+ * file and generation numbers are computed in that file.
+ */
+int generation_numbers_enabled(struct repository *r);
+
 void write_commit_graph_reachable(const char *obj_dir, int append);
 void write_commit_graph(const char *obj_dir,
struct string_list *pack_indexes,
diff --git a/commit-reach.c b/commit-reach.c
index bc522d6840..c996524032 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -277,15 +277,25 @@ int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
 {
if (!with_commit)
return 1;
-   while (with_commit) {
-   struct commit *other;
 
-   other = with_commit->item;
-   with_commit = with_commit->next;
-   if (in_merge_bases(other, commit))
-   return 1;
+   if (generation_numbers_enabled(the_repository)) {
+   struct commit_list *from_list = NULL;
+   int result;
+   commit_list_insert(commit, &from_list);
+   result = can_all_from_reach(from_list, with_commit, 0);
+   free_commit_list(from_list);
+   return result;
+   } else {
+   while (with_commit) {
+   struct commit *other;
+
+   other = with_commit->item;
+   with_commit = with_commit->next;
+   if (in_merge_bases(other, commit))
+   return 1;
+   }
+   return 0;
}
-   return 0;
 }
 
 /*
-- 
2.18.0.118.gd4f65b8d14

[PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Derrick Stolee
There are many places in Git that use a commit walk to determine
reachability between commits and/or refs. A lot of this logic is
duplicated.

I wanted to achieve the following:

* Consolidate several different commit walks into one file
* Reduce duplicate reachability logic
* Increase testability (correctness and performance)
* Improve performance of reachability queries

My approach is mostly in three parts:

I. Move code to a new commit-reach.c file.
II. Add a 'test-tool reach' command to test these methods directly.
III. Modify the logic by improving performance and calling methods with
similar logic but different prototypes.

The 'test-tool reach' command is helpful to make sure I don't break
anything as I change the logic, but also so I can test methods that are
normally only exposed by other more complicated commands. For instance,
ref_newer() is part of 'git push -f' and ok_to_give_up() is buried deep
within fetch negotiation. Both of these methods have some problematic
performance issues that are corrected by this series. As I discovered
them, it was clear that it would be better to consolidate walk logic
instead of discovering a new walk in another file hidden somewhere.

For the ok_to_give_up() method, I refactored the method so I could pull
the logic out of the depths of fetch negotiation. In the commit
"commit-reach: make can_all_from_reach... linear" I discuss how the
existing algorithm is quadratic and how we can make it linear. Also, we
can use heuristic knowledge about the shape of the commit graph and the
usual haves/wants to get some extra performance bonus. (The heuristic is
to do a DFS with first-parents first, and stop on first found result. We
expect haves/wants to include ref tips, which typically have their
previous values in their first-parent history.)

One major difference in this series versus the RFC is that I added a new
method 'generation_numbers_enabled()' to detect if we have a commit-graph
file with non-zero generation numbers. Using can_all_from_reach in
is_descendant_of is only faster if we have generation numbers as a cutoff.

V2 Update: The biggest material change in this version is that we drop the
method declarations from commit.h, which requires adding a lot of references
to commit-reach.h across the codebase. This change is in a commit on its own.
In addition, we have the following smaller changes:

* Use 'unsigned int' for the flag variables.

* Properly align the here-doc test input data.

* Use single rev-parse commands in test output, and pipe the OIDs through 'sort'

* Check output of parse_commit()

* Update flag documentation in object.h

* Add tests for commit_contains() including both algorithms.

* Reduce size of "mixed-mode" commit-graph to ensure we start commit walks
  'above' the graph and then walk into the commits with generation numbers.

Thanks,
-Stolee

This series is based on jt/commit-graph-per-object-store

Derrick Stolee (18):
  commit-reach: move walk methods from commit.c
  commit.h: remove method declarations
  commit-reach: move ref_newer from remote.c
  commit-reach: move commit_contains from ref-filter
  upload-pack: make reachable() more generic
  upload-pack: refactor ok_to_give_up()
  upload-pack: generalize commit date cutoff
  commit-reach: move can_all_from_reach_with_flags
  test-reach: create new test tool for ref_newer
  test-reach: test in_merge_bases
  test-reach: test is_descendant_of
  test-reach: test get_merge_bases_many
  test-reach: test reduce_heads
  test-reach: test can_all_from_reach_with_flags
  test-reach: test commit_contains
  commit-reach: replace ref_newer logic
  commit-reach: make can_all_from_reach... linear
  commit-reach: use can_all_from_reach

 Makefile|   2 +
 bisect.c|   1 +
 builtin/branch.c|   1 +
 builtin/commit.c|   1 +
 builtin/fetch.c |   1 +
 builtin/fmt-merge-msg.c |   1 +
 builtin/log.c   |   1 +
 builtin/merge-base.c|   1 +
 builtin/merge.c |   1 +
 builtin/pull.c  |   1 +
 builtin/receive-pack.c  |   1 +
 builtin/remote.c|   1 +
 builtin/rev-parse.c |   1 +
 commit-graph.c  |  18 ++
 commit-graph.h  |   6 +
 commit-reach.c  | 662 
 commit-reach.h  |  77 +
 commit.c| 358 --
 commit.h|  29 --
 fast-import.c   |   1 +
 http-push.c |   2 +-
 merge-recursive.c   |   1 +
 notes-merge.c   |   1 +
 object.h|   4 +-
 pack-bitmap-write.c |   1 +
 ref-filter.c| 146 +
 remote.c|  50 +--
 remote.h|   1 -
 revision.c  |   1 +
 sequencer.c |   1 +
 sha1-name.c |   1 +
 shallow.c   |   1 +
 submodule.c |   1 +
 t/helper/test-reach.c   | 130 
 t/helper/test-tool.c|   1 +
 t/helper/test-tool.h|   1 +
 t/t6600-test-reach

[PATCH v2 01/18] commit-reach: move walk methods from commit.c

2018-07-20 Thread Derrick Stolee
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

The method declarations in commit.h are not touched by this commit and
will be moved in a following commit. Many consumers need to point to
commit-reach.h and that would bloat this commit.

Signed-off-by: Derrick Stolee 
---
 Makefile   |   1 +
 commit-reach.c | 360 +
 commit-reach.h |  42 ++
 commit.c   | 358 
 object.h   |   2 +-
 5 files changed, 404 insertions(+), 359 deletions(-)
 create mode 100644 commit-reach.c
 create mode 100644 commit-reach.h

diff --git a/Makefile b/Makefile
index bb8bd67201..59781f4bc3 100644
--- a/Makefile
+++ b/Makefile
@@ -829,6 +829,7 @@ LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
 LIB_OBJS += commit-graph.o
+LIB_OBJS += commit-reach.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-reach.c b/commit-reach.c
new file mode 100644
index 00..8ab6044414
--- /dev/null
+++ b/commit-reach.c
@@ -0,0 +1,360 @@
+#include "cache.h"
+#include "prio-queue.h"
+#include "commit.h"
+#include "commit-reach.h"
+
+/* Remember to update object flag allocation in object.h */
+#define PARENT1 (1u<<16)
+#define PARENT2 (1u<<17)
+#define STALE  (1u<<18)
+#define RESULT (1u<<19)
+
+static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
+
+static int queue_has_nonstale(struct prio_queue *queue)
+{
+   int i;
+   for (i = 0; i < queue->nr; i++) {
+   struct commit *commit = queue->array[i].data;
+   if (!(commit->object.flags & STALE))
+   return 1;
+   }
+   return 0;
+}
+
+/* all input commits in one and twos[] must have been parsed! */
+static struct commit_list *paint_down_to_common(struct commit *one, int n,
+   struct commit **twos,
+   int min_generation)
+{
+   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+   struct commit_list *result = NULL;
+   int i;
+   uint32_t last_gen = GENERATION_NUMBER_INFINITY;
+
+   one->object.flags |= PARENT1;
+   if (!n) {
+   commit_list_append(one, &result);
+   return result;
+   }
+   prio_queue_put(&queue, one);
+
+   for (i = 0; i < n; i++) {
+   twos[i]->object.flags |= PARENT2;
+   prio_queue_put(&queue, twos[i]);
+   }
+
+   while (queue_has_nonstale(&queue)) {
+   struct commit *commit = prio_queue_get(&queue);
+   struct commit_list *parents;
+   int flags;
+
+   if (commit->generation > last_gen)
+   BUG("bad generation skip %8x > %8x at %s",
+   commit->generation, last_gen,
+   oid_to_hex(&commit->object.oid));
+   last_gen = commit->generation;
+
+   if (commit->generation < min_generation)
+   break;
+
+   flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
+   if (flags == (PARENT1 | PARENT2)) {
+   if (!(commit->object.flags & RESULT)) {
+   commit->object.flags |= RESULT;
+   commit_list_insert_by_date(commit, &result);
+   }
+   /* Mark parents of a found merge stale */
+   flags |= STALE;
+   }
+   parents = commit->parents;
+   while (parents) {
+   struct commit *p = parents->item;
+   parents = parents->next;
+   if ((p->object.flags & flags) == flags)
+   continue;
+   if (parse_commit(p))
+   return NULL;
+   p->object.flags |= flags;
+   prio_queue_put(&queue, p);
+   }
+   }
+
+   clear_prio_queue(&queue);
+   return result;
+}
+
+static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos)
+{
+   struct commit_list *list = NULL;
+   struct commit_list *result = NULL;
+   int i;
+
+   for (i = 0; i < n; i++) {
+   if (one == twos[i])
+   /*
+* We do not mark this even with RESULT so we do not
+* have to clean it up.
+*/
+   return commit_list_insert(one, &result);
+   }
+
+   if (parse_commit(one))
+   return NULL;
+   for (i = 0; i < n; i++

[PATCH v2 13/18] test-reach: test reduce_heads

2018-07-20 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c |  4 
 t/t6600-test-reach.sh | 22 ++
 2 files changed, 26 insertions(+)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 4df01187c9..e32e193b70 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -102,6 +102,10 @@ int cmd__reach(int ac, const char **av)
struct commit_list *list = get_merge_bases_many(A, X_nr, X_array);
printf("%s(A,X):\n", av[1]);
print_sorted_commit_ids(list);
+   } else if (!strcmp(av[1], "reduce_heads")) {
+   struct commit_list *list = reduce_heads(X);
+   printf("%s(X):\n", av[1]);
+   print_sorted_commit_ids(list);
}
 
exit(0);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index d43e1a61d5..17c6467988 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -138,4 +138,26 @@ test_expect_success 'get_merge_bases_many' '
test_three_modes get_merge_bases_many
 '
 
+test_expect_success 'reduce_heads' '
+   cat >input <<-\EOF &&
+   X:commit-1-10
+   X:commit-2-8
+   X:commit-3-6
+   X:commit-4-4
+   X:commit-1-7
+   X:commit-2-5
+   X:commit-3-3
+   X:commit-5-1
+   EOF
+   {
+   echo "reduce_heads(X):" &&
+   git rev-parse commit-5-1 \
+ commit-4-4 \
+ commit-3-6 \
+ commit-2-8 \
+ commit-1-10 | sort
+   } >expect &&
+   test_three_modes reduce_heads
+'
+
 test_done
-- 
2.18.0.118.gd4f65b8d14



Re: [PATCH 5/8] commit-graph: not compatible with replace objects

2018-07-20 Thread Derrick Stolee

On 7/18/2018 3:52 PM, Derrick Stolee wrote:

On 7/18/2018 3:46 PM, Jeff King wrote:
On Wed, Jul 18, 2018 at 08:15:41AM -0700, Derrick Stolee via 
GitGitGadget wrote:



From: Derrick Stolee 

Create new method commit_graph_compatible(r) to check if a given
repository r is compatible with the commit-graph feature. Fill the
method with a check to see if replace-objects exist. Test this
interaction succeeds, including ignoring an existing commit-graph and
failing to write a new commit-graph.

I think this approach is sensible. These are optimizations, and it's not
a big deal to just punt on cases we can't handle.

I do have one complaint about the implementation, though:


+static int commit_graph_compatible(struct repository *r)
+{
+    prepare_replace_object(r);
+    if (hashmap_get_size(&r->objects->replace_map->map))
+    return 0;
+
+    return 1;
+}

If I'm reading the code correctly, this will predicate the decision
entirely on the presence of refs in refs/replace/. But you may have a
situation where those refs exist, but you are not respecting them in
this run.

For instance, imagine[1] a server that hosts arbitrary repositories, but
wants to use commit graphs to speed up server-side operations (e.g.,
serving fetches, but also perhaps a web interface doing --contains,
etc). If it runs all of the server-side commands with
GIT_NO_REPLACE_OBJECTS=1, then there should be no problem. But if a user
pushes anything to refs/replace (because they _do_ use replace refs
locally, and want to share them with other clients), that would disable
graphs on the server.

So I think this should at least be loosened to:

	if (check_replace_refs) {
		prepare_replace_object(r);
		if (...)
	}

which would make this case work. I'd even go so far as to say that for
writing, we should just always ignore replace refs and generate the full
graph (just like pack-objects does so when writing a pack). Then the
resulting graph can be used selectively by disabling replace refs for
particular commands. But for the scenario I described above, the
distinction is mostly academic, as replacements would be disabled
during the write command anyway.

-Peff

[1] You can probably guess that this is how GitHub handles replace refs.
 We ran into this long ago because replacements and grafts mess up
 any other caches external to Git that rely on the immutability of
 the hash.

 We do it with a config option, though, which requires a trivial
 patch. I'll post that momentarily.


Thanks for the details! I never considered that someone would have 
these refs around, but would ignore them most of the time.


The biggest reason I wanted to punt here was that it is easy to toggle 
between using replace refs and not using them. Writing and reading as 
long as we are ignoring those refs is a good idea, and I'll use that 
approach in v2.



Thanks, Peff, for improving the check_replace_refs interaction [1].

Since this series now has two dependencies (including Stefan's ref-store 
fix [2] that I had included in my v1), I'll let those topics settle 
before I send a new v2.


If there are more comments about how I'm handling these cases, then 
please jump in and tell me. I'll revisit this topic in a few weeks.


Thanks,

-Stolee

[1] 
https://public-inbox.org/git/20180718204449.ga31...@sigill.intra.peff.net/T/#u


 [PATCH 1/3] check_replace_refs: fix outdated comment

[2] 
https://public-inbox.org/git/20180717224935.96397-1-sbel...@google.com/T/#u


    [PATCH 0/2] RFC ref store to repository migration



Re: [PATCH 5/8] commit-graph: not compatible with replace objects

2018-07-20 Thread Stefan Beller
> Thanks, Peff, for improving the check_replace_refs interaction [1].
>
> Since this series now has two dependencies (including Stefan's ref-store
> fix [2] that I had included in my v1), I'll let those topics settle
> before I send a new v2.

FYI: I have not been working on, nor do I plan to work on (in the short
term), the ref store fix series, which IIUC needs another abstraction
layer to make it easier to integrate such a thing, as well as to extend
the series to all functions in the ref store.

Feel free to take over that series partially (and defer the extension for
all functions in the ref store until later)?

Stefan


Re: [PATCH 5/8] commit-graph: not compatible with replace objects

2018-07-20 Thread Derrick Stolee

On 7/20/2018 12:49 PM, Stefan Beller wrote:

Thanks, Peff, for improving the check_replace_refs interaction [1].

Since this series now has two dependencies (including Stefan's ref-store
fix [2] that I had included in my v1), I'll let those topics settle
before I send a new v2.

FYI: I have not been working on, nor do I plan to work on (in the short
term), the ref store fix series, which IIUC needs another abstraction
layer to make it easier to integrate such a thing, as well as to extend
the series to all functions in the ref store.

Feel free to take over that series partially (and defer the extension for
all functions in the ref store until later)?


OK. Thanks for letting me know.

Perhaps I'll give it a try to pass the repository struct in the cbdata 
for the function pointers, as a minimum amount of change option.


Thanks,

-Stolee



Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Stefan Beller
Hi Derrick,

> V2 Update: The biggest material change in this version is that we drop the
> method declarations from commit.h, which requires adding a lot of references
> to commit-reach.h across the codebase. This change is in a commit on its own.
> In addition, we have the following smaller changes:

Is there a remote available to get this series from?

> * Use 'unsigned int' for the flag variables.
>
> * Properly align the here-doc test input data.
>
> * Use single rev-parse commands in test output, and pipe the OIDs through 
> 'sort'
>
> * Check output of parse_commit()
>
> * Update flag documentation in object.h
>
> * Add tests for commit_contains() including both algorithms.
>
> * Reduce size of "mixed-mode" commit-graph to ensure we start commit walks
>   'above' the graph and then walk into the commits with generation numbers.

A range diff would be nice (though I can just look at all patches again
even if it takes longer).

I notice this is not sent via the GGG, but via git send-email?

Thanks,
Stefan


Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Derrick Stolee

On 7/20/2018 1:10 PM, Stefan Beller wrote:

Hi Derrick,


V2 Update: The biggest material change in this version is that we drop the
method declarations from commit.h, which requires adding a lot of references
to commit-reach.h across the codebase. This change is in a commit on its own.
In addition, we have the following smaller changes:

Is there a remote available to get this series from?


Sure! It's on my fork [1]

[1] https://github.com/derrickstolee/git/tree/reach/refactor



* Use 'unsigned int' for the flag variables.

* Properly align the here-doc test input data.

* Use single rev-parse commands in test output, and pipe the OIDs through 'sort'

* Check output of parse_commit()

* Update flag documentation in object.h

* Add tests for commit_contains() including both algorithms.

* Reduce size of "mixed-mode" commit-graph to ensure we start commit walks
   'above' the graph and then walk into the commits with generation numbers.

A range diff would be nice (though I can just look at all patches again
even if it takes longer).


I can send a diff. It's a bit big because of the indenting changes.


I notice this is not sent via the GGG, but via git send-email?


GGG can do version updates, but is currently having trouble when the 
branch has conflicts with the target [2]. We will address this issue 
next week, but I wanted to get this version out. Thank you for your 
patience as we work out the kinks.


[2] https://github.com/gitgitgadget/gitgitgadget/issues/25

Thanks,

-Stolee



Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Derrick Stolee
Here is the diff between v1 and v2.

Thanks,
-Stolee

---

diff --git a/bisect.c b/bisect.c
index e1275ba79e..d023543c91 100644
--- a/bisect.c
+++ b/bisect.c
@@ -13,6 +13,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "commit-slab.h"
+#include "commit-reach.h"
 
 static struct oid_array good_revs;
 static struct oid_array skipped_revs;
diff --git a/builtin/branch.c b/builtin/branch.c
index a50632fb23..9a787447f4 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -23,6 +23,7 @@
 #include "ref-filter.h"
 #include "worktree.h"
 #include "help.h"
+#include "commit-reach.h"
 
 static const char * const builtin_branch_usage[] = {
	N_("git branch [<options>] [-r | -a] [--merged | --no-merged]"),
diff --git a/builtin/commit.c b/builtin/commit.c
index 158e3f843a..b5c608458e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -33,6 +33,7 @@
 #include "sequencer.h"
 #include "mailmap.h"
 #include "help.h"
+#include "commit-reach.h"
 
 static const char * const builtin_commit_usage[] = {
	N_("git commit [<options>] [--] <pathspec>..."),
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f5d960baec..7de234774b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -22,6 +22,7 @@
 #include "utf8.h"
 #include "packfile.h"
 #include "list-objects-filter-options.h"
+#include "commit-reach.h"
 
 static const char * const builtin_fetch_usage[] = {
	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index ff165c0fcd..7277d557b2 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -12,6 +12,7 @@
 #include "fmt-merge-msg.h"
 #include "gpg-interface.h"
 #include "repository.h"
+#include "commit-reach.h"
 
 static const char * const fmt_merge_msg_usage[] = {
	N_("git fmt-merge-msg [-m <message>] [--log[=<n>] | --no-log] [--file <file>]"),
diff --git a/builtin/log.c b/builtin/log.c
index 55a6286d7f..333d97c692 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -31,6 +31,7 @@
 #include "progress.h"
 #include "commit-slab.h"
 #include "repository.h"
+#include "commit-reach.h"
 
 #define MAIL_DEFAULT_WRAP 72
 
diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index 08d91b1f0c..1c92099070 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -7,6 +7,7 @@
 #include "revision.h"
 #include "parse-options.h"
 #include "repository.h"
+#include "commit-reach.h"
 
 static int show_merge_base(struct commit **rev, int rev_nr, int show_all)
 {
diff --git a/builtin/merge.c b/builtin/merge.c
index d1b547d973..4c601c40a2 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -36,6 +36,7 @@
 #include "packfile.h"
 #include "tag.h"
 #include "alias.h"
+#include "commit-reach.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
diff --git a/builtin/pull.c b/builtin/pull.c
index 4e78935392..15ad010968 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -22,6 +22,7 @@
 #include "tempfile.h"
 #include "lockfile.h"
 #include "wt-status.h"
+#include "commit-reach.h"
 
 enum rebase_type {
REBASE_INVALID = -1,
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 400d31c18c..d8467f9734 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -27,6 +27,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "protocol.h"
+#include "commit-reach.h"
 
 static const char * const receive_pack_usage[] = {
	N_("git receive-pack <git-dir>"),
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 0f09bbbf65..455f62246d 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -14,6 +14,7 @@
 #include "revision.h"
 #include "split-index.h"
 #include "submodule.h"
+#include "commit-reach.h"
 
 #define DO_REVS		1
 #define DO_NOREV	2
diff --git a/commit-reach.c b/commit-reach.c
index 9eb6225403..c996524032 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -4,6 +4,7 @@
 #include "decorate.h"
 #include "prio-queue.h"
 #include "tree.h"
+#include "ref-filter.c"
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
@@ -536,7 +537,8 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
-int with_flag, int assign_flag,
+unsigned int with_flag,
+unsigned int assign_flag,
 time_t min_commit_date,
 uint32_t min_generation)
 {
@@ -548,9 +550,8 @@ int can_all_from_reach_with_flag(struct object_array *from,
for (i = 0; i < from->nr; i++) {
list[i] = (struct commit *)from->objects[i].item;
 
-   parse_commit(list[i]);
-
-   if (list[i]->generation < min_generation)
+   if (parse_commit(list[i]) ||
+   list[i]->generation < min_generation)
return 0;
}
 
@@ -578,9 +579,8 @@ int can_all_from_reach_with_flag(struct object_array *from,

Re: 2.18.0 Regression: packing performance and effectiveness

2018-07-20 Thread Elijah Newren
On Thu, Jul 19, 2018 at 10:28 PM, Jeff King  wrote:
> On Thu, Jul 19, 2018 at 04:11:01PM -0700, Elijah Newren wrote:
>
>> Looking at the output from Peff's instrumentation elsewhere in this
>> thread, I see a lot of lines like
>>mismatched get: 32889efd307c7be376da9e3d45a78305f14ba73a = (, 28)
>> Does that mean it was reading the array when it wasn't ready?
>
> Yes, it looks like we saw a "get" without a "set". Though this could
> also be due to threading. The tracing isn't atomic with respect to the
> actual get/set operation, so it's possible that the ordering of the
> trace output does not match the ordering of the actual operations.
>
>> However, it's interesting to also look at the effect on packing
>> linux.git (on the same beefy hardware):
>>
>> Version  Pack (MB)  MaxRSS(kB)  Time (s)
>> -------  ---------  ----------  --------
>>  2.17.0       1279    11382932    632.24
>>  2.18.0       1279    10817568    621.97
>>  fix-v4       1279    11484168   1193.67
>>
>> While the pack size is nice and small, the original memory savings
>> added in 2.18.0 are gone and the performance is much worse.  :-(
>
> Interesting. I can't reproduce here. The fix-v4 case is only slightly
> slower than 2.18.0. Can you double check that your compiler flags, etc,
> were the same? Many times I've accidentally compared -O0 to -O0. :)

Same build flags each time, but as Duy points out, this was on a
40-processor box.

> You might also try the patch below (on top of fix-v4), which moves the
> locking to its own dedicated mutex. That should reduce lock contention,
> and it fixes the remaining realloc where I think we're still racy. On my
> repack of linux.git, it dropped the runtime from 6m3s to 5m41s, almost
> entirely in system CPU.

I had to add an include of pthread.h to get it to compile, but
otherwise I'm trying it out.  Re-running a few times with each version
to see how big the run-to-run variance is.

> I didn't measure my max rss. However, I'd caution slightly against
> drawing too much conclusion from it, for two reasons:
>
>   1. RSS includes mmap'd packfiles, which is subject to whatever pages
>  the OS feels like keeping in RAM. So using more heap can sometimes
>  go unnoticed in that count, since you're just trading heap pages
>  for mmap pages. Although that implies some memory pressure, and it
>  sounds like your machine is sufficiently beefy to avoid that.
>
>   2. Peak heap is going to depend on the threading. You have one thread
>  per CPU working on a window of objects, each of which will be in
>  memory at once. So I'd expect a fair bit of fluctuation in the peak
>  just depending on how the threads line up with each other (some of
>  it random, and some of it maybe impacted by what the code does, but
>  in a way that just happens to fall out for this specific workload).
>
> Which isn't to say measuring it is useless. The trends may override the
> noise from those two things. I've just run into problems in the past
> trying to get consistent measurements.

Good to know.  I was actually measuring it due to commit 3b13a5f26387
("pack-objects: reorder members to shrink struct object_entry",
2018-04-14) referencing
https://public-inbox.org/git/87po42cwql@evledraar.gmail.com/ as
being the measurement methodology to back up the results of the
original purpose of the change.  (I did miss out on the "run 3 times,
report the lowest value" that Ævar did, though.)  I figured that as
long as we were further tweaking things along the lines of that patch
series, it made sense to try to do similar measurements to see if we
were at least improving things relative to 2.17.0.

> Here's the lock patch.
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index ba14a1bfbc..b76ce04cb9 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1926,12 +1926,12 @@ static unsigned long oe_delta_size(struct packing_data *pack,
>  {
> unsigned long size;
>
> -   read_lock(); /* to protect access to pack->delta_size[] */
> +   pack_delta_lock(pack);
> if (pack->delta_size)
> size = pack->delta_size[e - pack->objects];
> else
> size = e->delta_size_;
> -   read_unlock();
> +   pack_delta_unlock(pack);
> return size;
>  }
>
> @@ -1939,10 +1939,10 @@ static void oe_set_delta_size(struct packing_data *pack,
>   struct object_entry *e,
>   unsigned long size)
>  {
> -   read_lock(); /* to protect access to pack->delta_size[] */
> +   pack_delta_lock(pack);
> if (!pack->delta_size && size < pack->oe_delta_size_limit) {
> e->delta_size_ = size;
> -   read_unlock();
> +   pack_delta_unlock(pack);
> return;
> }
> /*
> @@ -1963,7 +1963,7 @@ static void oe_set_delta_size(struct packing_data *pack,
> pack->del

Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 05:39:43PM +0200, Nguyễn Thái Ngọc Duy wrote:

> Let's start with some background about oe_delta_size() and
> oe_set_delta_size(). If you already know, skip the next paragraph.
> 
> These two are added in 0aca34e826 (pack-objects: shrink delta_size
> field in struct object_entry - 2018-04-14) to help reduce 'struct
> object_entry' size. The delta size field in this struct is reduced to
> only contain max 2MB. So if any new delta is produced and larger than
> 2MB, it's dropped because we can't really save such a large size
> anywhere. Fallback is provided in case existingpackfiles already have
> large deltas, then we can retrieve it from the pack.

Minor nit, but isn't this 1MB (it was 2MB after one of your patches, but
I think v2.18.0 has 20 bits)?

> [...more commit message...]

Nicely explained overall.

> With this, we do not have to drop deltas in try_delta() anymore. Of
> course the downside is we use slightly more memory, even compared to
> 2.17.0. But since this is considered an uncommon case, a bit more
> memory consumption should not be a problem.

I wondered how common this might be. The easiest way to see the largest
delta sizes is:

  git cat-file --batch-all-objects \
   --batch-check='%(objectsize:disk) %(deltabase)' |
  grep -v  |
  sort -rn | head

The biggest one in the kernel is ~300k. Which is about what I'd expect
for a normal source code repo. Even some private repos I have with a lot
of binary artifacts top out at about 3MB. So the new 32MB is probably
pretty unlikely to trigger, unless you really do have a bunch of huge
artifacts.

If you do, then more memory pressure isn't great. But as a relative
measure, the extra couple megabytes to store one 32-bit int per object
is probably fine.

> A note about thread synchronization. Since this code can be run in
> parallel during delta searching phase, we need a mutex. The realloc
> part in packlist_alloc() is not protected because it only happens
> during the object counting phase, which is always single-threaded.

Right, thanks for clarifying this here.

> The locking in oe_delta_size() could also be dropped if we make sure
> the pack->delta_size pointer assignment in oe_set_delta_size() is
> atomic. But let's keep the lock for now to be on the safe side. Lock
> contention should not be that high for this lock.

Yeah, I agree with this, now that we're not using the read_lock().

> [1] With a small tweak. 2.17.0 on 64-bit linux can hold 2^64 byte
> deltas, which is absolutely insane. But windows builds, even
> 64-bit version, only hold 2^32. So reducing it to 2^32 should be
> quite safe.

I'm not sure I completely agree with this. While 4GB deltas should be
pretty rare, the nice thing about 64-bit is that you never have to even
think about whether the variable is large enough. I think the 2^32
limitations on Windows are something we should be fixing in the long
term (though there it is even worse because it is not just deltas, but
entire objects).

> ---
>  I'm optimistic that the slowness on linux repo is lock contention
>  (because if it's not then I have no clue what is). So let's start
>  doing proper patches.

Me too, but I'd love to see more numbers from Elijah.

>  If we want a quick fix for 2.18.1. I suggest bumping up
>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
>  think it's safe to fast track this patch.

Also agreed.

> @@ -2278,6 +2274,8 @@ static void init_threaded_search(void)
>   pthread_mutex_init(&cache_mutex, NULL);
>   pthread_mutex_init(&progress_mutex, NULL);
>   pthread_cond_init(&progress_cond, NULL);
> + pthread_mutex_init(&to_pack.lock, NULL);
> + to_pack.lock_initialized = 1;
>   old_try_to_free_routine = 
> set_try_to_free_routine(try_to_free_from_threads);
>  }

This is new in this iteration. I guess this is to cover the case where
we are built with pthread support, but --threads=1?

Given that we no longer have to touch this lock during the realloc, is
it worth actually putting it into to_pack? Instead, we could keep it
local to pack-objects, alongside all the other locks (and use the
lock_mutex() helper which handles the single-thread case).

Your original patch had to copy the oe_* helpers into the file to handle
that. But I think we're essentially just locking the whole functions:

> @@ -330,20 +353,37 @@ static inline void oe_set_size(struct packing_data *pack,
>  static inline unsigned long oe_delta_size(struct packing_data *pack,
> const struct object_entry *e)
>  {
> - if (e->delta_size_valid)
> - return e->delta_size_;
> - return oe_size(pack, e);
> + unsigned long size;
> +
> + packing_data_lock(pack);
> + if (pack->delta_size)
> + size = pack->delta_size[e - pack->objects];
> + else
> + size = e->delta_size_;
> + packing_data_unlock(pack);
> + return size;
>  

Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Duy Nguyen
On Fri, Jul 20, 2018 at 6:35 PM Derrick Stolee  wrote:
>
> There are many places in Git that use a commit walk to determine
> reachability between commits and/or refs. A lot of this logic is
> duplicated.
>
> I wanted to achieve the following:
>
> Consolidate several different commit walks into one file

I'm surprised get_shallow_commits() in shallow.c didn't make the cut.
It's no problem though if you already considered it and decided it was
better left alone.

> Reduce duplicate reachability logic
> Increase testability (correctness and performance)
> Improve performance of reachability queries

What's your recommendation on adding new commit reachability code? I
might have to add one to fix prune_shallow() if I don't find anything
fit. I guess the code should go to commit-reach.c too?
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy  wrote:
> Let's start with some background about oe_delta_size() and
> oe_set_delta_size(). If you already know, skip the next paragraph.
>
> These two are added in 0aca34e826 (pack-objects: shrink delta_size
> field in struct object_entry - 2018-04-14) to help reduce 'struct
> object_entry' size. The delta size field in this struct is reduced to
> only contain max 2MB. So if any new delta is produced and larger than

I think you mean 1MB?

$ git grep OE_DELTA_SIZE_BITS v2.18.0
v2.18.0:builtin/pack-objects.c: if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
v2.18.0:pack-objects.h:#define OE_DELTA_SIZE_BITS   20
v2.18.0:pack-objects.h: unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */

In https://public-inbox.org/git/20180719151640.ga24...@duynguyen.home/,
you bumped this to 21, which may be where you got the 2MB figure?
Your 2MB figure also appears on the next line and a few other places
in the commit message.

> 2MB, it's dropped because we can't really save such a large size
> anywhere. Fallback is provided in case existingpackfiles already have

Missing space: s/existingpackfiles/existing packfiles/

> large deltas, then we can retrieve it from the pack.
>
> While this should help small machines repacking large repos (i.e. less

Maybe change "repacking large repos" to "repacking large repos without
large deltas"?

> memory pressure), dropping large deltas during the delta selection
> process could end up with worse pack files. And if existing packfiles
> already have >2MB delta and pack-objects is instructed to not reuse
> deltas, all of them will be dropped on the floor, and the resulting
> pack would be definitely bigger.
>
> There is also a regression in terms of CPU/IO if we have large on-disk
> deltas because fallback code needs to parse the pack every time the
> delta size is needed and just access to the mmap'd pack data is enough
> for extra page faults when memory is under pressure.
>
> Both of these issues were reported on the mailing list. Here's some
> numbers for comparison.
>
> Version  Pack (MB)  MaxRSS(kB)  Time (s)
> -------  ---------  ----------  --------
>  2.17.0       5498    43513628   2494.85
>  2.18.0      10531    40449596   4168.94
>
> This patch provides a better fallback that is
>
> - cheaper in terms of cpu and io because we won't have to read
>   existing pack files as much
>
> - better in terms of pack size because the pack heuristics is back to
>   2.17.0 time, we do not drop large deltas at all
>
> If we encounter any delta (on-disk or created during try_delta phase)
> that is larger than the 2MB limit, we stop using delta_size_ field for
> this because it can't contain such size anyway. A new array of delta
> size is dynamically allocated and can hold all the deltas that 2.17.0
> can [1]. All current delta sizes are migrated over.
>
> With this, we do not have to drop deltas in try_delta() anymore. Of
> course the downside is we use slightly more memory, even compared to
> 2.17.0. But since this is considered an uncommon case, a bit more
> memory consumption should not be a problem.

Out of curiosity, would it be possible to use the delta_size_ field
for deltas that are small enough, and only use an external data
structure (perhaps a hash rather than an array) for the few deltas
that are large?

> Delta size limit is also raised from 2MB to 32 MB to better cover
> common case and avoid that extra memory consumption (99.999% deltas in
> this reported repo are under 12MB). Other fields are shuffled around
> to keep this struct packed tight. We don't use more memory in common
> case even with this limit update.
>
> A note about thread synchronization. Since this code can be run in
> parallel during delta searching phase, we need a mutex. The realloc
> part in packlist_alloc() is not protected because it only happens
> during the object counting phase, which is always single-threaded.
>
> The locking in oe_delta_size() could also be dropped if we make sure
> the pack->delta_size pointer assignment in oe_set_delta_size() is
> atomic. But let's keep the lock for now to be on the safe side. Lock
> contention should not be that high for this lock.
>
> [1] With a small tweak. 2.17.0 on 64-bit linux can hold 2^64 byte
> deltas, which is absolutely insane. But windows builds, even
> 64-bit version, only hold 2^32. So reducing it to 2^32 should be
> quite safe.
>
> Reported-by: Elijah Newren 
> Helped-by: Elijah Newren 
> Helped-by: Jeff King 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  I'm optimistic that the slowness on linux repo is lock contention
>  (because if it's not then I have no clue what is). So let's start
>  doing proper patches.

I'll be testing this shortly...

>
>  If we want a quick fix for 2.18.1. I suggest bumping up
>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
>  think it's safe to fast track this patch.

Okay, I'll submit

Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 02:32:29AM -0700, Junio C Hamano wrote:

> > Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> > but I think it's silly for it to be. I would never add it to this list.
> 
> A tangent, but is that because they want you to use memmove()
> instead so that you do not have to worry about overlapping copies,
> perhaps?

That was my first thought, too, but nope. They recommend memcpy_s()
instead. Which in my opinion adds very little value, while missing the
most common misuse of memcpy I've seen in practice (the overlapping
thing).

Helpers like our COPY_ARRAY() are much more useful for preventing sizing
mistakes, IMHO. But again, I'd never ban memcpy. The right tool for
encouraging COPY_ARRAY() is coccinelle (because the matching is
complicated, but also because we can mechanically turn it into the right
thing, whereas a strcpy is going to require some manual reworking).

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 7:41 AM, Duy Nguyen  wrote:
> On Thu, Jul 19, 2018 at 10:40 PM Jeff King  wrote:
>>
>> There are a few standard C functions (like strcpy) which are
>> easy to misuse. We generally discourage these in reviews,
>> but we haven't put advice in CodingGuidelines, nor provided
>> any automated enforcement. The latter is especially
>> important because it's more consistent, and it can often
>> save a round-trip of review.
>>
>> Let's start by banning strcpy() and sprintf(). It's not
>> impossible to use these correctly, but it's easy to do so
>> incorrectly, and there's always a better option.
>
> Is it possible to extend this to ban variables as well? I'm still
> going to delete the_index from library code. Once that work is done I

Or perhaps constants, such as PATH_MAX to avoid problems like this one
from 2.18.0 timeframe:
https://public-inbox.org/git/7d1237c7-5a83-d766-7d93-5f0d59166...@web.de/


ag/rebase-i-in-c, was Re: What's cooking in git.git (Jul 2018, #02; Wed, 18)

2018-07-20 Thread Alban Gruin
Hi Junio,

Le 19/07/2018 à 00:03, Junio C Hamano a écrit :
> * ag/rebase-i-in-c (2018-07-10) 13 commits
>  - rebase -i: rewrite the rest of init_revisions_and_shortrevisions in C
>  - rebase -i: implement the logic to initialize the variable $revision in C
>  - rebase--interactive: remove unused modes and functions
>  - rebase--interactive: rewrite complete_action() in C
>  - sequencer: change the way skip_unnecessary_picks() returns its result
>  - sequencer: refactor append_todo_help() to write its message to a buffer
>  - rebase -i: rewrite checkout_onto() in C
>  - rebase -i: rewrite setup_reflog_action() in C
>  - sequencer: add a new function to silence a command, except if it fails
>  - rebase-interactive: rewrite the edit-todo functionality in C
>  - editor: add a function to launch the sequence editor
>  - rebase--interactive: rewrite append_todo_help() in C
>  - sequencer: make two functions and an enum from sequencer.c public
> 
>  Piecemeal rewrite of the remaining "rebase -i" machinery in C.
> 
>  Expecting a reroll.
> 
>  The early parts of the series seem solidifying; perhaps with a
>  reroll or two, they become 'next' material?

I am working on new changes (rewriting init_basic_state(), and making
rebase--interactive a builtin), so it will probably need at least one
more reroll before being truly ready for 'next'.  It’s not completely
finished yet, I hope to send it Monday or Tuesday.

Cheers,
Alban



Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 09:22:41AM -0400, Theodore Y. Ts'o wrote:

> On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:
> > Ditto for sprintf, where you should _always_ be using at least xsnprintf
> > (or some better tool, depending on the situation).  And for strncpy,
> > strlcpy (or again, some better tool) is strictly an improvement.
> 
> Nitpick: this may be true for git, but it's not strictly true in all
> cases.  I actually have a (non-git) case where strncpy *is* the right
> tool.  And this is one where I'm copying a NUL-terminated string into
> a fixed-length character array (in the ext4 superblock) which is *not*
> NUL-terminated.  In that case, strncpy() works; but strlcpy() does not
> do what I want.

Thanks for an interesting (and exotic) counter-point. For strcpy(), you
always have to run strlen() anyway to know you're not going to overflow,
so you might as well memcpy() at that point. But if you really are OK
with truncation, then using strncpy() lets you copy while walking over
the string only once, which is more efficient.  On the other hand,
strncpy() fills out NULs to the end of the destination array, so you are
paying an extra cost for small strings.

> So I used strncpy() advisedly, and I ignore people running Coccinelle
> scripts and blindly sending patches to "fix" ext4.

Even after hearing your counter-point, I think I'm still OK with the
ban. Because as you've noticed, even if the call is fine, it carries an
ongoing maintenance cost. Every time you (or somebody else) audits, you
have to skip over that known-good call. And you have to deal with
well-meaning patches. So to me there's value in eliminating it even if
it's an acceptable tool.

We don't have any remaining calls to strncpy() in Git, so this lets us
punt on deciding if the ban is worth changing what's there. We can wait
for somebody to decide they _really_ need strncpy, and we can discuss it
then with a concrete case.

> But perhaps that's also a solution for git?  You don't have to
> necessarily put them on a banned list; you could instead have some
> handy, pre-set scripts that scan the entire code base looking for
> "bad" functions with cleanups automatically suggested.  This could be
> run at patch review time, and/or periodically (especially before a
> release).

We do this already with coccinelle for some transformations. But I think
there's real value in linting that's always run, and is cheap. Catching
these things early means we don't have to spend time on them in review,
or dealing with follow-up patches.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 04:41:34PM +0200, Duy Nguyen wrote:

> > Let's start by banning strcpy() and sprintf(). It's not
> > impossible to use these correctly, but it's easy to do so
> > incorrectly, and there's always a better option.
> 
> Is it possible to extend this to ban variables as well? I'm still
> going to delete the_index from library code. Once that work is done I
> will ban it from entering again (it's only allowed in builtin/ for
> example). The next target after that would be the_repository.
> 
> Right now I will do this by not declaring the variable [2], which ends
> up with a much less friendly compile warning. I did something similar
> [1] in an earlier iteration but did not do extensive research on this
> topic like you did.

IMHO just not declaring the variable is the right move (and is what I
would do with these functions if libc wasn't already declaring them).

The compile error may be less friendly, but these things are temporary
as topics-in-flight catch up. Whereas strcpy() will be with us forever.

I also think that removing variables is a special case of a larger
technique: when we change function semantics, we try to change the
signatures, too. So there you'll get a type error, or not enough
arguments, or whatever. And the next step is usually "git log" or "git
blame" to see what happened, and what the conversion should look like.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 10:48:37AM -0700, Elijah Newren wrote:

> > Is it possible to extend this to ban variables as well? I'm still
> > going to delete the_index from library code. Once that work is done I
> 
> Or perhaps constants, such as PATH_MAX to avoid problems like this one
> from 2.18.0 timeframe:
> https://public-inbox.org/git/7d1237c7-5a83-d766-7d93-5f0d59166...@web.de/

I've been slowly trying to eradicate PATH_MAX from our code base. And I
would be happy with an eventual automated ban there. Unlike the_index,
it comes from the system, so it's in the same boat as strcpy() etc.

That said, I think it's less urgent. The urgent problem fixed by the
patch you linked was the use of strcpy() to overflow the buffer. Without
that, it just becomes a normal bug where we do not handle long paths
well on some operating systems.

-Peff


Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Eric Sunshine
On Fri, Jul 20, 2018 at 1:20 PM Derrick Stolee  wrote:
> Here is the diff between v1 and v2.
>
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> @@ -67,142 +67,176 @@ test_three_modes () {
>  test_expect_success 'get_merge_bases_many' '
> cat >input <<-\EOF &&
> +   A:commit-5-7
> +   X:commit-4-8
> +   X:commit-6-6
> +   X:commit-8-3
> EOF
> {
> -   printf "get_merge_bases_many(A,X):\n" &&
> -   git rev-parse commit-5-6 &&
> -   git rev-parse commit-4-7
> +   echo "get_merge_bases_many(A,X):" &&
> +   git rev-parse commit-5-6 \
> + commit-4-7 | sort

Pipes lose the exit codes of all upstream commands. When a Git
command is upstream, we'd usually recommend to dump its output to a
file, then use the file as input to the rest of the pipe so as not to
lose the Git command's exit code:

    {
            ...
            git rev-parse ... >oids &&
            sort <oids
    } >expect &&

One could argue, in this case, that if git-rev-parse crashes, then it
won't have the expected output and the test will fail anyhow despite
not seeing its failed exit code. However, git-rev-parse might crash
_after_ emitting all the normal, expected output, and that crash would
be missed altogether, so avoiding git-rev-parse as a pipe upstream is
a good idea.

However, one could counter that argument by saying that it isn't the job
of this particular test script to check git-rev-parse's behavior, so
crashy git-rev-parse ought to be caught elsewhere by some other test
script. Nevertheless, you'll likely encounter reviewers who don't want
to see git-rev-parse upstream, even with that argument.

Anyhow, why is that 'sort' even there? It wasn't needed in the
original. Is git-rev-parse outputting the OIDs in random order?

> } >expect &&
> test_three_modes get_merge_bases_many
>  '
>
>  test_expect_success 'reduce_heads' '
> [...]
> +   git rev-parse commit-5-1 \
> + commit-4-4 \
> + commit-3-6 \
> + commit-2-8 \
> + commit-1-10 | sort

Ditto.

> } >expect &&
> test_three_modes reduce_heads
>  '


Re: [PATCH v4 03/23] multi-pack-index: add builtin

2018-07-20 Thread Junio C Hamano
Derrick Stolee  writes:

> diff --git a/Documentation/git-multi-pack-index.txt 
> b/Documentation/git-multi-pack-index.txt
> new file mode 100644
> index 00..ec9982cbfc
> --- /dev/null
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -0,0 +1,36 @@
> +git-multi-pack-index(1)
> +==

Isn't this underline too short by one column?  My copy of AsciiDoc
seems to be OK with it, but do other versions and reimplementations
grok this fine?


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Johannes Schindelin
Hi Junio,

On Thu, 19 Jul 2018, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Johannes Schindelin  writes:
> >
> >> I would like to ask you to reinstate the post-rewrite hook, as it still
> >> improves the situation over the current one.
> >
> > Without post-rewrite I seem to be getting correct amlog entries for
> > commits created by "git rebase"; do our rebase--am backend still
> > trigger post-applypatch hook in its "am" phase to apply the patches
> > created with "format-patch"?
> 
> That was a wrong line of thought that led to a dead end.  format-patch
> won't recreate Message-Id to its output from notes/amlog, so even if
> the "format-patch --stdout | am" pipeline inside rebase-am triggered
> the post-applypatch hook, it would not have a chance to carry the
> notes forward that way.
> 
> What was really happening was I have
> 
>   $ git config --list | grep amlog
>   notes.rewriteref=refs/notes/amlog
> 
> and that ought to be sufficient to carry "commit-to-original-msg-id"
> entries across rebases.  And it seems to correctly work.  I however
> suspect that "cherry-pick A..B" may lose the notes, but I haven't
> checked.

AFAICT there is at least one scenario where you run `rebase -i`, the notes
get updated, and of course the *reverse mapping* does *not* get updated:
you have a mapping both from commit to Message-Id *and crucially* from
Message-Id to commit. The automatic rewrite of commit notes in `rebase -i`
tackles only the commit notes, obviously, not the reverse.

Hence the post-rewrite hook, which I think I already suggested at least
once in a previous reply.

Ciao,
Dscho


Re: [PATCH v3 09/20] range-diff: adjust the output of the commit pairs

2018-07-20 Thread Johannes Schindelin
Hi Stefan,

On Tue, 17 Jul 2018, Stefan Beller wrote:

> > > It's nice to see that the bulk of the range-diff functionality has
> > > been libified in this re-roll (residing in range-diff.c rather than
> >
> > Can we *please* stop calling it "re-roll"? Thanks.
> 
> Fun fact of the day:
> 
> First appearance of "reroll" in the public archive is (09 Dec 2007)
> https://public-inbox.org/git/7vy7c3ogu2@gitster.siamese.dyndns.org/
> which is predated by "re-roll" (05 May 2006)
> https://public-inbox.org/git/7vr738w8t4@assigned-by-dhcp.cox.net/

Real fun fact of the day:

https://en.wiktionary.org/wiki/reroll says

Verb

reroll (third-person singular simple present rerolls, present participle
rerolling, simple past and past participle rerolled)

1. To roll again.

A player who rolls two sixes can reroll the dice for an additional
turn.

2. (programming) To convert (an unrolled instruction sequence) back into
   a loop.

Noun

reroll (plural rerolls)

(dice games) A situation in the rules of certain dice games where a
player is given the option to reroll an undesirable roll of the dice.


You will notice how this does not list *any* hint at referring to
something that Junio calls "reroll".

Likewise, I have to admit that Wiktionary's idea of an "iteration"
disagrees with *my* use of the term.

The correct term would be "revision"
(https://en.wiktionary.org/wiki/revision). But we, the core Git
contributors, in our collective infinite wisdom, chose to use that term
as yet another way to refer to a commit [*1*].

So we got it all wrong, believe it or not.

Ciao,
Dscho

Footnote *1*: https://en.wiktionary.org/wiki/commit#Noun does not even
bother to acknowledge our use of referring to a snapshot of a source code
base as a "commit".

Re: [PATCH 0/2] fail compilation with strcpy

2018-07-20 Thread Junio C Hamano
Jeff King  writes:

> This is a patch series to address the discussion in the thread at:
>
>   https://public-inbox.org/git/20180713204350.ga16...@sigill.intra.peff.net/
>
> Basically, the question was: can we declare strcpy banned and have a
> linter save us the trouble of finding it in review. The answer is yes,
> the compiler is good at that. ;)
>
> There are probably as many lists of banned functions as there are coding
> style documents. I don't agree with every entry in the ones I've seen.
> And in many cases coccinelle is a better choice, because the problem is
> not "this function is so bad your patch should not even make it to the
> list with it", but "don't do it like A; we prefer to do it like B
> instead". And coccinelle does the latter more flexibly and
> automatically.
>
> So I tried to pick some obvious and uncontroversial candidates here.
> gets() could be another one, but it's mostly banned already (it's out of
> the standard, and most libcs mark it with a deprecated attribute).
>
> Note that this needs to be applied on top of 022d2ac1f3 (blame: prefer
> xsnprintf to strcpy for colors, 2018-07-13) or it will complain loudly. :)
>
>   [1/2]: introduce "banned function" list
>   [2/2]: banned.h: mark strncpy as banned

Hmph, there is no use of any banned function in hex.c, but when
this topic is merged to 'pu', I seem to get this:

$ make DEVELOPER=1 hex.o
GIT_VERSION = 2.18.0.758.g18f90b35b8
CC hex.o
In file included from git-compat-util.h:1250:0,
 from cache.h:4,
 from hex.c:1:
banned.h:14:0: error: "strncpy" redefined [-Werror]
 #define strncpy(x,y,n) BANNED(strncpy)
 
In file included from /usr/include/string.h:630:0,
 from git-compat-util.h:165,
 from cache.h:4,
 from hex.c:1:
/usr/include/x86_64-linux-gnu/bits/string2.h:84:0: note: this is the location 
of the previous definition
 # define strncpy(dest, src, n) __builtin_strncpy (dest, src, n)
 
cc1: all warnings being treated as errors
Makefile:2279: recipe for target 'hex.o' failed
make: *** [hex.o] Error 1



Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Junio C Hamano
Jeff King  writes:

> Thanks for an interesting (and exotic) counter-point. For strcpy(), you
> always have to run strlen() anyway to know you're not going to overflow,
> so you might as well memcpy() at that point. But if you really are OK
> with truncation, then using strncpy() lets you copy while walking over
> the string only once, which is more efficient.  On the other hand,
> strncpy() fills out NULs to the end of the destination array, so you are
> paying an extra cost for small strings.
>
>> So I used strncpy() advisedly, and I ignore people running Coccinelle
>> scripts and blindly sending patches to "fix" ext4.

Given that strncpy() was invented in V7 days for specifically the
purpose of filling the filename field, it is not surprising at all
that it is the perfect tool for the same purpose in ext4 ;-)

> We don't have any remaining calls to strncpy() in Git, so this lets us
> punt on deciding if the ban is worth changing what's there. We can wait
> for somebody to decide they _really_ need strncpy, and we can discuss it
> then with a concrete case.

Yes, and given that the plan (or at least your plan) is to limit the
banned list to things that really have no reason to be used, the above
sounds like a good approach.



Re: [PATCH] t9300: wait for background fast-import process to die after killing it

2018-07-20 Thread Elijah Newren
On Thu, Jul 19, 2018 at 3:26 PM, SZEDER Gábor  wrote:
> The five new tests added to 't9300-fast-import.sh' in 30e215a65c
> (fast-import: checkpoint: dump branches/tags/marks even if
> object_count==0, 2017-09-28), all with the prefix "V:" in their test
> description, run 'git fast-import' in the background and then 'kill'
> it as part of a 'test_when_finished' cleanup command.  When this test
> script is executed with Bash, some or even all of these tests tend to
> pollute the test script's stderr, and messages about terminated
> processes end up on the terminal:
>
>   $ bash ./t9300-fast-import.sh
>   <... snip ...>
>   ok 179 - V: checkpoint helper does not get stuck with extra output
>   /<...>/test-lib-functions.sh: line 388: 28383 Terminated  git 
> fast-import $options 0<&8 1>&9
>   ok 180 - V: checkpoint updates refs after reset
>   ./t9300-fast-import.sh: line 3210: 28401 Terminated  git 
> fast-import $options 0<&8 1>&9
>   ok 181 - V: checkpoint updates refs and marks after commit
>   ok 182 - V: checkpoint updates refs and marks after commit (no new objects)
>   ./test-lib.sh: line 634: line 3250: 28485 Terminated  git 
> fast-import $options 0<&8 1>&9
>   ok 183 - V: checkpoint updates tags after tag
>   ./t9300-fast-import.sh: line 3264: 28510 Terminated  git 
> fast-import $options 0<&8 1>&9
>
> After a background child process terminates, its parent Bash process
> always outputs a message like those above to stderr, even when in
> non-interactive mode.
>
> But how do some of these messages end up on the test script's stderr,
> why don't we get them from all five tests, and why do they come from
> different file/line locations?  Well, after sending the TERM signal to
> the background child process, it takes a little while until that
> process receives the signal and terminates, and then it takes another
> while until the parent process notices it.  During this time the
> parent Bash process is continuing execution, and by the time it
> notices that its child terminated it might have already left
> 'test_eval_inner_' and its stderr is not redirected to /dev/null
> anymore.  That's why such a message can appear on the test script's
> stderr, while other times, when the child terminates fast and/or the
> parent shell is slow enough, the message ends up in /dev/null, just
> like any other output of the test does.  Bash always adds the file
> name and line number of the code location it was about to execute when
> it notices the termination of its child process as a prefix to that
> message, hence the varying and sometimes totally unrelated location
> prefixes in those messages (e.g. line 388 in 'test-lib-functions.sh'
> is 'test_verify_prereq', and I saw such a message pointing to
> 'say_color' as well).
>
> Prevent these messages from appearing on the test script's stderr by
> 'wait'-ing on the pid of the background 'git fast-import' process
> after sending it the TERM signal.  This ensures that the executing
> shell's stderr is still redirected when the shell notices the
> termination of its child process in the background, and that these
> messages get a consistent file/line location prefix.
>
> Note that this is not an issue when the test script is run with Bash
> and '-v', because then these messages are supposed to go to the test
> script's stderr anyway, and indeed all of them do; though the
> sometimes seemingly random file/line prefixes could be confusing
> still.  Similarly, it's not an issue with Bash and '--verbose-log'
> either, because then all messages go to the log file as they should.
> Finally, it's not an issue with some other shells (I tried dash, ksh,
> ksh93 and mksh) even without any of the verbose options, because they
> don't print messages like these in non-interactive mode in the first
> place.
>
> Signed-off-by: SZEDER Gábor 
> ---
>  t/t9300-fast-import.sh | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 9e7f96223d..fac33e524c 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -3147,7 +3147,10 @@ background_import_then_checkpoint () {
> echo $! >V.pid
> # We don't mind if fast-import has already died by the time the test
> # ends.
> -   test_when_finished "exec 8>&-; exec 9>&-; kill $(cat V.pid) || true"
> +   test_when_finished "
> +   exec 8>&-; exec 9>&-;
> +   kill $(cat V.pid) && wait $(cat V.pid)
> +   true"
>
> # Start in the background to ensure we adhere strictly to (blocking)
> # pipes writing sequence. We want to assume that the write below could
> --
> 2.18.0.408.g42635c01bc

Sweet, thanks for fixing this.  I got these messages nearly 100% of
the time on a certain machine when running prove with a high enough
parallel flag.  I was getting a little annoyed by them, but hadn't dug
in yet.  Your patch squelches them up in my

Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Derrick Stolee

On 7/20/2018 1:41 PM, Duy Nguyen wrote:

On Fri, Jul 20, 2018 at 6:35 PM Derrick Stolee  wrote:

There are many places in Git that use a commit walk to determine
reachability between commits and/or refs. A lot of this logic is
duplicated.

I wanted to achieve the following:

Consolidate several different commit walks into one file

I'm surprised get_shallow_commits() in shallow.c didn't make the cut.
It's no problem though if you already considered it and decided it was
better left alone.


Thanks for pointing this out. I didn't know about it. It would make an 
excellent follow-up series.



Reduce duplicate reachability logic
Increase testability (correctness and performance)
Improve performance of reachability queries

What's your recommendation on adding new commit reachability code? I
might have to add one to fix prune_shallow() if I don't find anything
fit. I guess the code should go to commit-reach.c too?


In my opinion, new commit walks should go into commit-reach.c. Then, you 
can justify why you are using a "new" walk instead of using an existing 
walk. Further, you can probably think of the walk in more generic terms 
than the specific application you need. Finally, you can use the 
'test-tool reach ' pattern to test the specific walk you create 
outside of the logic for which you needed it.


I understand that while this patch is under review, you will probably 
want to continue adding your walk where it is, then we can consolidate 
the code after both have settled.


Thanks,

-Stolee



Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Derrick Stolee

On 7/20/2018 2:09 PM, Eric Sunshine wrote:

On Fri, Jul 20, 2018 at 1:20 PM Derrick Stolee  wrote:

Here is the diff between v1 and v2.

diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
@@ -67,142 +67,176 @@ test_three_modes () {
  test_expect_success 'get_merge_bases_many' '
 cat >input <<-\EOF &&
+   A:commit-5-7
+   X:commit-4-8
+   X:commit-6-6
+   X:commit-8-3
 EOF
 {
-   printf "get_merge_bases_many(A,X):\n" &&
-   git rev-parse commit-5-6 &&
-   git rev-parse commit-4-7
+   echo "get_merge_bases_many(A,X):" &&
+   git rev-parse commit-5-6 \
+ commit-4-7 | sort

Pipes lose the exit codes of all upstream commands. When a Git
command is upstream, we'd usually recommend to dump its output to a
file, then use the file as input to the rest of the pipe so as not to
lose the Git command's exit code:

     {
             ...
             git rev-parse ... >oids &&
             sort <oids
     } >expect &&


This approach seems fine to me. I'd hate to be in the case where 
rev-parse reports an error, terminating early, resulting in an incorrect 
expected file, and then having the test pass because the code is 
similarly incorrect. No matter how slim the chances are, I want to avoid 
a false positive there.



One could argue, in this case, that if git-rev-parse crashes, then it
won't have the expected output and the test will fail anyhow despite
not seeing its failed exit code. However, git-rev-parse might crash
_after_ emitting all the normal, expected output, and that crash would
be missed altogether, so avoiding git-rev-parse as a pipe upstream is
a good idea.

However, one could counter that argument by saying that it isn't the job
of this particular test script to check git-rev-parse's behavior, so
crashy git-rev-parse ought to be caught elsewhere by some other test
script. Nevertheless, you'll likely encounter reviewers who don't want
to see git-rev-parse upstream, even with that argument.

Anyhow, why is that 'sort' even there? It wasn't needed in the
original. Is git-rev-parse outputting the OIDs in random order?


Since the merge-base algorithms provide the commits in an order that 
depends on the implementation (not the functional contract), we decided 
to sort the output commit ids in the output of 'test-tool reach 
'. Thus, we sort the rev-parse output to match.


Thanks,

-Stolee



Re: [PATCH v3 09/20] range-diff: adjust the output of the commit pairs

2018-07-20 Thread Stefan Beller
> 1. To roll again.
>
> A player who rolls two sixes can reroll the dice for an additional
> turn.

This is where I had my AHA moment!
(Consider my software development process as chaotic as a dice roll.)
So rerolling is really just rolling the dice again to "get my patch
accepted" ;-)

> 2. (programming) To convert (an unrolled instruction sequence) back into
>a loop.

We do not have unrolled loops?
This was good back in the day when the cost of each instruction weighed
heavily on the CPU, such that the JMPs needed for the loop (and the loop
variable check that might have had a bad branch prediction) were
slowing down the execution.

Nowadays (as of when I was studying 5 years ago), branch prediction and
individual instruction execution are really good, but the bottleneck
that I measured (when I had a lot of time at my disposal, attending a
class/project on microarchitectures) was the CPU instruction cache size,
i.e. loop unrolling made the code *slower* than keeping tight loops
loaded in memory.
https://stackoverflow.com/questions/24196076/is-gcc-loop-unrolling-flag-really-effective

> Noun
>
> reroll (plural rerolls)
>
> (dice games) A situation in the rules of certain dice games where a
> player is given the option to reroll an undesirable roll of the dice.
>
>
> You will notice how this does not list *any* hint at referring to
> something that Junio calls "reroll".

We have undesirable patches that were 'rolled' onto the mailing list,
so they have to be rerolled?

> Footnote *1*: https://en.wiktionary.org/wiki/commit#Noun does not even
> bother to acknowledge our use of referring to a snapshot of a source code
> base as a "commit".

When Git was a content addressable file system, a commit was precisely
"a database transaction, [...] making it a permanent change."

Side note:
I was just giving a talk to my colleagues about diff algorithms
(and eventually describing a bug in the histogram diff algorithm)
and we got really riled up with "Longest Common Subsequence",
as the mathematical definition is different than what the code
or I (after studying the code) had in mind.

Naming things is hard, and sometimes the collective wisdom got
it wrong, but changing it would be very costly in the short/medium
term.

Another note about "rolling things": At $DAYJOB I review changes
that are committed to another revision control system w.r.t. its
compliance of open source licenses (hence I am exposed to a lot
of different projects), and some of those changes are titled
"Roll up to version $X" which I found strange, but knew
what was meant.

Stefan


Re: [PATCH 0/2] fail compilation with strcpy

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 11:58:10AM -0700, Junio C Hamano wrote:

> > Note that this needs to be applied on top of 022d2ac1f3 (blame: prefer
> > xsnprintf to strcpy for colors, 2018-07-13) or it will complain loudly. :)
> >
> >   [1/2]: introduce "banned function" list
> >   [2/2]: banned.h: mark strncpy as banned
> 
> Hmph, there is no use of any banned function in hex.c, but when
> this topic is merged to 'pu', I seem to get this:

Interesting. Builds fine for me even merged to the latest push-out of
pu. But...

> $ make DEVELOPER=1 hex.o
> GIT_VERSION = 2.18.0.758.g18f90b35b8
> CC hex.o
> In file included from git-compat-util.h:1250:0,
>  from cache.h:4,
>  from hex.c:1:
> banned.h:14:0: error: "strncpy" redefined [-Werror]
>  #define strncpy(x,y,n) BANNED(strncpy)
>  
> In file included from /usr/include/string.h:630:0,
>  from git-compat-util.h:165,
>  from cache.h:4,
>  from hex.c:1:
> /usr/include/x86_64-linux-gnu/bits/string2.h:84:0: note: this is the location 
> of the previous definition
>  # define strncpy(dest, src, n) __builtin_strncpy (dest, src, n)

I suspect it has more to do with system/libc differences between our
machines, anyway. There was discussion elsewhere in the thread about the
need to #undef before redefining. I guess this answers that question.

I'll include that in the re-roll, and you can just ignore the v1 patches
I sent for now.

-Peff


[PATCH] clone: send ref-prefixes when using protocol v2

2018-07-20 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---

Noticed we miss out on server-side filtering of refs when cloning using
protocol v2; this will enable that.

 builtin/clone.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 99e73dae85..55cc10e93a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -895,7 +895,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
int err = 0, complete_refs_before_fetch = 1;
int submodule_progress;
 
-   struct refspec_item refspec;
+   struct refspec rs = REFSPEC_INIT_FETCH;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
fetch_if_missing = 0;
 
@@ -1077,7 +1078,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (option_required_reference.nr || option_optional_reference.nr)
setup_reference();
 
-   refspec_item_init(&refspec, value.buf, REFSPEC_FETCH);
+   refspec_append(&rs, value.buf);
 
strbuf_reset(&value);
 
@@ -1134,10 +1135,20 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (transport->smart_options && !deepen && !filter_options.choice)
transport->smart_options->check_self_contained_and_connected = 
1;
 
-   refs = transport_get_remote_refs(transport, NULL);
+
+   argv_array_push(&ref_prefixes, "HEAD");
+   refspec_ref_prefixes(&rs, &ref_prefixes);
+   if (option_branch) {
+   expand_ref_prefix(&ref_prefixes, option_branch);
+   }
+   if (!option_no_tags) {
+   argv_array_push(&ref_prefixes, "refs/tags/");
+   }
+
+   refs = transport_get_remote_refs(transport, &ref_prefixes);
 
if (refs) {
-   mapped_refs = wanted_peer_refs(refs, &refspec);
+   mapped_refs = wanted_peer_refs(refs, &rs.items[0]);
/*
 * transport_get_remote_refs() may return refs with null sha-1
 * in mapped_refs (see struct transport->get_refs_list
@@ -1231,6 +1242,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
strbuf_release(&value);
junk_mode = JUNK_LEAVE_ALL;
 
-   refspec_item_clear(&refspec);
+   refspec_clear(&rs);
+   argv_array_clear(&ref_prefixes);
return err;
 }
-- 
2.18.0.233.g985f88cf7e-goog



Re: [PATCH 2/2] repack -ad: prune the list of shallow commits

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 02:30:23AM -0700, Junio C Hamano wrote:

> > The entries in shallow file says that history behind them may not
> > exist in the repository due to its shallowness but history after
> > them are supposed to be traversable (otherwise we have a repository
> > corruption).  It is true that an entry that itself no longer exists
> > in this repository should not be in shallow file, as the presence of
> > that entry breaks that promise the file is making---that commit
> > ought to exist and it is safe to traverse down to it, so keeping the
> > entry in the file is absolutely a wrong thing to do.
> >
> > But that does not automatically mean that just simply removing it
> > makes the resulting repository good, does it?  Wouldn't the solution
> > for that corruption be to set a new entry to stop history traversal
> > before reaching that (now-missing) commit?
> 
> The above is overly pessimistic and worried about an impossible
> situation, I would think.  The reason why a commit that used to be
> in the shallow file is being pruned during a "repack" is because it
> has become unreachable.  By definition, no future history traversal
> that wants to enumerate reachable commits needs to be stopped from
> finding that commits that are older than this commit being pruned
> are missing by having this in the shallow list.  If there is a ref
> or a reflog entry from which such a problematic traversal starts,
> we wouldn't be pruning this commit in the first place, because the
> commit has not become unreachable yet.
> 
> So a repository does not become corrupt by pruning the commit *and*
> removing it from the shallow file at the same time.

Right. I think a lot of this is rethinking how shallow pruning works,
too, which is not something Dscho is trying to change. The simplest
argument (which I think Dscho has made elsewhere, too) is: this is
necessary in the current shallow code when dropping objects. We do it
therefore from prune, but miss the case when git-repack is run itself
outside of git-gc.

I do still think the gc/prune architecture is a bit muddled, but at this
point in the discussion I feel OK saying that people running "git repack
-ad" would not be upset to have their shallows pruned.

But the patch is still not OK as-is because prune_shallow() requires the
SEEN flag on each reachable object struct, which we have not set in the
repack process (hence the failing test I posted earlier).  So we need a
solution for that, which may impact ideas about how the call works.
E.g., some possible solutions are:

 - teach pack-objects to optionally trigger the shallow prune based on
   its internal walk

 - have repack use the just-completed pack as a hint about reachability

 - introduce a mechanism to trigger the shallow prune based on a
   commit-only reachability check, and run that from repack (or from gc
   and document that it must be run if you are using repack as a manual
   gc replacement)

I'm not advocating any particular solution there, but just showing that
there's an array of them (and probably more that I didn't mention).

-Peff


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Junio C Hamano
Johannes Schindelin  writes:

> AFAICT there is at least one scenario where you run `rebase -i`, the notes
> get updated, and of course the *reverse mapping* does *not* get updated:

It turns out that I never had a rewrite hook; the notes.rewriteref
mechanism is the only thing that has been used to maintain amlog.

I've stopped populating the reverse mapping, by the way.  The script
that I feed a message from gmane or public-inbox when I need to
learn the set of commits that resulted from the message instead uses
"git grep $message-id notes/amlog".  And that is fast enough for my
purpose.

There is no good reason to abuse the notes mechanism to map a random
object-name looking string (i.e. the hash of a message id), other
than the ease of "quick access" when somebody is making a lot of
inquiries, but that "database" does not have to be stored in notes.
It certainly is not worth cycles spent by me *while* I work during
the day with various history-reshaping tools to record and/or update
the reverse mapping, and that is why my post-applypatch hook no
longer has the "reverse map" hack.

It is not like anybody (including me) needs a realtime up-to-date
reverse mapping from amlog while I run my "commit --amend", "rebase
-i", etc., and the reverse map is constructable by reversing the
forward map, obviously, with some postprocessing.  And I think that
is a reasonable way forward if anybody wants to have a reverse mapping.
The postprocessing can be done either by me before pushing out the
amlog ref, or done by any consumer after fetching the amlog ref from
me.  If I did the postprocessing and refuse to use rewrite hook you
wouldn't even know ;-)



Re: [PATCH] clone: send ref-prefixes when using protocol v2

2018-07-20 Thread Junio C Hamano
Brandon Williams  writes:

Is there an end-user visible effect, caused by the lack of "prefix"
being fixed with this patch, that is worth describing here?  "The
server ended up showing refs that are irrelevant to the normal clone
request which is only for heads and tags, wasting time and
bandwidth", for example?

> Signed-off-by: Brandon Williams 
> ---
>
> Noticed we miss out on server side filtering of refs when cloning using
> protocol v2, this will enable that.


>
>  builtin/clone.c | 22 +-
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 99e73dae85..55cc10e93a 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -895,7 +895,8 @@ int cmd_clone(int argc, const char **argv, const char 
> *prefix)
>   int err = 0, complete_refs_before_fetch = 1;
>   int submodule_progress;
>  
> - struct refspec_item refspec;
> + struct refspec rs = REFSPEC_INIT_FETCH;
> + struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
>  
>   fetch_if_missing = 0;
>  
> @@ -1077,7 +1078,7 @@ int cmd_clone(int argc, const char **argv, const char 
> *prefix)
>   if (option_required_reference.nr || option_optional_reference.nr)
>   setup_reference();
>  
> - refspec_item_init(&refspec, value.buf, REFSPEC_FETCH);
> + refspec_append(&rs, value.buf);
>  
>   strbuf_reset(&value);
>  
> @@ -1134,10 +1135,20 @@ int cmd_clone(int argc, const char **argv, const char 
> *prefix)
>   if (transport->smart_options && !deepen && !filter_options.choice)
>   transport->smart_options->check_self_contained_and_connected = 
> 1;
>  
> - refs = transport_get_remote_refs(transport, NULL);
> +
> + argv_array_push(&ref_prefixes, "HEAD");
> + refspec_ref_prefixes(&rs, &ref_prefixes);
> + if (option_branch) {
> + expand_ref_prefix(&ref_prefixes, option_branch);
> + }
> + if (!option_no_tags) {
> + argv_array_push(&ref_prefixes, "refs/tags/");
> + }
> +
> + refs = transport_get_remote_refs(transport, &ref_prefixes);
>  
>   if (refs) {
> - mapped_refs = wanted_peer_refs(refs, &refspec);
> + mapped_refs = wanted_peer_refs(refs, &rs.items[0]);
>   /*
>* transport_get_remote_refs() may return refs with null sha-1
>* in mapped_refs (see struct transport->get_refs_list
> @@ -1231,6 +1242,7 @@ int cmd_clone(int argc, const char **argv, const char 
> *prefix)
>   strbuf_release(&value);
>   junk_mode = JUNK_LEAVE_ALL;
>  
> - refspec_item_clear(&refspec);
> + refspec_clear(&rs);
> + argv_array_clear(&ref_prefixes);
>   return err;
>  }


[BUG] merge-recursive overly aggressive when skipping updating the working tree

2018-07-20 Thread Ben Peart
As we were attempting to migrate to 2.18, some of our internal
functional tests failed.  The tests that failed were testing merge and
cherry-pick when there was a merge conflict.  Our tests run with
sparse-checkout enabled, which is what exposed the bug.


What is happening is that in merge_recursive, the skip-worktree bit is
cleared on the cache entry, but then the working directory isn't
updated.  The end result is that git reports that the merged file has
actually been deleted.


We've identified the patch that introduced the regression as:

commit 1de70dbd1ada0069d1b6cd6345323906cc9a9ed3
Author: Elijah Newren 
Date:   Thu Apr 19 10:58:23 2018 -0700

merge-recursive: fix check for skipability of working tree updates

The can-working-tree-updates-be-skipped check has had a long and
blemished history.  The update can be skipped iff:
  a) The merge is clean
  b) The merge matches what was in HEAD (content, mode, pathname)
  c) The target path is usable (i.e. not involved in D/F conflict)


I've written a test that can be used to reproduce the issue:


diff --git a/t/t3507-cherry-pick-conflict.sh 
b/t/t3507-cherry-pick-conflict.sh

index 7c5ad08626..de0bdc8634 100755
--- a/t/t3507-cherry-pick-conflict.sh
+++ b/t/t3507-cherry-pick-conflict.sh
@@ -392,4 +392,17 @@ test_expect_success 'commit --amend -s places the 
sign-off at the right place' '


test_cmp expect actual
 '

+test_expect_success 'failed cherry-pick with sparse-checkout' '
+   pristine_detach initial &&
+   git config core.sparseCheckout true &&
+   echo /unrelated >.git/info/sparse-checkout &&
+   git read-tree --reset -u HEAD &&
+   test_must_fail git cherry-pick -Xours picked>actual &&
+   test_i18ngrep ! "Changes not staged for commit:" actual &&
+   echo "/*" >.git/info/sparse-checkout &&
+   git read-tree --reset -u HEAD &&
+   git config core.sparseCheckout false &&
+   rm .git/info/sparse-checkout
+'
+
 test_done

Thanks,

Ben


Re: [PATCH] clone: send ref-prefixes when using protocol v2

2018-07-20 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Signed-off-by: Brandon Williams 
> ---
> Noticed we miss out on server side filtering of refs when cloning using
> protocol v2, this will enable that.
>
>  builtin/clone.c | 22 +-
>  1 file changed, 17 insertions(+), 5 deletions(-)

Nice!  The implementation looks good.

Can you add a test to ensure this filtering doesn't regress later?

[...]
> +++ b/builtin/clone.c
[...]
> @@ -1134,10 +1135,20 @@ int cmd_clone(int argc, const char **argv, const char 
> *prefix)
>   if (transport->smart_options && !deepen && !filter_options.choice)
>   transport->smart_options->check_self_contained_and_connected = 
> 1;
>  
> - refs = transport_get_remote_refs(transport, NULL);
> +
> + argv_array_push(&ref_prefixes, "HEAD");
> + refspec_ref_prefixes(&rs, &ref_prefixes);
> + if (option_branch) {
> + expand_ref_prefix(&ref_prefixes, option_branch);
> + }
> + if (!option_no_tags) {
> + argv_array_push(&ref_prefixes, "refs/tags/");
> + }

nit: no need for braces around one-line "if" body

Thanks,
Jonathan


FYI: histogram diff bug

2018-07-20 Thread Stefan Beller
  seq 1 100 >one
  echo 99 > two
  seq 1 2 98 >>two
  git diff --no-index --histogram one two

In addition to the recent patches[1], I found another bug in the
histogram algorithm, which can be reproduced with the instructions
given above. At least I think it is a bug, as the diff looks terrible
to me. (It is a correct diff, but the output is just bad.)

[1] https://public-inbox.org/git/20180719221942.179565-1-sbel...@google.com/

And I think the issue is in the algorithm, which is basically as follows:

1) build up information about one side ("scanA()"), by putting each
line into a hashmap (and counting its occurrences, such that you
can make the histogram)

2) "find_LCS" on the other side (B) by trying low-occurrence lines
  and seeing how long the common consecutive string match is.
  (I think this LCS refers to [2], not to be confused with [3])

3) The whole mental model of the difference
between both sides now look like:

  stuff before the LCS
  LCS
  stuff after the LCS

As the LCS has no output associated with it, just recursively call
do_histogram on the remainders before and after.

This is my rough understanding of the algorithm so far.
(I am unsure about the exact find_LCS part, i.e. how and why
certain candidates are picked for finding the LCS.)

The big issue, however, is the count of LCSs: so far we assumed
there is only one! In the example given above there are many,
i.e. lots of "longest" common contiguous substrings of length 1.
We are unfortunate to pick "99" as the LCS and recurse before
and after; the other LCSs would be a better pick.

So I think we would need to find "all LCSs", of which there can be
more than one at the same time, and then fill the gaps between
them using recursion.
As the order of LCSs can be different in the two sides,
we actually have to produce a diff of LCSs and those which are
out of order need to be emitted as full edits (deletes or creates).
In the example above we'd want to have "99" to be a full create
and delete instead of being *the* anchor; all other LCSs ought to
be anchors for the first (zero'th?) level of recursion.

This bug is also present in JGit which this algorithm was ported
from, hence jgit-dev is cc'd (and furthers my suspicion that the
issue is not in the port but in the algorithm itself).

[2] https://en.wikipedia.org/wiki/Longest_common_substring_problem
[3] https://en.wikipedia.org/wiki/Longest_common_subsequence_problem

I'll think about this and see if I can fix it properly,
Thoughts or Opinions?

Stefan


Re: [BUG] merge-recursive overly aggressive when skipping updating the working tree

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 12:53 PM, Ben Peart  wrote:
> As we were attempting to migrate to 2.18 some of our internal functional
> tests failed.  The tests that failed were testing merge and cherry-pick when
> there was a merge conflict. Our tests run with sparse-checkout enabled which
> is what exposed the bug.

Indeed, I've never used sparse checkout before.  And I've got
questions related to it below...

> What is happening is that in merge_recursive, the skip-worktree bit is
> cleared on the cache entry but then the working directory isn't updated.
> The end result is that git reports that the merged file has actually been
> deleted.
>
> We've identified the patch that introduced the regression as:
>
> commit 1de70dbd1ada0069d1b6cd6345323906cc9a9ed3
> Author: Elijah Newren 
> Date:   Thu Apr 19 10:58:23 2018 -0700
>
> merge-recursive: fix check for skipability of working tree updates
>
> The can-working-tree-updates-be-skipped check has had a long and
> blemished
> history.  The update can be skipped iff:
>   a) The merge is clean
>   b) The merge matches what was in HEAD (content, mode, pathname)
>   c) The target path is usable (i.e. not involved in D/F conflict)
>
>
> I've written a test that can be used to reproduce the issue:
>
>
> diff --git a/t/t3507-cherry-pick-conflict.sh
> b/t/t3507-cherry-pick-conflict.sh
> index 7c5ad08626..de0bdc8634 100755
> --- a/t/t3507-cherry-pick-conflict.sh
> +++ b/t/t3507-cherry-pick-conflict.sh
> @@ -392,4 +392,17 @@ test_expect_success 'commit --amend -s places the
> sign-off at the right place' '
>
> test_cmp expect actual
>  '
>
> +test_expect_success 'failed cherry-pick with sparse-checkout' '
> +   pristine_detach initial &&
> +   git config core.sparseCheckout true &&
> +   echo /unrelated >.git/info/sparse-checkout &&
> +   git read-tree --reset -u HEAD &&
> +   test_must_fail git cherry-pick -Xours picked>actual &&
> +   test_i18ngrep ! "Changes not staged for commit:" actual &&
> +   echo "/*" >.git/info/sparse-checkout &&
> +   git read-tree --reset -u HEAD &&
> +   git config core.sparseCheckout false &&
> +   rm .git/info/sparse-checkout
> +'
> +
>  test_done

Thanks for cooking up a testcase.  In short, you've got:
  - a one-line file that was modified on both sides
  - you explicitly set the skip-worktree bit for this file (by the
    core.sparseCheckout and read-tree steps), suggesting that you
    DON'T want it written to your working tree
  - you're using -Xours to avoid what would otherwise be a conflict

So, I actually think merge-recursive is behaving correctly with
respect to the working tree here, because:
  - You said you didn't want foo in your working copy with your
sparse-checkout pattern
  - You manually nuked foo from your working copy when you called 'git
read-tree --reset -u HEAD'
  - You setup this merge such that the merge result was the same as
before the merge started.
In short, the file didn't change AND you don't want it in your working
tree, so why write it out?

To me, the bug is that merge-recursive clears the skip-worktree bit
for foo when that should be left intact -- at least in this case.


But that brings up another interesting question.  What if a merge
*does* modify a file for which you have skip-worktree set?
Previously, it'd clear the bit and write the file to the working tree,
but that was by no means an explicit decision; that was just a side
effect of the commands it uses.  Isn't that choice wrong -- shouldn't
it just update the index and continue on?  Or, if there are conflicts,
is that a case that is considered special where you do want the
skip-worktree bit cleared and have the file written out?

I'm worried that getting skip-worktree right in merge-recursive, when
it had never been considered before with that codebase, might be a
little messy...





Re: No rule to make target `git-daemon'

2018-07-20 Thread brian m. carlson
On Thu, Jul 19, 2018 at 09:37:08PM -0400, Jeffrey Walton wrote:
> Hi Everyone,
> 
> I'm working from the 2.18 tarball on Solaris 11.3 x86_64. I'm catching
> the following when building from sources. This appears to be a new
> issue. It was not present in 2.17.1.
> 
> gmake: *** No rule to make target `git-daemon'.  Stop.
> gmake: *** Waiting for unfinished jobs
> Failed to build Git
> 
> There does not appear to be an option to control building the daemon:
> 
> $ ./configure --help | grep -i daemon
> $
> 
> Any ideas on how to side-step it?

I also don't see this issue, and I didn't see anything between 2.17.1
and 2.18 that stood out to me as a potential cause of this problem.  Can
you tell us a bit more about what version of GNU make you're using and
the configuration you're trying to build (say, the config.mak.autogen)?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




Re: [BUG] merge-recursive overly aggressive when skipping updating the working tree

2018-07-20 Thread Junio C Hamano
Elijah Newren  writes:

> But that brings up another interesting question.  What if a merge
> *does* modify a file for which you have skip-worktree set?
> Previously, it'd clear the bit and write the file to the working tree,
> but that was by no means an explicit decision;

At least in my mind, the "skip worktree" aka sparse checkout has
always been "best effort" in that if Git needs to materialize a
working tree file in order to carry out some operation (e.g. a merge
needs conflict resolution, hence we need to give a working tree file
with conflict markers to the end user) Git is free to do so.

Isn't that what happens currently?


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Stefan Beller
On Fri, Jul 20, 2018 at 12:35 PM Junio C Hamano  wrote:

> It is not like anybody (including me) needs realtime up-to-date

I thought the same for a long time, but contributing to other projects
showed me that this is not necessarily the case. Having a real time
update, even if it were just "your patch is labeled 'under discussion'",
is beneficial, as I would know where it is "in the system".

In a way I'd compare our contribution process to having an
incredibly fine-grained paper map. Most of the world moved
on to digital maps, which zoom in on-demand.
(Cf. spelling out "See banned.h for banned functions" in
Documentation/CodingGuidelines: it is a fine-grained detail
that is not relevant for *most* of the contributions, but just
burdens the bearer of the paper map with weight; if this hint
is given dynamically by the compiler or build system at relevant
times, it is much better;

Regarding the real time aspect here, it is also a very good
comparison to maps: While I know how to read paper maps
(or offline maps) and how to navigate my way, it sure is easier
to just follow an online up-to-date navigation service that
tells me what to do. )


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Junio C Hamano
Stefan Beller  writes:

> On Fri, Jul 20, 2018 at 12:35 PM Junio C Hamano  wrote:
>
>> It is not like anybody (including me) needs realtime up-to-date
>
> I thought the same for a long time, but contributing to other projects
> showed me that this is not necessarily the case. Having a real time
> update, even if it were just "your patch is labeled 'under discussion'",
> is beneficial, as I would know where it is "in the system".

Well, you wouldn't have access to the up-to-date amlog maintained
by me *UNTIL* I push it out at the end of the day.  So by
definition, you do not have real-time access to the up-to-date
state.

And also by definition, you do not *NEED* such an access, because
you won't see newly created or rewritten commits, whose originating
Message-Id is not in the copy of amlog you have (yet), until I push
the day's integration result out *AND* you fetch what I pushed out.



Re: No rule to make target `git-daemon'

2018-07-20 Thread Jeff King
On Thu, Jul 19, 2018 at 09:37:08PM -0400, Jeffrey Walton wrote:

> I'm working from the 2.18 tarball on Solaris 11.3 x86_64. I'm catching
> the following when building from sources. This appears to be a new
> issue. It was not present in 2.17.1.
> 
> gmake: *** No rule to make target `git-daemon'.  Stop.
> gmake: *** Waiting for unfinished jobs
> Failed to build Git

Like others, I don't really have an idea on how this could have
happened. But perhaps "gmake -d git-daemon" could be enlightening?

Mine has:

  [...]
  Considering target file 'git-daemon'.
   Looking for an implicit rule for 'git-daemon'.
   Trying pattern rule with stem 'daemon'.
   Trying implicit prerequisite 'daemon.o'.
   Trying rule prerequisite 'GIT-LDFLAGS'.
   Trying rule prerequisite 'common-main.o'.
   Trying rule prerequisite 'libgit.a'.
   Trying rule prerequisite 'xdiff/lib.a'.
   Found an implicit rule for 'git-daemon'.
Considering target file 'daemon.o'.

and so on.

> There does not appear to be an option to control building the daemon:
> 
> $ ./configure --help | grep -i daemon
> $
> 
> Any ideas on how to side-step it?

There's no official knob to turn, but dropping this line would remove
it:

diff --git a/Makefile b/Makefile
index 08e5c54549..31153a2789 100644
--- a/Makefile
+++ b/Makefile
@@ -684,7 +684,6 @@ EXTRA_PROGRAMS =
 PROGRAMS += $(EXTRA_PROGRAMS)
 
 PROGRAM_OBJS += credential-store.o
-PROGRAM_OBJS += daemon.o
 PROGRAM_OBJS += fast-import.o
 PROGRAM_OBJS += http-backend.o
 PROGRAM_OBJS += imap-send.o

-Peff


error: invalid key with file url insteadOf

2018-07-20 Thread Basin Ilya
Hi.

I'm trying to make the Cygwin Git understand Mingw-style repo urls:

git config --global \
  file:///cygdrive/c.insteadOf \
  file:///C:

But this fails with:

error: invalid key: file:///cygdrive/c.insteadOf

Manually editing ~/.gitconfig works fine:

[url "file:///cygdrive/c"]
insteadOf = file:///C:


il@MAR2 
/cygdrive/c/progs/maven/maven-scm/maven-scm-providers/maven-scm-providers-git/maven-scm-provider-gitexe
$ LANG=C cmd.exe /X /C "git clone 
file:///C:/progs/maven/maven-scm/maven-scm-providers/maven-scm-providers-git/maven-scm-provider-gitexe/target/git_copy
 checkin-nobranch"
Cloning into 'checkin-nobranch'...
warning: You appear to have cloned an empty repository.


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-20 Thread Stefan Beller
+cc list
On Fri, Jul 20, 2018 at 2:29 PM Junio C Hamano  wrote:
> ... which means that it does not matter if I have an elaborate rewrite hook
> that constantly updates the reverse mapping or if the reverse mapping is
> made immediately before I push out. You wouldn't even be able to tell any
> difference.
>
> And for that matter, it could even be made on the receiving end by you
> after you fetch from me before you need the reverse map information.


Re: error: invalid key with file url insteadOf

2018-07-20 Thread Junio C Hamano
Basin Ilya  writes:

> Hi.
>
> I'm trying to make the Cygwin Git understand Mingw-style repo urls:
>
> git config --global \
>   file:///cygdrive/c.insteadOf \
>   file:///C:
>
> But this fails with:
>
> error: invalid key: file:///cygdrive/c.insteadOf

Understandable.

> Manually editing ~/.gitconfig works fine:
>
> [url "file:///cygdrive/c"]
> insteadOf = file:///C:

How about spelling the command line version the same way as your
"manual" version?  I.e.

git config --global \
url.file:///cygdrive/c.insteadOf \
file:///C:

notice the lack of "url." in your version.


Re: [BUG] merge-recursive overly aggressive when skipping updating the working tree

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 2:13 PM, Junio C Hamano  wrote:
> Elijah Newren  writes:
>
>> But that brings up another interesting question.  What if a merge
>> *does* modify a file for which you have skip-worktree set?
>> Previously, it'd clear the bit and write the file to the working tree,
>> but that was by no means an explicit decision;
>
> At least in my mind, the "skip worktree" aka sparse checkout has
> always been "best effort" in that if Git needs to materialize a
> working tree file in order to carry out some operation (e.g. a merge
> needs conflict resolution, hence we need to give a working tree file
> with conflict markers to the end user) Git is free to do so.
>
> Isn't that what happens currently?

Ah, okay, that's helpful.  So, if there are conflicts, it should be
free to clear the skip_worktree flag.  Since merge-recursive calls
add_cacheinfo() for all entries it needs to update, which deletes the
old cache entry and just makes new ones, we get that for free.

And conversely, if a file-level merge succeeds without conflicts then
it clearly doesn't "need to materialize a working tree file", so it
should NOT clear the skip_worktree flag for that path.
(Unfortunately, that means we have to work around add_cacheinfo() for
these cases.)


Re: [PATCH 0/2] fail compilation with strcpy

2018-07-20 Thread Junio C Hamano
Jeff King  writes:

> I suspect it has more to do with system/libc differences between our
> machines, anyway. There was discussion elsewhere in the thread about the
> need to #undef before redefining. I guess this answers that question.

Yes, that is it.  For now I can squash this in before pushing it out
on 'pu'.

 banned.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/banned.h b/banned.h
index ae64a9..d138f3ecf2 100644
--- a/banned.h
+++ b/banned.h
@@ -10,11 +10,17 @@
 
 #define BANNED(func) sorry_##func##_is_a_banned_function()
 
+#undef strcpy
 #define strcpy(x,y) BANNED(strcpy)
+
+#undef strncpy
 #define strncpy(x,y,n) BANNED(strncpy)
 
 #ifdef HAVE_VARIADIC_MACROS
+
+#undef sprintf
 #define sprintf(...) BANNED(sprintf)
+
 #endif
 
 #endif /* BANNED_H */


Re: No rule to make target `git-daemon'

2018-07-20 Thread Jeffrey Walton
On Fri, Jul 20, 2018 at 5:04 PM, brian m. carlson
 wrote:
> On Thu, Jul 19, 2018 at 09:37:08PM -0400, Jeffrey Walton wrote:
>> Hi Everyone,
>>
>> I'm working from the 2.18 tarball on Solaris 11.3 x86_64. I'm catching
>> the following when building from sources. This appears to be a new
>> issue. It was not present in 2.17.1.
>>
>> gmake: *** No rule to make target `git-daemon'.  Stop.
>> gmake: *** Waiting for unfinished jobs
>> Failed to build Git
>>
>> There does not appear to be an option to control building the daemon:
>>
>> $ ./configure --help | grep -i daemon
>> $
>>
>> Any ideas on how to side-step it?
>
> I also don't see this issue, and I didn't see anything between 2.17.1
> and 2.18 that stood out to me as a potential cause of this problem.  Can
> you tell us a bit more about what version of GNU make you're using and
> the configuration you're trying to build (say, the config.mak.autogen)?

Sorry about the late reply. I meant to reply this morning.

My shell script to patch things on Solaris was the cause of the problem.

(If anyone is interested in first class Solaris support then I am
happy to help. The patch set needed for the platform has been stable
for the last couple of years).

Jeff


Re: Hash algorithm analysis

2018-07-20 Thread brian m. carlson
On Mon, Jun 11, 2018 at 12:29:42PM -0700, Jonathan Nieder wrote:
> My understanding of the discussion so far:
> 
> Keccak team encourages us[1] to consider a variant like K12 instead of
> SHA3.
> 
> AGL explains[2] that the algorithms considered all seem like
> reasonable choices and we should decide using factors like
> implementation ease and performance.
> 
> If we choose a Keccak-based function, AGL also[3] encourages using a
> variant like K12 instead of SHA3.
> 
> Dscho strongly prefers[4] SHA-256, because of
> - wide implementation availability, including in future hardware
> - has been widely analyzed
> - is fast
> 
> Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
> it is constructed.

I know this discussion has sort of petered out, but I'd like to see if
we can revive it.  I'm writing index v3 and having a decision would help
me write tests for it.

To summarize the discussion that's been had in addition to the above,
Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
over SHA-256 over SHA3-256, although any of them would be fine.

Are there other contributors who have a strong opinion?  Are there
things I can do to help us coalesce around an option?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204




[PATCH] Documentation/git-interpret-trailers: explain possible values

2018-07-20 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---

Maybe we should instead refer to the options that are described
further down in the document?

 Documentation/git-interpret-trailers.txt | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-interpret-trailers.txt 
b/Documentation/git-interpret-trailers.txt
index 9111c47a1bf..b8fafb1e8bd 100644
--- a/Documentation/git-interpret-trailers.txt
+++ b/Documentation/git-interpret-trailers.txt
@@ -88,7 +88,8 @@ OPTIONS
Specify where all new trailers will be added.  A setting
provided with '--where' overrides all configuration variables
and applies to all '--trailer' options until the next occurrence of
-   '--where' or '--no-where'.
+   '--where' or '--no-where'. Possible values are `after`, `before`,
+   `end` or `start`.
 
 --if-exists ::
 --no-if-exists::
@@ -96,7 +97,8 @@ OPTIONS
least one trailer with the same  in the message.  A setting
provided with '--if-exists' overrides all configuration variables
and applies to all '--trailer' options until the next occurrence of
-   '--if-exists' or '--no-if-exists'.
+   '--if-exists' or '--no-if-exists'. Possible actions are 
`addIfDifferent`,
+   `addIfDifferentNeighbor`, `add`, `replace` and `doNothing`.
 
 --if-missing ::
 --no-if-missing::
@@ -104,7 +106,8 @@ OPTIONS
trailer with the same  in the message.  A setting
provided with '--if-missing' overrides all configuration variables
and applies to all '--trailer' options until the next occurrence of
-   '--if-missing' or '--no-if-missing'.
+   '--if-missing' or '--no-if-missing'. Possible actions are `doNothing`
+   or `add`.
 
 --only-trailers::
Output only the trailers, not any other parts of the input.
-- 
2.18.0.233.g985f88cf7e-goog



Re: [BUG] merge-recursive overly aggressive when skipping updating the working tree

2018-07-20 Thread Junio C Hamano
Elijah Newren  writes:

> Ah, okay, that's helpful.  So, if there are conflicts, it should be
> free to clear the skip_worktree flag.  Since merge-recursive calls
> add_cacheinfo() for all entries it needs to update, which deletes the
> old cache entry and just makes new ones, we get that for free.

Correct.

> And conversely, if a file-level merge succeeds without conflicts then
> it clearly doesn't "need to materialize a working tree file", so it
> should NOT clear the skip_worktree flag for that path.

That is not at all implied by what I wrote, though.

If it can be done without too much effort, then it certainly is
nicer to keep the sparseness when we do not have to materialize the
working tree file.  But at least in my mind, if it needs too many
special cases, hacks, and conditionals, then it is not worth the
complexity---if it is easier to write a correct code by allowing Git
to populate working tree files, it is perfectly fine to do so.  

In a sense, the sparse checkout "feature" is a hack by itself,
and that is why I think this part should be "best effort" as
well.


[PATCH v2] clone: send ref-prefixes when using protocol v2

2018-07-20 Thread Brandon Williams
Teach clone to send a list of ref-prefixes, when using protocol v2, to
allow the server to filter out irrelevant references from the
ref-advertisement.  This reduces wasted time and bandwidth when cloning
repositories with a large number of references.

Signed-off-by: Brandon Williams 
---
 builtin/clone.c| 20 +++-
 t/t5702-protocol-v2.sh |  7 ++-
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 99e73dae85..5c0adbd6d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -895,7 +895,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
int err = 0, complete_refs_before_fetch = 1;
int submodule_progress;
 
-   struct refspec_item refspec;
+   struct refspec rs = REFSPEC_INIT_FETCH;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
fetch_if_missing = 0;
 
@@ -1077,7 +1078,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (option_required_reference.nr || option_optional_reference.nr)
setup_reference();
 
-   refspec_item_init(&refspec, value.buf, REFSPEC_FETCH);
+   refspec_append(&rs, value.buf);
 
strbuf_reset(&value);
 
@@ -1134,10 +1135,18 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (transport->smart_options && !deepen && !filter_options.choice)
transport->smart_options->check_self_contained_and_connected = 
1;
 
-   refs = transport_get_remote_refs(transport, NULL);
+
+   argv_array_push(&ref_prefixes, "HEAD");
+   refspec_ref_prefixes(&rs, &ref_prefixes);
+   if (option_branch)
+   expand_ref_prefix(&ref_prefixes, option_branch);
+   if (!option_no_tags)
+   argv_array_push(&ref_prefixes, "refs/tags/");
+
+   refs = transport_get_remote_refs(transport, &ref_prefixes);
 
if (refs) {
-   mapped_refs = wanted_peer_refs(refs, &refspec);
+   mapped_refs = wanted_peer_refs(refs, &rs.items[0]);
/*
 * transport_get_remote_refs() may return refs with null sha-1
 * in mapped_refs (see struct transport->get_refs_list
@@ -1231,6 +1240,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
strbuf_release(&value);
junk_mode = JUNK_LEAVE_ALL;
 
-   refspec_item_clear(&refspec);
+   refspec_clear(&rs);
+   argv_array_clear(&ref_prefixes);
return err;
 }
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index a4fe6508bd..9c6ea04a69 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -181,7 +181,12 @@ test_expect_success 'clone with file:// using protocol v2' 
'
test_cmp expect actual &&
 
# Server responded using protocol v2
-   grep "clone< version 2" log
+   grep "clone< version 2" log &&
+   
+   # Client sent ref-prefixes to filter the ref-advertisement 
+   grep "ref-prefix HEAD" log &&
+   grep "ref-prefix refs/heads/" log &&
+   grep "ref-prefix refs/tags/" log
 '
 
 test_expect_success 'fetch with file:// using protocol v2' '
-- 
2.18.0.233.g985f88cf7e-goog
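With this patch applied, the prefixes the client sends are visible in a packet trace. A rough sketch, assuming a throwaway local repository and a git built with protocol v2 support (all paths and repo names here are made up):

```shell
# Create a small server repo, clone it over protocol v2, and inspect the
# ref-prefix lines the client sent in the packet trace.
git init -q prefix-server
git -C prefix-server -c user.name=A -c user.email=a@example.com \
	commit -q --allow-empty -m init
GIT_TRACE_PACKET="$PWD/trace.log" git -c protocol.version=2 \
	clone -q "file://$PWD/prefix-server" prefix-client
grep "ref-prefix" trace.log
```

The grep output should show the HEAD, refs/heads/, and refs/tags/ prefixes that the new t5702 test checks for.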



Re: No rule to make target `git-daemon'

2018-07-20 Thread brian m. carlson
On Fri, Jul 20, 2018 at 05:51:46PM -0400, Jeffrey Walton wrote:
> Sorry about the late reply. I meant to reply this morning.
> 
> My shell script to patch things on Solaris was the cause of the problem.

Glad to hear you got it figured out.

> (If anyone is interested in first class Solaris support then I am
> happy to help. The patch set needed for the platform has been stable
> for the last couple of years).

I'd certainly be interested for you to send it to the list.  I'm not
sure I can provide a helpful review, since I don't run Solaris, but
having better support out of the box would definitely be nice.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v4 03/23] multi-pack-index: add builtin

2018-07-20 Thread brian m. carlson
On Fri, Jul 20, 2018 at 11:22:03AM -0700, Junio C Hamano wrote:
> Derrick Stolee  writes:
> 
> > diff --git a/Documentation/git-multi-pack-index.txt 
> > b/Documentation/git-multi-pack-index.txt
> > new file mode 100644
> > index 0000000000..ec9982cbfc
> > --- /dev/null
> > +++ b/Documentation/git-multi-pack-index.txt
> > @@ -0,0 +1,36 @@
> > +git-multi-pack-index(1)
> > +======================
> 
> Isn't this underline too short by one column?  My copy of AsciiDoc
> seems to be OK with it, but do other versions and reimplementations
> grok this fine?

I believe AsciiDoc allows a two-character grace (either longer or
shorter) and Asciidoctor allows one.  So this should work both places
(although I haven't tested).

It might be nice to fix if a re-roll is needed, but it should be
functional as-is.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Stefan Beller
Hi Derrick,

> Sure! It's on my fork [1]
>
> [1] https://github.com/derrickstolee/git/tree/reach/refactor
>

Thanks!

> >> * Use single rev-parse commands in test output, and pipe the OIDs through 
> >> 'sort'

Why do we need to sort them? The order of the answers given by rev-parse
is the same as the input given and we did not need to sort it before, i.e.
the unit under test would not give sorted output but some deterministic(?)
order, which we can replicate as input to rev-parse.
Am I missing the obvious?

> >> * Check output of parse_commit()
> >>
> >> * Update flag documentation in object.h
> >>
> >> * Add tests for commit_contains() including both algorithms.
> >>
> >> * Reduce size of "mixed-mode" commit-graph to ensure we start commit walks
> >>'above' the graph and then walk into the commits with generation 
> >> numbers.

Overall I like the series as-is, and have found
no further issues in a quick read.

Thanks,
Stefan


Re: [PATCH] Documentation/git-interpret-trailers: explain possible values

2018-07-20 Thread Junio C Hamano
Stefan Beller  writes:

> Signed-off-by: Stefan Beller 
> ---
>
> Maybe we rather want to refer to the options that are described further
> down in the document?

I have no strong preference either way.

The patch looks reasonable to me; Christian?

>
>  Documentation/git-interpret-trailers.txt | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/git-interpret-trailers.txt 
> b/Documentation/git-interpret-trailers.txt
> index 9111c47a1bf..b8fafb1e8bd 100644
> --- a/Documentation/git-interpret-trailers.txt
> +++ b/Documentation/git-interpret-trailers.txt
> @@ -88,7 +88,8 @@ OPTIONS
>   Specify where all new trailers will be added.  A setting
>   provided with '--where' overrides all configuration variables
>   and applies to all '--trailer' options until the next occurrence of
> - '--where' or '--no-where'.
> + '--where' or '--no-where'. Possible values are `after`, `before`,
> + `end` or `start`.
>  
>  --if-exists <action>::
>  --no-if-exists::
> @@ -96,7 +97,8 @@ OPTIONS
> least one trailer with the same <token> in the message.  A setting
>   provided with '--if-exists' overrides all configuration variables
>   and applies to all '--trailer' options until the next occurrence of
> - '--if-exists' or '--no-if-exists'.
> + '--if-exists' or '--no-if-exists'. Possible actions are 
> `addIfDifferent`,
> + `addIfDifferentNeighbor`, `add`, `replace` and `doNothing`.
>  
>  --if-missing <action>::
>  --no-if-missing::
> @@ -104,7 +106,8 @@ OPTIONS
>   trailer with the same <token> in the message.  A setting
>   provided with '--if-missing' overrides all configuration variables
>   and applies to all '--trailer' options until the next occurrence of
> - '--if-missing' or '--no-if-missing'.
> + '--if-missing' or '--no-if-missing'. Possible actions are `doNothing`
> + or `add`.
>  
>  --only-trailers::
>   Output only the trailers, not any other parts of the input.


Re: [PATCH v4 03/23] multi-pack-index: add builtin

2018-07-20 Thread Junio C Hamano
"brian m. carlson"  writes:

> On Fri, Jul 20, 2018 at 11:22:03AM -0700, Junio C Hamano wrote:
>> Derrick Stolee  writes:
>> 
>> > diff --git a/Documentation/git-multi-pack-index.txt 
>> > b/Documentation/git-multi-pack-index.txt
>> > new file mode 100644
>> > index 0000000000..ec9982cbfc
>> > --- /dev/null
>> > +++ b/Documentation/git-multi-pack-index.txt
>> > @@ -0,0 +1,36 @@
>> > +git-multi-pack-index(1)
>> > +======================
>> 
>> Isn't this underline too short by one column?  My copy of AsciiDoc
>> seems to be OK with it, but do other versions and reimplementations
>> grok this fine?
>
> I believe AsciiDoc allows a two-character grace (either longer or
> shorter) and Asciidoctor allows one.  So this should work both places
> (although I haven't tested).
>
> It might be nice to fix if a re-roll is needed, but it should be
> functional as-is.


Thanks.

I've already locally amended it and pushed the result out, as I
needed to touch-up another commit in the same series anyway.




Re: [PATCH] Documentation: fix --color option formatting

2018-07-20 Thread brian m. carlson
On Wed, Jul 18, 2018 at 01:49:44PM -0400, Jeff King wrote:
> On Wed, Jul 18, 2018 at 07:37:48PM +0200, Andrei Rybak wrote:
> > diff --git a/Documentation/git-for-each-ref.txt 
> > b/Documentation/git-for-each-ref.txt
> > index 085d177d97..901faef1bf 100644
> > --- a/Documentation/git-for-each-ref.txt
> > +++ b/Documentation/git-for-each-ref.txt
> > @@ -57,7 +57,7 @@ OPTIONS
> > `xx`; for example `%00` interpolates to `\0` (NUL),
> > `%09` to `\t` (TAB) and `%0a` to `\n` (LF).
> >  
> > ---color[=<when>]:
> > +--color[=<when>]::
> > Respect any colors specified in the `--format` option. The
> > `<when>` field must be one of `always`, `never`, or `auto` (if
> > `<when>` is absent, behave as if `always` was given).
> 
> This is obviously the right fix.
> 
> I am guilty of not always building the documentation and eye-balling the
> output when I'm not specifically changing the formatting. I wonder if we
> could provide tooling to make that easier, by showing a diff between the
> text-formatted manpages before and after a series. I've manually hacked
> stuff up like that in the past, but there's often a lot of noise around
> date and version info in the footers.

Both AsciiDoc 8.6.10 and Asciidoctor support SOURCE_DATE_EPOCH for
reproducible builds[0], which should reduce the date noise.  We could
also add a Makefile knob to set git_version to an empty string or an
--abbrev=0 equivalent for such a situation.

[0] https://reproducible-builds.org/specs/source-date-epoch/
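For illustration, the convention amounts to "read the timestamp from the environment instead of the clock"; the `date` call below merely stands in for what a SOURCE_DATE_EPOCH-aware formatter such as Asciidoctor does internally (the fallback value and the final make invocation are assumptions):

```shell
# Pin the embedded date to the last commit; fall back to the epoch when
# not inside a git repository.
export SOURCE_DATE_EPOCH=$(git log -1 --format=%ct 2>/dev/null || echo 0)
# An aware tool formats this timestamp instead of "now"; emulate with date(1):
date -u -d "@$SOURCE_DATE_EPOCH" +%Y-%m-%d
# For git's docs one would then run something like: make -C Documentation man
```

Re-running the build over the same tree with the same value then yields byte-identical formatted pages.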
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 00/18] Consolidate reachability logic

2018-07-20 Thread Junio C Hamano
Derrick Stolee  writes:

> There are many places in Git that use a commit walk to determine
> reachability between commits and/or refs. A lot of this logic is
> duplicated.
>
> I wanted to achieve the following:
>
> Consolidate several different commit walks into one file
> Reduce duplicate reachability logic
> Increase testability (correctness and performance)
> Improve performance of reachability queries

All of these are good goals to shoot at.

> This series is based on jt/commit-graph-per-object-store

As such, it has some interactions with another topic [*1*] that is
based on the same, but my trial merge seems to suggest that the
interactions are minimum and do not pose a serious problem.

Will push out as part of the next integration run, as I've already
pushed out today's.

Thanks.


[Footnote]

*1* ds/commit-graph-with-grafts topic that is in 'pu', slated to go
'next' soonish.


Re: [BUG] merge-recursive overly aggressive when skipping updating the working tree

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 3:05 PM, Junio C Hamano  wrote:
> Elijah Newren  writes:
>
>> Ah, okay, that's helpful.  So, if there are conflicts, it should be
>> free to clear the skip_worktree flag.  Since merge-recursive calls
>> add_cacheinfo() for all entries it needs to update, which deletes the
>> old cache entry and just makes new ones, we get that for free.
>
> Correct.
>
>> And conversely, if a file-level merge succeeds without conflicts then
>> it clearly doesn't "need to materialize a working tree file", so it
>> should NOT clear the skip_worktree flag for that path.
>
> That is not at all implied by what I wrote, though.
>
> If it can be done without too much effort, then it certainly is
> nicer to keep the sparseness when we do not have to materialize the
> working tree file.  But at least in my mind, if it needs too many
> special cases, hacks, and conditionals, then it is not worth the
> complexity---if it is easier to write correct code by allowing Git
> to populate working tree files, it is perfectly fine to do so.
>
> In a sense, the sparse checkout "feature" itself is a hack by
> itself, and that is why I think this part should be "best effort" as
> well.

That's good to know, but I don't think we can back out easily:
  - Clearing the skip_worktree bit: no big deal, as you mention above
  - Avoiding working tree updates when merge doesn't change them: very
desirable[1]
  - Doing both: whoops

[1] 
https://public-inbox.org/git/CA+55aFzLZ3UkG5svqZwSnhNk75=fxjrkvu1m_rhbg54nooa...@mail.gmail.com/


I don't want to regress the bug Linus reported, so to fix Ben's issue,
when we detect that a path's contents/mode won't be modified by the
merge, we can either:
  - Update the working tree file if the original cache entry had the
skip_worktree flag set
  - Mark the new cache entry as skip_worktree if the original cache
entry had the skip_worktree flag set

Both should be about the same amount of work; the first seems weird
and confusing for future readers of the code.  The second makes sense,
but probably should be accompanied with a note in the code about how
there are other codepaths that could consider skip_worktree too.

I'll see if I can put something together, but I have family flying in
tomorrow, and then am out on vacation Mon-Sat next week, so...


Re: [PATCH] Documentation: fix --color option formatting

2018-07-20 Thread Junio C Hamano
"brian m. carlson"  writes:

> Both AsciiDoc 8.6.10 and Asciidoctor support SOURCE_DATE_EPOCH for
> reproducible builds[0], which should reduce the date noise.  We could
> also add a Makefile knob to set git_version to an empty string or an
> --abbrev=0 equivalent for such a situation.
>
> [0] https://reproducible-builds.org/specs/source-date-epoch/

I probably should take advantage of this feature and update the
install target of Documentation/Makefile, where I manually filter
out these noise changes between what is installed and what is going
to be.  That serves as my final sanity check before pushing out the
result of an integration cycle that updates the 'master' branch, as
that is when the pre-formatted html and man pages are updated, and
as part of that, "make install" for these two formats are run to
give me a chance to eyeball the differences.

Being able to lose the (imprecise) manual filtering would be great.



Re: [PATCH] Documentation/git-interpret-trailers: explain possible values

2018-07-20 Thread Christian Couder
On Sat, Jul 21, 2018 at 12:23 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Signed-off-by: Stefan Beller 
>> ---
>>
>> Maybe we rather want to refer to the options that are described further
>> down in the document?
>
> I have no strong preference either way.
>
> The patch looks reasonable to me; Christian?

Yeah, it looks reasonable to me too.

I wouldn't mind also referring to the options described elsewhere, but
it could be in another patch.


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 10:43 AM, Elijah Newren  wrote:
> On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy  
> wrote:

>> This patch provides a better fallback that is
>>
>> - cheaper in terms of cpu and io because we won't have to read
>>   existing pack files as much
>>
>> - better in terms of pack size because the pack heuristics are back to
>>   2.17.0 time, we do not drop large deltas at all
>>
>> If we encounter any delta (on-disk or created during try_delta phase)
>> that is larger than the 2MB limit, we stop using delta_size_ field for
>> this because it can't contain such size anyway. A new array of delta
>> size is dynamically allocated and can hold all the deltas that 2.17.0
>> can [1]. All current delta sizes are migrated over.
>>
>> With this, we do not have to drop deltas in try_delta() anymore. Of
>> course the downside is we use slightly more memory, even compared to
>> 2.17.0. But since this is considered an uncommon case, a bit more
>> memory consumption should not be a problem.

Is the increased memory only supposed to affect the uncommon case?  I
was able to verify that 2.18.0 used less memory than 2.17.0 (for
either my repo or linux.git), but I don't see nearly the same memory
reduction relative to 2.17.0 for linux.git with your new patches.

>> ---
>>  I'm optimistic that the slowness on linux repo is lock contention
>>  (because if it's not then I have no clue what is). So let's start
>>  doing proper patches.
>
> I'll be testing this shortly...

Using these definitions for versions:

  fix-v5: 
https://public-inbox.org/git/20180720052829.ga3...@sigill.intra.peff.net/
(plus what it depends on)
  fix-v6: The patch I'm responding to
  fix-v2: https://public-inbox.org/git/20180718225110.17639-3-new...@gmail.com/

and inspired by
https://public-inbox.org/git/87po42cwql@evledraar.gmail.com/, I
ran

   /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive

three times on a beefy box (40 cores, 160GB ram, otherwise quiet
system).  If I take the lowest MaxRSS, and the lowest time reported
(regardless of whether from the same run), then the results are as
follows for linux.git (all cases repacked down to 1279 MB, so removing
that from the table):

Version  MaxRSS(kB)  Time (s)
-------  ----------  --------
 2.17.0    11413236    621.36
 2.18.0    10987152    621.80
 fix-v5    11105564    836.29
 fix-v6    11357304    831.73
 fix-v2    10873348    619.96

The runtime could swing up to 10 seconds between runs.  The memory use
also had some variability, but all runs of fix-v2 had lower MaxRSS
than any runs of 2.18.0 (which surprised me); all runs of 2.18.0 used
less memory than any run of fix-v6, and all runs of fix-v6 used less
memory than any run of 2.17.0.  fix-v5 had the most variability in
MaxRSS, ranging from as low as some 2.18.0 runs up to higher than any
2.17.0 runs.

...but maybe that'd change with more than 3 runs of each?

Anyway, this is a lot better than the 1193.67 seconds I saw with
fix-v4 (https://public-inbox.org/git/20180719182442.ga5...@duynguyen.home/,
which Peff fixed up into fix-v5), suggesting it is lock contention,
but we're still about 33% slower than 2.17.0 and we've lost almost all
of the memory savings.  Somewhat surprisingly, the highly simplistic
fix-v2 does a lot better on both measures.


However, I'm just concentrating on a beefy machine; it may be that v6
drastically outperforms v2 on weaker hardware?  Can others measure a
lower memory usage for v6 than v2?


Also, for the original repo with the huge deltas that started it all,
the numbers are (only run once since it takes so long):

Version  Pack (MB)  MaxRSS(kB)  Time (s)
-------  ---------  ----------  --------
 2.17.0       5498    43513628   2494.85
 2.18.0      10531    40449596   4168.94
 fix-v5       5500    42805716   2577.50
 fix-v6       5509    42814836   2605.36
 fix-v2       5509    41644104   2468.25


>>  If we want a quick fix for 2.18.1. I suggest bumping up
>>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
>>  think it's safe to fast track this patch.
>
> Okay, I'll submit that with a proper commit message then.  Since you
> also bump OE_DELTA_SIZE_BITS in your patch (though to a different
> value), it'll require a conflict resolution when merging maint into
> master, but at least the change won't silently leak through.

...except now I'm wondering if the commit message should mention that
it's just a stopgap fix for 2.18.1, or whether it's actually the fix
that we just want to use going forward as well.


Re: Hash algorithm analysis

2018-07-20 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> I know this discussion has sort of petered out, but I'd like to see if
> we can revive it.  I'm writing index v3 and having a decision would help
> me write tests for it.

Nice!  That's awesome.

> To summarize the discussion that's been had in addition to the above,
> Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> over SHA-256 over SHA3-256, although any of them would be fine.
>
> Are there other contributors who have a strong opinion?  Are there
> things I can do to help us coalesce around an option?

My advice would be to go with BLAKE2b.  If anyone objects, we can deal
with their objection at that point (and I volunteer to help with
cleaning up any mess in rewriting test cases that this advice causes).

Full disclosure: my preference order (speaking for myself and no one
else) is

 K12 > BLAKE2bp-256 > SHA-256x16 > BLAKE2b > SHA-256 > SHA-512/256 > SHA3-256

so I'm not just asking you to go with my favorite. ;-)

Jonathan


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Duy Nguyen
On Sat, Jul 21, 2018 at 1:52 AM Elijah Newren  wrote:
>
> On Fri, Jul 20, 2018 at 10:43 AM, Elijah Newren  wrote:
> > On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy  
> > wrote:
>
> >> This patch provides a better fallback that is
> >>
> >> - cheaper in terms of cpu and io because we won't have to read
> >>   existing pack files as much
> >>
> >> - better in terms of pack size because the pack heuristics are back to
> >>   2.17.0 time, we do not drop large deltas at all
> >>
> >> If we encounter any delta (on-disk or created during try_delta phase)
> >> that is larger than the 2MB limit, we stop using delta_size_ field for
> >> this because it can't contain such size anyway. A new array of delta
> >> size is dynamically allocated and can hold all the deltas that 2.17.0
> >> can [1]. All current delta sizes are migrated over.
> >>
> >> With this, we do not have to drop deltas in try_delta() anymore. Of
> >> course the downside is we use slightly more memory, even compared to
> >> 2.17.0. But since this is considered an uncommon case, a bit more
> >> memory consumption should not be a problem.
>
> Is the increased memory only supposed to affect the uncommon case?  I
> was able to verify that 2.18.0 used less memory than 2.17.0 (for
> either my repo or linux.git), but I don't see nearly the same memory
> reduction relative to 2.17.0 for linux.git with your new patches.
>
> >> ---
> >>  I'm optimistic that the slowness on linux repo is lock contention
> >>  (because if it's not then I have no clue what is). So let's start
> >>  doing proper patches.
> >
> > I'll be testing this shortly...
>
> Using these definitions for versions:
>
>   fix-v5: 
> https://public-inbox.org/git/20180720052829.ga3...@sigill.intra.peff.net/
> (plus what it depends on)
>   fix-v6: The patch I'm responding to
>   fix-v2: 
> https://public-inbox.org/git/20180718225110.17639-3-new...@gmail.com/
>
> and inspired by
> https://public-inbox.org/git/87po42cwql@evledraar.gmail.com/, I
> ran
>
>/usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
>
> three times on a beefy box (40 cores, 160GB ram, otherwise quiet
> system).  If I take the lowest MaxRSS, and the lowest time reported
> (regardless of whether from the same run), then the results are as
> follows for linux.git (all cases repacked down to 1279 MB, so removing
> that from the table):
>
> Version  MaxRSS(kB)  Time (s)
> -------  ----------  --------
>  2.17.0    11413236    621.36
>  2.18.0    10987152    621.80
>  fix-v5    11105564    836.29
>  fix-v6    11357304    831.73
>  fix-v2    10873348    619.96
>
> The runtime could swing up to 10 seconds between runs.  The memory use
> also had some variability, but all runs of fix-v2 had lower MaxRSS
> than any runs of 2.18.0 (which surprised me); all runs of 2.18.0 used
> less memory than any run of fix-v6, and all runs of fix-v6 used less
> memory than any run of 2.17.0.  fix-v5 had the most variability in
> MaxRSS, ranging from as low as some 2.18.0 runs up to higher than any
> 2.17.0 runs.

As Peff said, RSS is not a very good way to measure memory usage.
valgrind --tool=massif would be the best, or just grep "heap" in
/proc/$PID/smaps. fix-v2 should increase struct object_entry's size
and memory usage for all repos, while v6 has the same memory usage as
2.17.0 in heap, and extra once it hits a big delta.

> ...but maybe that'd change with more than 3 runs of each?
>
> Anyway, this is a lot better than the 1193.67 seconds I saw with
> fix-v4 (https://public-inbox.org/git/20180719182442.ga5...@duynguyen.home/,
> which Peff fixed up into fix-v5), suggesting it is lock contention,
> but we're still about 33% slower than 2.17.0 and we've lost almost all
> of the memory savings.  Somewhat surprisingly, the highly simplistic
> fix-v2 does a lot better on both measures.

Maybe we should avoid locking when the delta is small enough then and
see how it goes. This is linux.git so there's no big delta and we
would end up not locking at all.

If you do "git repack -adf --threads=8", does the time difference
between v6 and 2.17.0 reduce (suggesting we still hit lock contention
hard)?

> However, I'm just concentrating on a beefy machine; it may be that v6
> drastically outperforms v2 on weaker hardware?  Can others measure a
> lower memory usage for v6 than v2?

I'll try it with massif on linux.git, but this is a very weak laptop
(4GB), so it'll take a while.

> Also, for the original repo with the huge deltas that started it all,
> the numbers are (only run once since it takes so long):
>
> Version  Pack (MB)  MaxRSS(kB)  Time (s)
> -------  ---------  ----------  --------
>  2.17.0       5498    43513628   2494.85
>  2.18.0      10531    40449596   4168.94
>  fix-v5       5500    42805716   2577.50
>  fix-v6       5509    42814836   2605.36
>  fix-v2       5509    41644104   2468.25
>
>
> >>  If we want a quick fix for 2.18.1. I suggest bumping up
> >>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
> >>  th
