Re: [PATCH 1/2 v8] pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use

2016-09-13 Thread Kirill Smelkov
On Mon, Sep 12, 2016 at 11:23:18PM -0700, Junio C Hamano wrote:
> Kirill Smelkov  writes:
> 
> > +static int want_found_object(int exclude, struct packed_git *p)
> > +{
> > +   if (exclude)
> > +   return 1;
> > +   if (incremental)
> > +   return 0;
> > +
> > +   /*
> > +* When asked to do --local (do not include an object that appears in a
> > +* pack we borrow from elsewhere) or --honor-pack-keep (do not include
> > +* an object that appears in a pack marked with .keep), finding a pack
> > +* that matches the criteria is sufficient for us to decide to omit it.
> > +* However, even if this pack does not satisfy the criteria, we need to
> > +* make sure no copy of this object appears in _any_ pack that makes us
> > +* to omit the object, so we need to check all the packs.
> > +*
> > +* We can however first check whether these options can possible matter;
> > +* if they do not matter we know we want the object in generated pack.
> > +* Otherwise, we signal "-1" at the end to tell the caller that we do
> > +* not know either way, and it needs to check more packs.
> > +*/
> > +   if (!ignore_packed_keep &&
> > +   (!local || !have_non_local_packs))
> > +   return 1;
> > +
> > +   if (local && !p->pack_local)
> > +   return 0;
> > +   if (ignore_packed_keep && p->pack_local && p->pack_keep)
> > +   return 0;
> > +
> > +   /* we don't know yet; keep looking for more packs */
> > +   return -1;
> > +}
> 
> Moving this logic out to this helper made the main logic in the
> caller easier to grasp.
> 
> > @@ -958,15 +993,30 @@ static int want_object_in_pack(const unsigned char 
> > *sha1,
> >off_t *found_offset)
> >  {
> > struct packed_git *p;
> > +   int want;
> >  
> > if (!exclude && local && has_loose_object_nonlocal(sha1))
> > return 0;
> >  
> > +   /*
> > +* If we already know the pack object lives in, start checks from that
> > +* pack - in the usual case when neither --local was given nor .keep 
> > files
> > +* are present we will determine the answer right now.
> > +*/
> > +   if (*found_pack) {
> > +   want = want_found_object(exclude, *found_pack);
> > +   if (want != -1)
> > +   return want;
> > +   }
> >  
> > for (p = packed_git; p; p = p->next) {
> > +   off_t offset;
> > +
> > +   if (p == *found_pack)
> > +   offset = *found_offset;
> > +   else
> > +   offset = find_pack_entry_one(sha1, p);
> > +
> > if (offset) {
> > if (!*found_pack) {
> > if (!is_pack_valid(p))
> > @@ -974,31 +1024,9 @@ static int want_object_in_pack(const unsigned char 
> > *sha1,
> > *found_offset = offset;
> > *found_pack = p;
> > }
> > +   want = want_found_object(exclude, p);
> > +   if (want != -1)
> > +   return want;
> > }
> > }
> 
> As Peff noted in his earlier review, however, MRU code needed to be
> grafted in to the caller (an update to the MRU list was done in the
> code that was moved to the want_found_object() helper).  I think I
> did it correctly, which ended up looking like this:
> 
> want = want_found_object(exclude, p);
> if (!exclude && want > 0)
> mru_mark(packed_git_mru, entry);
> if (want != -1)
> return want;
> 
> I somewhat feel that it is ugly that the helper knows about exclude
> (i.e. in the original code, we immediately returned 1 without
> futzing with the MRU when we find an entry that is to be excluded,
> which now is done in the helper), and the caller also knows about
> exclude (i.e. the caller knows that the helper may return positive
> in two cases, it knows that MRU marking needs to happen only one of
> the two cases, and it also knows that "exclude" is what
> differentiates between the two cases) at the same time.
> 
> But probably the reason why I feel it ugly is only because I knew
> how the original looked like.  I dunno.

Junio, the code above is correct semantic merge of pack-mru and my
topic, because in pack-mru if found and exclude=1, 1 was returned
without marking found pack.

But I wonder: even if we exclude an object, we were still looking for it
in packs, and when we found it, we found the corresponding pack too. So,
that pack _was_ most-recently-used, and it is correct to mark it as MRU.

We can do the simplification in the follow-up patch after the merge, so
merge does not change semantics and it is all bisectable, etc.

Jeff?


Re: [PATCH] [git-p4.py] Add --checkpoint-period option to sync/clone

2016-09-13 Thread Luke Diamand
On 12 September 2016 at 23:02, Ori Rawlings  wrote:
> Importing a long history from Perforce into git using the git-p4 tool
> can be especially challenging. The `git p4 clone` operation is based
> on an all-or-nothing transactionality guarantee. Under real-world
> conditions like network unreliability or a busy Perforce server,
> `git p4 clone` and  `git p4 sync` operations can easily fail, forcing a
> user to restart the import process from the beginning. The longer the
> history being imported, the more likely a fault occurs during the
> process. Long enough imports thus become statistically unlikely to ever
> succeed.

That would never happen :-)

The usual thing that I find is that my Perforce login session expires.

>
> The underlying git fast-import protocol supports an explicit checkpoint
> command. The idea here is to optionally allow the user to force an
> explicit checkpoint every  seconds. If the sync/clone operation fails
> branches are left updated at the appropriate commit available during the
> latest checkpoint. This allows a user to resume importing Perforce
> history while only having to repeat at most approximately  seconds
> worth of import activity.

I think this ought to work, and could be quite useful. It would be
good to have some kind of test case for it though, and updated
documentation.

Luke

>
> Signed-off-by: Ori Rawlings 
> ---
>  git-p4.py | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index fd5ca52..40cb64f 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -2244,6 +2244,7 @@ class P4Sync(Command, P4UserMap):
>  optparse.make_option("-/", dest="cloneExclude",
>   action="append", type="string",
>   help="exclude depot path"),
> +optparse.make_option("--checkpoint-period", 
> dest="checkpointPeriod", type="int", help="Period in seconds between explict 
> git fast-import checkpoints (by default, no explicit checkpoints are 
> performed)"),
>  ]
>  self.description = """Imports from Perforce into a git repository.\n
>  example:
> @@ -2276,6 +2277,7 @@ class P4Sync(Command, P4UserMap):
>  self.tempBranches = []
>  self.tempBranchLocation = "refs/git-p4-tmp"
>  self.largeFileSystem = None
> +self.checkpointPeriod = -1

Or use None?

>
>  if gitConfig('git-p4.largeFileSystem'):
>  largeFileSystemConstructor = 
> globals()[gitConfig('git-p4.largeFileSystem')]
> @@ -3031,6 +3033,8 @@ class P4Sync(Command, P4UserMap):
>
>  def importChanges(self, changes):
>  cnt = 1
> +if self.checkpointPeriod > -1:
> +self.lastCheckpointTime = time.time()

Could you just always set the lastCheckpointTime?

>  for change in changes:
>  description = p4_describe(change)
>  self.updateOptionDict(description)
> @@ -3107,6 +3111,10 @@ class P4Sync(Command, P4UserMap):
>  self.initialParent)
>  # only needed once, to connect to the previous commit
>  self.initialParent = ""
> +
> +if self.checkpointPeriod > -1 and time.time() - 
> self.lastCheckpointTime > self.checkpointPeriod:
> +self.checkpoint()
> +self.lastCheckpointTime = time.time()

If you use time.time(), then this could fail to work as expected if
the system time goes backwards (e.g. NTP updates). However, Python 2
doesn't have access to clock_gettime() without jumping through hoops,
so perhaps we leave this as a bug until git-p4 gets ported to Python
3.



>  except IOError:
>  print self.gitError.read()
>  sys.exit(1)
> --
> 2.7.4 (Apple Git-66)
>


Greetings to you and your family,,

2016-09-13 Thread Mr.Abdulaiye Rahmani
-- 
Greetings to you and your family,

Good Day, I am writing to request your assistance to transfer the sum
of $8, 600.000.00 (Eight million Six hundred thousand USD) into your
accounts.

Please delete it if you are not interested. The total sum will be
shared as follows: 60% for me, 40% for you. The transfer is risk free
hence you will follow my instructions till the fund get to your
account. Official application form will be forwarded to you to explain
more comprehensively what is required of you, feel free to call me
through my private telephone number.

Full Name
Sex...
Age.
Country...
Occupation..
Mobile .

Best Regards

Mr. Abdulaiye Rahmani
TEL: +226 75448132
PRIVATE EMAIL: abdulaiyerahma...@yahoo.com


RE: [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout

2016-09-13 Thread Ben Peart
> -Original Message-
> From: Junio C Hamano [mailto:gits...@pobox.com]
> Sent: Monday, September 12, 2016 4:32 PM
> To: Ben Peart 
> Cc: git@vger.kernel.org; pclo...@gmail.com; 'Ben Peart'
> 
> Subject: Re: [PATCH v2] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> "Ben Peart"  writes:
> 
> > I completely agree that optimizing within merge_working_tree would
> > provide more opportunities for optimization.  I can certainly move the
> > test into that function as a first step.
> 
> Note that "optimizing more" was not the primary point of my response.
> 
> Quite honestly, I'd rather see us speed up _ONLY_ obviously correct and
> commonly used cases, while leaving most cases that _MAY_ turn out to be
> optimizable (if we did careful analysis) unoptimized, and instead have
them
> handled by generic but known to be correct codepath, if it means we do NOT
> to have to spend mental bandwidth to analyze not-common case--that is a
> much better tradeoff.
> 
> The suggestion to move the check one level down in the callchain was
> primarily to avoid the proposed optimization from being overly eager and
> ending up skipping necessary parts of what merge_working_tree() does (e.g.
> like I suspected in the review that the proposed patch skips the check for
> "you have unmerged entries" situation).

The check for unmerged entries makes complete sense when you are about 
to attempt to merge different commit trees and generate an updated index 
and working directory.  This optimization however is trying to skip 
those expensive steps for the specific case of creating a new branch and 
switching to it.  In this narrow (but common) case, all that needs to 
happen is that a new ref is created and HEAD switched to it.  Since 
we're not doing a merge, I don't believe the check is necessary.




Re: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread Kevin Daudt
On Mon, Sep 12, 2016 at 09:51:09PM +0200, Jakub Narębski wrote:
> Hello all,
> 

> 
> P.P.S. Different announcements use different URLs (different channels)
> to better track where one got information about this survey.
> 
> Thanks in advance for taking time to answer the survey,

Can we get a channel for the freenode/#git channel? I'm trying to announce the
survey there too.


Re: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread Jakub Narębski
Hello Kevin,

On 13 September 2016 at 15:32, Kevin Daudt  wrote:
> On Mon, Sep 12, 2016 at 09:51:09PM +0200, Jakub Narębski wrote:

>>
>> P.P.S. Different announcements use different URLs (different channels)
>> to better track where one got information about this survey.
>>
>> Thanks in advance for taking time to answer the survey,
>
> Can we get a channel for the freenode/#git channel? I'm trying to announce the
> survey there too.

Here you have it:

  https://survs.com/survey/c6wjrerw87

-- 
Jakub Narebski


[PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-13 Thread Ben Peart
Teach git to avoid unnecessary merge during trivial checkout.  When
running 'git checkout -b foo' git follows a common code path through
the expensive merge_working_tree even when it is unnecessary.  As a
result, 95% of the time is spent in merge_working_tree doing the 2-way
merge between the new and old commit trees that is unneeded.

The time breakdown is as follows:

merge_working_tree <-- 95%
unpack_trees <-- 80%
traverse_trees <-- 50%
cache_tree_update <-- 17%
mark_new_skip_worktree <-- 10%

With a large repo, this cost is pronounced.  Using "git checkout -b r"
to create and switch to a new branch costs 166 seconds (all times worst
case with a cold file system cache).

git.c:406   trace: built-in: git 'checkout' '-b' 'r'
read-cache.c:1667   performance: 17.442926555 s: read_index_from
name-hash.c:128 performance: 2.912145231 s: lazy_init_name_hash
read-cache.c:2208   performance: 4.387713335 s: write_locked_index
trace.c:420 performance: 166.458921289 s: git command:
'c:\Users\benpeart\bin\git.exe' 
'checkout' '-b' 'r'
Switched to a new branch 'r'

By adding a test to skip the unnecessary call to merge_working_tree in
this case reduces the cost to 16 seconds.

git.c:406   trace: built-in: git 'checkout' '-b' 's'
read-cache.c:1667   performance: 16.100742476 s: read_index_from
trace.c:420 performance: 16.461547867 s: git command: 
'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's'
Switched to a new branch 's'

Signed-off-by: Ben Peart 
---
 builtin/checkout.c | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..8b2f428 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -38,6 +38,10 @@ struct checkout_opts {
int ignore_skipworktree;
int ignore_other_worktrees;
int show_progress;
+   /*
+* If new checkout options are added, needs_working_tree_merge
+* should be updated accordingly.
+*/
 
const char *new_branch;
const char *new_branch_force;
@@ -460,11 +464,99 @@ static void setup_branch_path(struct branch_info *branch)
branch->path = strbuf_detach(&buf, NULL);
 }
 
+static int needs_working_tree_merge(const struct checkout_opts *opts,
+   const struct branch_info *old,
+   const struct branch_info *new)
+{
+   /*
+* We must do the merge if we are actually moving to a new
+* commit tree.
+*/
+   if (!old->commit || !new->commit ||
+   oidcmp(&old->commit->tree->object.oid, 
&new->commit->tree->object.oid))
+   return 1;
+
+   /*
+* opts->patch_mode cannot be used with switching branches so is
+* not tested here
+*/
+
+   /*
+* opts->quiet only impacts output so doesn't require a merge
+*/
+
+   /*
+* Honor the explicit request for a three-way merge or to throw away
+* local changes
+*/
+   if (opts->merge || opts->force)
+   return 1;
+
+   /*
+* Checking out the requested commit may require updating the working
+* directory and index, let the merge handle it.
+*/
+   if (opts->force_detach)
+   return 1;
+
+   /*
+* opts->writeout_stage cannot be used with switching branches so is
+* not tested here
+*/
+
+   /*
+* Honor the explicit ignore requests
+*/
+   if (!opts->overwrite_ignore || opts->ignore_skipworktree ||
+   opts->ignore_other_worktrees)
+   return 1;
+
+   /*
+* opts->show_progress only impacts output so doesn't require a merge
+*/
+
+   /*
+* If we're not creating a new branch, by definition we're changing
+* the existing one so need to do the merge
+*/
+   if (!opts->new_branch)
+   return 1;
+
+   /*
+* new_branch_force is defined to "create/reset and checkout a branch"
+* so needs to go through the merge to do the reset
+*/
+   if (opts->new_branch_force)
+   return 1;
+
+   /*
+* A new orphaned branch requrires the index and the working tree to be
+* adjusted to 
+*/
+   if (opts->new_orphan_branch)
+   return 1;
+
+   /*
+* Remaining variables are not checkout options but used to track state
+* that doesn't trigger the need for a merge.
+*/
+
+   return 0;
+}
+
 static int merge_working_tree(const struct checkout_opts *opts,
  struct branch_info *old,
  struct branch_info *new,
  int *writeout_error)
 {
+   /*
+* Optimize the performance of "git checkout -b foo" by avoiding
+* the expens

Re: [RFC/PATCH 01/17] diff: move line ending check into emit_hunk_header

2016-09-13 Thread René Scharfe

Am 13.09.2016 um 06:45 schrieb Stefan Beller:

In a later patch, I want to propose an option to detect&color
moved lines in a diff, which cannot be done in a one-pass over
the diff. Instead we need to go over the whole diff twice,
because we cannot detect the first line of the two corresponding
lines (+ and -) that got moved.

So to prepare the diff machinery for two pass algorithms
(i.e. buffer it all up and then operate on the result),
move all emissions to places, such that the only emitting
function is emit_line_0.

This patch moves code that is conceptually part of
emit_hunk_header, but was using output in fn_out_consume,
back to emit_hunk_header.

Signed-off-by: Stefan Beller 
---
 diff.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index cc8e812..aa50b2d 100644
--- a/diff.c
+++ b/diff.c
@@ -610,6 +610,9 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
}

strbuf_add(&msgbuf, line + len, org_len - len);
+   if (line[org_len - 1] != '\n')
+   strbuf_addch(&msgbuf, '\n');
+


Using strbuf_complete_line() would be nicer.


emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
strbuf_release(&msgbuf);
 }
@@ -1247,8 +1250,6 @@ static void fn_out_consume(void *priv, char *line, 
unsigned long len)
len = sane_truncate_line(ecbdata, line, len);
find_lno(line, ecbdata);
emit_hunk_header(ecbdata, line, len);
-   if (line[len-1] != '\n')
-   putc('\n', o->file);
return;
}






Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-13 Thread Torsten Bögershausen
On 12.09.16 11:49, Lars Schneider wrote:

>> How do we send pathnames the have '\n' ?
>> Not really recommended, but allowed.
>> And here I am a little bit lost, is each of the lines packed into
>> a pkt-line ?
>> command=smudge is packet as pkt-line and pathname= is packed into
>> another one ? (The we don't need the '\n' at all)
> 
> Every line is a dedicated packet. That's why '\n' in a path name would
> not be a problem as the receiver is expected to read the entire packet
> when parsing the value (and the receiver knows the packet length, too).
> 
> The '\n' at the end is required by the pkt-line format:
> "A non-binary line SHOULD BE terminated by an LF..."
> (see protocol-common.txt)

That is only the half part of the story:
A non-binary line SHOULD BE terminated by an LF, which if present
MUST be included in the total length. Receivers MUST treat pkt-lines
with non-binary data the same whether or not they contain the trailing
LF (stripping the LF if present, and not complaining when it is
missing).


How do we treat pathnames ?
They can have each byte value except '\0'.
What should a receiver do, which reads a string like "ABC\n\n" ?
Is it "ABC\n" or "ABC\n\n" ?

I would really consider to treat pathnames as binary, and not add a trailing 
'\n',
are there other opinions ?
 







Re: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread Kevin Daudt
On Tue, Sep 13, 2016 at 03:52:28PM +0200, Jakub Narębski wrote:
> Hello Kevin,
> 
> On 13 September 2016 at 15:32, Kevin Daudt  wrote:
> > On Mon, Sep 12, 2016 at 09:51:09PM +0200, Jakub Narębski wrote:
> 
> >>
> >> P.P.S. Different announcements use different URLs (different channels)
> >> to better track where one got information about this survey.
> >>
> >> Thanks in advance for taking time to answer the survey,
> >
> > Can we get a channel for the freenode/#git channel? I'm trying to announce 
> > the
> > survey there too.
> 
> Here you have it:
> 
>   https://survs.com/survey/c6wjrerw87
> 
> -- 
> Jakub Narebski

Thanks.


Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-13 Thread Junio C Hamano
larsxschnei...@gmail.com writes:

> diff --git a/contrib/long-running-filter/example.pl 
> b/contrib/long-running-filter/example.pl
> ...
> +sub packet_read {
> +my $buffer;
> +my $bytes_read = read STDIN, $buffer, 4;
> +if ( $bytes_read == 0 ) {
> +
> +# EOF - Git stopped talking to us!
> +exit();
> +...
> +packet_write( "clean=true\n" );
> +packet_write( "smudge=true\n" );
> +packet_flush();
> +
> +while (1) {

These extra SP around the contents inside () pair look unfamiliar
and somewhat strange to me, but as long as they are consistently
done (and I think you are mostly being consistent), it is OK.

> +#define CAP_CLEAN(1u<<0)
> +#define CAP_SMUDGE   (1u<<1)

As these are meant to be usable together, i.e. bits in a single flag
word, they are of type "unsigned int", which makes perfect sense.

Make sure your variables and fields that store them are of the same
type.  I think I saw "int' used to pass them in at least one place.

> +static int apply_filter(const char *path, const char *src, size_t len,
> +int fd, struct strbuf *dst, struct convert_driver 
> *drv,
> +const int wanted_capability)
> +{
> + const char* cmd = NULL;

"const char *cmd = NULL;" of course.

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 11c37fb..f6798f8 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -10,6 +10,7 @@
>  #include "attr.h"
>  #include "split-index.h"
>  #include "dir.h"
> +#include "convert.h"
>  
>  /*
>   * Error messages expected by scripts out of plumbing commands such as

Why?  The resulting file seems to compile without this addition.


Re: git-am includes escape characters from 'From' field

2016-09-13 Thread Jeff King
On Mon, Sep 12, 2016 at 10:10:23PM +0200, Swift Geek wrote:

> git-am seems to add backslash that escapes double quote character, example
> git format-patch
> 
> From 63da989a5295214f9bd06cd7b409a86a65241eea Mon Sep 17 00:00:00 2001
> From: "Sebastian \"Swift Geek\" Grzywna" 

This looks correct; the output of format-patch is an rfc2822 message,
and it requires this quoting.

The part you don't show, and that I think is wrong, is that if you then
"git am" this patch, it pulls the backslashes into the commit object.
The culprit looks like "parse_mail()" in builtin/am.c (or possibly
mailinfo() that it calls), which blindly picks up the name portion
without doing any rfc2822 de-quoting.

I don't think we have any existing de-quoting routines to plug in, so
fixing it would probably start with writing one.

-Peff


Re: [PATCH v7 00/10] Git filter protocol

2016-09-13 Thread Junio C Hamano
larsxschnei...@gmail.com writes:

> From: Lars Schneider 
>
> The goal of this series is to avoid launching a new clean/smudge filter
> process for each file that is filtered.

I tentatively queued this in 'pu' because I wanted to see these
changes in context and also wanted to know if there are overlaps and
conflicts with other topics in flight.  As your earlier steps
renamed packet_write() to packet_write_fmt() but didn't add a new
packet_write() that has different semantics, it was pleasantly easy
to make sure there is no new caller added in the meantime (it did
conflict with Duy's shallow-deepen topic and the resolution had to
touch outside the <<< conflicted === regions >>>, but it was
otherwise trivial).

Some details of the protocol (I think Torsten has already pointed
out that not all paths can be representable with the current
incarnation; there may be other minor nits like that) may still need
to be worked out, but I think the series at this point gets the
basic code structure right and such additional fixes would not
change things too drastically (IOW, I think we are very close to
being 'next'-ready).

Thanks.



RE: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread David Bainbridge
Hi Jakub,

You said:
P.S. At request I can open a separate channel in survey, with a separate survey 
URL, so that responses from particular site or organization could be separated 
out.

Please can you open a channel for use by Ericsson?

Many thanks

David

DAVID BAINBRIDGE
Product Manager SW Development
Strategic Product Manager, Configuration and Compliance Management
BNEP EITTE SP&S Platforms&Infrastructure

Ericsson
8500 Decarie
Montreal, H4P 2N2, Canada
Phone +1 514 345 7900 x42014
Mobile +1 438 990 2452
Office ECN 810 42014
david.bainbri...@ericsson.com
www.ericsson.com

-Original Message-
From: Jakub Narębski [mailto:jna...@gmail.com] 
Sent: Monday, September 12, 2016 15:51
To: git@vger.kernel.org
Cc: Doug Rathbone ; David Bainbridge 
; Stefan Beller ; Andrew Ardill 
; Eric Wong 
Subject: [ANNOUNCE] Git User's Survey 2016

Hello all,

We would like to ask you a few questions about your use of the Git version 
control system. This survey is mainly to understand who is using Git, how and 
why.

The results will be published to the Git wiki on the GitSurvey2016 page 
(https://git.wiki.kernel.org/index.php/GitSurvey2016) and discussed on the git 
mailing list.


The survey would be open from 12 September to 20 October 2016.


Please devote a few minutes of your time to fill this simple questionnaire, it 
will help a lot the git community to understand your needs, what you like of 
Git, and of course what you don't like.

The survey can be found here:
  https://tinyurl.com/GitSurvey2016
  https://survs.com/survey/lmo7ed3439

There is also alternate version which does not require cookies, but it doesn't 
allow one to go back to response and edit it.
  https://tinyurl.com/GitSurvey2016-anon
  https://survs.com/survey/naeec8kwd8


P.S. At request I can open a separate channel in survey, with a separate survey 
URL, so that responses from particular site or organization could be separated 
out.

Please send me a email with name of channel, and I will return with a separate 
survey URL to use.  Note that the name of the channel would be visible to 
others.

P.P.S. Different announcements use different URLs (different channels) to 
better track where one got information about this survey.

Thanks in advance for taking time to answer the survey,
--
Jakub Narębski
on behalf of
Git Development Community


Re: [PATCH v2 09/14] i18n: notes: mark error messages for translation

2016-09-13 Thread Vasco Almeida
A Seg, 12-09-2016 às 14:23 +0200, Jean-Noël Avila escreveu:
> Le 12/09/2016 à 13:29, Vasco Almeida a écrit :
> > 
> > Signed-off-by: Vasco Almeida 
> > ---
> >  builtin/notes.c | 18 +-
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/builtin/notes.c b/builtin/notes.c
> > index f848b89..abacae2 100644
> > --- a/builtin/notes.c
> > +++ b/builtin/notes.c
> > @@ -340,7 +340,7 @@ static struct notes_tree
> > *init_notes_check(const char *subcommand,
> >  
> >     ref = (flags & NOTES_INIT_WRITABLE) ? t->update_ref : t-
> > >ref;
> >     if (!starts_with(ref, "refs/notes/"))
> > -   die("Refusing to %s notes in %s (outside of
> > refs/notes/)",
> > +   die(_("Refusing to %s notes in %s (outside of
> > refs/notes/)"),
> >     subcommand, ref);
> >     return t;
> >  }
> 
> Not sure this one will be easy to localize. The verb is passed as a
> parameter : see line 366 "list", line 426 "add", line 517 "copy",
> line
> 658 "show", line 816 "merge", line 908 "remove" or line 595 with
> argv[0].
> 
> If all the verbs are real subcommands, then the translators should be
> warned that this is some english twisting, but that they need to
> refer
> to the subcommand on the command line.

Yes, these verbs are git notes subcommands. I will add a Translators
comment to it explaining so. Or we can unfold that error messages like

if (!strcmp(subcommand, "add")
die(_("Refusing to add notes in %s (outside of refs/notes/)"),
ref);
elseif ...

else
die(_("Refusing to %s notes in %s (outside of refs/notes/)"),
subcommand, ref);

This is more verbose but translations would benefit from it being more
natural. What do we prefer: (1) concise source and a little unnatural
translations or (2) verbose code and natural translations?

Compare, imaging that English is a target translation language, the
user would read:
"Refusing to do add of notes in /path [...]" (1)
"Refusing do add notes in /path [...]" (2)

> Otherwise,
> 
> Acked-by: Jean-Noël Avila 
> 
> JN


Re: [PATCH v2] ls-files: adding support for submodules

2016-09-13 Thread Junio C Hamano
Brandon Williams  writes:

> Allow ls-files to recognize submodules in order to retrieve a list of
> files from a repository's submodules.  This is done by forking off a
> process to recursively call ls-files on all submodules. Also added an
> output-path-prefix command in order to prepend paths to child processes.
>
> Signed-off-by: Brandon Williams 

> @@ -68,6 +71,21 @@ static void write_eolinfo(const struct cache_entry *ce, 
> const char *path)
>  static void write_name(const char *name)
>  {
>   /*
> +  * NEEDSWORK: To make this thread-safe, full_name would have to be owned
> +  * by the caller.
> +  *
> +  * full_name get reused across output lines to minimize the allocation
> +  * churn.
> +  */
> + static struct strbuf full_name = STRBUF_INIT;
> + if (output_path_prefix != '\0') {
> + strbuf_reset(&full_name);
> + strbuf_addstr(&full_name, output_path_prefix);
> + strbuf_addstr(&full_name, name);
> + name = full_name.buf;
> + }

At first glance it was surprising that no test caught this lack of
dereference; the reason is because you initialize output_path_prefix
to an empty string, not NULL, causing full_name.buf always used,
which does not have an impact on the output.

I think initializing it to NULL is a more typical way to say "this
option has not been given", and if you took that route, the
condition would become

if (output_path_prefix && *output_path_prefix) {
...

In any case, the fact that only this much change was required to add
output-path-prefix shows two good things: (1) the original code was
already well structured, funneling any pathname we need to emit
through this single function so that we can do this kind of updates,
and (2) the author of the patch was competent to spot this single
point that needs to be updated.

Nice.

> + status = run_command(&cp);
> + if (status)
> + exit(status);

run_command()'s return value comes from either start_command() or
finish_command().  These signal failure by returning a non-zero
value, and in practice they are negative small integers.  Feeding
negative value to exit() is not quite kosher.  Perhaps exit(128)
to mimick as if we called die() is better.

If your primary interest is to support the "find in the working tree
files that are tracked, recursively in submodules" grep, I think
this "when we hit a submodule, spawn a separate ls-files in there"
is sufficient and a solid base to build on it.

On the other hand, if you are more ambitious and "grep" is merely an
example of things that can be helped by having a list of paths
across module boundaries, we may want to "libify" ls-files in such a
way that a single process can instantiate one or more instances of
"ls-files machinery", that takes which repository to work in and
other arguments that specifies which paths to report, and instead of
always showing the result to the standard output, makes repeated
calls to a callback function to report the discovered path and other
attributes associated with the path that were asked for (the object
name, values of tag_*, etc.), without spawning a separate "ls-files"
process.

The latter would be a lot bigger task and I do not necessarily think
it is needed, but that is one possible future direction to keep in
mind.

Thanks, will queue with a minimum fix.


[PATCH] strbuf: use valid pointer in strbuf_remove()

2016-09-13 Thread René Scharfe
The fourth argument of strbuf_splice() is passed to memcpy(3), which is
not supposed to handle NULL pointers.  Let's be extra careful and use a
valid empty string instead.  It even shortens the source code. :)

Signed-off-by: Rene Scharfe 
---
 strbuf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index f3bd571..b839be4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -187,7 +187,7 @@ void strbuf_insert(struct strbuf *sb, size_t pos, const 
void *data, size_t len)
 
 void strbuf_remove(struct strbuf *sb, size_t pos, size_t len)
 {
-   strbuf_splice(sb, pos, len, NULL, 0);
+   strbuf_splice(sb, pos, len, "", 0);
 }
 
 void strbuf_add(struct strbuf *sb, const void *data, size_t len)
-- 
2.10.0


Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-13 Thread Junio C Hamano
Torsten Bögershausen  writes:

> I would really consider to treat pathnames as binary, and not add a trailing 
> '\n',
> are there other opinions ?

It would be the most consistent if the same format as
write_name_quoted() is used for this, I would think.


[PATCH] pathspec: removed unnecessary function prototypes

2016-09-13 Thread Brandon Williams
removed function prototypes from pathspec.h which don't have a
corresponding implementation.

Signed-off-by: Brandon Williams 
---
 pathspec.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/pathspec.h b/pathspec.h
index 4a80f6f..59809e4 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -96,7 +96,5 @@ static inline int ps_strcmp(const struct pathspec_item *item,
 
 extern char *find_pathspecs_matching_against_index(const struct pathspec 
*pathspec);
 extern void add_pathspec_matches_against_index(const struct pathspec 
*pathspec, char *seen);
-extern const char *check_path_for_gitlink(const char *path);
-extern void die_if_path_beyond_symlink(const char *path, const char *prefix);
 
 #endif /* PATHSPEC_H */
-- 
2.8.0.rc3.226.g39d4020



Re: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread Jakub Narębski
On 13 September 2016 at 18:15, David Bainbridge
 wrote:
> Hi Jakub,
>
> You said:
> P.S. At request I can open a separate channel in survey, with
> a separate survey URL, so that responses from particular site
> or organization could be separated out.
>
> Please can you open a channel for use by Ericsson?

Sent (privately to David).

Best regards,
-- 
Jakub Narębski


Re: [PATCH v2 07/14] i18n: merge-recursive: mark error messages for translation

2016-09-13 Thread Vasco Almeida
A Seg, 12-09-2016 às 09:04 -0700, Junio C Hamano escreveu:
> Vasco Almeida  writes:
> 
> > 
> > Lowercase first word of such error messages following the usual
> > style.
> 
> "Change X to lowercase" is fine, but "Lowercase" is not a verb.
> 
> Reword it to "Downcase the first word...", perhaps (not limited to
> this step).

Lowercase is a verb [1] meaning to print or write with a lowercase
letter or letters. Knowing that can the commit message be kept?

[1] http://www.dictionary.com/browse/lowercase


[PATCH] checkout: constify parameters of checkout_stage() and checkout_merged()

2016-09-13 Thread René Scharfe
Document the fact that checkout_stage() and checkout_merged() don't
change the objects passed to them by adding the modifier const.

Signed-off-by: Rene Scharfe 
---
 builtin/checkout.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..afbff3e 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -154,8 +154,8 @@ static int check_stages(unsigned stages, const struct 
cache_entry *ce, int pos)
return 0;
 }
 
-static int checkout_stage(int stage, struct cache_entry *ce, int pos,
- struct checkout *state)
+static int checkout_stage(int stage, const struct cache_entry *ce, int pos,
+ const struct checkout *state)
 {
while (pos < active_nr &&
   !strcmp(active_cache[pos]->name, ce->name)) {
@@ -169,7 +169,7 @@ static int checkout_stage(int stage, struct cache_entry 
*ce, int pos,
return error(_("path '%s' does not have their version"), 
ce->name);
 }
 
-static int checkout_merged(int pos, struct checkout *state)
+static int checkout_merged(int pos, const struct checkout *state)
 {
struct cache_entry *ce = active_cache[pos];
const char *path = ce->name;
-- 
2.10.0



Bug

2016-09-13 Thread Mike Hawes
To whom this may concern,

I found a bug in git while trying to push my website.

I redid the process and it happened again.

I also tried it on another computer and it happened again.

I was wondering how to claim a bug?

Thank you,


Michael Hawes


Re: Bug

2016-09-13 Thread Santiago Torres
Hi, Michael.

It would be helpful to get more context on what triggered this bug. I'm
not a 'core' dev, so there may be a better way to send this. In general,
you want to state the following:

0) Information about your git installation, host system, etc.
1) Information about your repo (was it GitHub? local? self-hosted?)
2) What did you do? (git push origin master? git push?)
3) What happened instead of working? (the error message would be
   helpful.

Hope this helps.

Cheers!
-Santiago.

On Tue, Sep 13, 2016 at 01:18:52PM -0400, Mike Hawes wrote:
> To whom this may concern,
> 
> I found a bug in git while trying to push my website.
> 
> I redid the process and it happened again.
> 
> I also tried it on another computer and it happened again.
> 
> I was wondering how to claim a bug?
> 
> Thank you,
> 
> 
> Michael Hawes


[PATCH] unpack-trees: pass checkout state explicitly to check_updates()

2016-09-13 Thread René Scharfe
Add a parameter for the struct checkout variable to check_updates()
instead of using a static global variable.  Passing it explicitly makes
object ownership and usage more easily apparent.  And we get rid of a
static variable; those can be problematic in library-like code.

Signed-off-by: Rene Scharfe 
---
 unpack-trees.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 11c37fb..74d6dd4 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -218,8 +218,8 @@ static void unlink_entry(const struct cache_entry *ce)
schedule_dir_for_removal(ce->name, ce_namelen(ce));
 }
 
-static struct checkout state;
-static int check_updates(struct unpack_trees_options *o)
+static int check_updates(struct unpack_trees_options *o,
+const struct checkout *state)
 {
unsigned cnt = 0, total = 0;
struct progress *progress = NULL;
@@ -264,7 +264,7 @@ static int check_updates(struct unpack_trees_options *o)
display_progress(progress, ++cnt);
ce->ce_flags &= ~CE_UPDATE;
if (o->update && !o->dry_run) {
-   errs |= checkout_entry(ce, &state, NULL);
+   errs |= checkout_entry(ce, state, NULL);
}
}
}
@@ -1094,6 +1094,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
int i, ret;
static struct cache_entry *dfc;
struct exclude_list el;
+   struct checkout state;
 
if (len > MAX_UNPACK_TREES)
die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
@@ -1239,7 +1240,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
}
 
o->src_index = NULL;
-   ret = check_updates(o) ? (-2) : 0;
+   ret = check_updates(o, &state) ? (-2) : 0;
if (o->dst_index) {
if (!ret) {
if (!o->result.cache_tree)
-- 
2.10.0



[PATCH] sha1_file: use llist_mergesort() for sorting packs

2016-09-13 Thread René Scharfe
Sort the linked list of packs directly using llist_mergesort() instead
of building an array, calling qsort(3) and fixing up the list pointers.
This is shorter and less complicated.

Signed-off-by: Rene Scharfe 
---
Peff: Or do you have other plans, e.g. to replace packed_git with
packed_git_mru completely?

 sha1_file.c | 39 +++
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 472ccb2..66dccaa 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -25,6 +25,7 @@
 #include "dir.h"
 #include "mru.h"
 #include "list.h"
+#include "mergesort.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -1380,10 +1381,20 @@ static void prepare_packed_git_one(char *objdir, int 
local)
strbuf_release(&path);
 }
 
+static void *get_next_packed_git(const void *p)
+{
+   return ((const struct packed_git *)p)->next;
+}
+
+static void set_next_packed_git(void *p, void *next)
+{
+   ((struct packed_git *)p)->next = next;
+}
+
 static int sort_pack(const void *a_, const void *b_)
 {
-   struct packed_git *a = *((struct packed_git **)a_);
-   struct packed_git *b = *((struct packed_git **)b_);
+   const struct packed_git *a = a_;
+   const struct packed_git *b = b_;
int st;
 
/*
@@ -1410,28 +1421,8 @@ static int sort_pack(const void *a_, const void *b_)
 
 static void rearrange_packed_git(void)
 {
-   struct packed_git **ary, *p;
-   int i, n;
-
-   for (n = 0, p = packed_git; p; p = p->next)
-   n++;
-   if (n < 2)
-   return;
-
-   /* prepare an array of packed_git for easier sorting */
-   ary = xcalloc(n, sizeof(struct packed_git *));
-   for (n = 0, p = packed_git; p; p = p->next)
-   ary[n++] = p;
-
-   qsort(ary, n, sizeof(struct packed_git *), sort_pack);
-
-   /* link them back again */
-   for (i = 0; i < n - 1; i++)
-   ary[i]->next = ary[i + 1];
-   ary[n - 1]->next = NULL;
-   packed_git = ary[0];
-
-   free(ary);
+   packed_git = llist_mergesort(packed_git, get_next_packed_git,
+set_next_packed_git, sort_pack);
 }
 
 static void prepare_packed_git_mru(void)
-- 
2.10.0



Re: [PATCH] sha1_file: use llist_mergesort() for sorting packs

2016-09-13 Thread Jeff King
On Tue, Sep 13, 2016 at 07:54:42PM +0200, René Scharfe wrote:

> Sort the linked list of packs directly using llist_mergesort() instead
> of building an array, calling qsort(3) and fixing up the list pointers.
> This is shorter and less complicated.

Makes sense.

> Peff: Or do you have other plans, e.g. to replace packed_git with
> packed_git_mru completely?

Nope. I haven't looked into it, but I think there would be some benefit
to replacing packed_git_mru with the code in list.h. But I don't see any
benefit in replacing packed_git with that, or with the MRU itself (once
list.h is in use, one _could_ shove the MRU into packed_git itself, but
I think we would still retain the original link order for reference).

Thanks for asking.

-Peff


Re: [PATCH] strbuf: use valid pointer in strbuf_remove()

2016-09-13 Thread Jeff King
On Tue, Sep 13, 2016 at 06:40:22PM +0200, René Scharfe wrote:

> The fourth argument of strbuf_splice() is passed to memcpy(3), which is
> not supposed to handle NULL pointers.  Let's be extra careful and use a
> valid empty string instead.  It even shortens the source code. :)

Heh. Looks obviously correct and like a good thing to do.

-Peff


Re: [PATCH] pathspec: removed unnecessary function prototypes

2016-09-13 Thread Jeff King
On Tue, Sep 13, 2016 at 09:52:51AM -0700, Brandon Williams wrote:

> removed function prototypes from pathspec.h which don't have a
> corresponding implementation.

I'm always curious of the "why" in cases like this. Did we forget to add
them? Did they get renamed? Did they go away?

Looks like the latter; 5a76aff (add: convert to use parse_pathspec,
2013-07-14) just forgot to remove them.

-Peff


Re: [PATCH] checkout: constify parameters of checkout_stage() and checkout_merged()

2016-09-13 Thread Jeff King
On Tue, Sep 13, 2016 at 07:11:52PM +0200, René Scharfe wrote:

> Document the fact that checkout_stage() and checkout_merged() don't
> change the objects passed to them by adding the modifier const.

Hmm. Sometimes these big "context" objects are hard to make const,
because we end up using them to hold or pass state between functions
(e.g., see diff_options). So I'd worry slightly that we'll end up
un-consting this in the long run.

That being said, it is easy to revert, and it provides some small
benefit, so I don't mind it in the meantime.

-Peff


Re: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread Lars Schneider

> On 13 Sep 2016, at 17:54, Jakub Narębski  wrote:
> 
> On 13 September 2016 at 18:15, David Bainbridge
>  wrote:
>> Hi Jakub,
>> 
>> You said:
>> P.S. At request I can open a separate channel in survey, with
>> a separate survey URL, so that responses from particular site
>> or organization could be separated out.
>> 
>> Please can you open a channel for use by Ericsson?
> 
> Sent (privately to David).

Could you send me a channel for Autodesk, too?

Thank you,
Lars

Re: [PATCH v2 09/14] i18n: notes: mark error messages for translation

2016-09-13 Thread Jean-Noël AVILA
On mardi 13 septembre 2016 16:35:05 CEST Vasco Almeida wrote:
> A Seg, 12-09-2016 às 14:23 +0200, Jean-Noël Avila escreveu:
> > Not sure this one will be easy to localize. The verb is passed as a
> > parameter : see line 366 "list", line 426 "add", line 517 "copy",
> > line
> > 658 "show", line 816 "merge", line 908 "remove" or line 595 with
> > argv[0].
> > 
> > If all the verbs are real subcommands, then the translators should be
> > warned that this is some english twisting, but that they need to
> > refer
> > to the subcommand on the command line.
> 
> Yes, these verbs are git notes subcommands. I will add a Translators
> comment to it explaining so. Or we can unfold that error messages like
> 
> if (!strcmp(subcommand, "add")
>   die(_("Refusing to add notes in %s (outside of refs/notes/)"),
> ref);
> elseif ...
> 
> else
>   die(_("Refusing to %s notes in %s (outside of refs/notes/)"),
> subcommand, ref);

This would be counter productive to use the inject strings as keys just to 
test them just after.

> 
> This is more verbose but translations would benefit from it being more
> natural. What do we prefer: (1) concise source and a little unnatural
> translations or (2) verbose code and natural translations?
> 
> Compare, imaging that English is a target translation language, the
> user would read:
> "Refusing to do add of notes in /path [...]" (1)
> "Refusing do add notes in /path [...]" (2)

Having one sentence per action is cumbersome, but avoiding sentence lego is 
mandatory for proper i18n. How about  just adding quotes around the subcommand 
and warn translators ? 






Re: [RFC 0/3] http: avoid repeatedly adding curl easy to curlm

2016-09-13 Thread Junio C Hamano
Eric Wong  writes:

> The key patch here is 3/3 which seems like an obvious fix to
> adding the problem of adding a curl easy handle to a curl multi
> handle repeatedly.

Yeah, sounds like the right thing to do and 2/3 makes it really easy
to read the resulting code.

> I will investigate those failures in a week or two when I regain
> regular computer access.

Thanks. Will tentatively queue on 'pu' and wait for updates.



Re: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread Jakub Narębski
On 13 September 2016 at 22:11, Lars Schneider  wrote:
>> On 13 Sep 2016, at 17:54, Jakub Narębski  wrote:
>> On 13 September 2016 at 18:15, David Bainbridge
>>  wrote:
>>> Hi Jakub,
>>>
>>> You said:
>>> P.S. At request I can open a separate channel in survey, with
>>> a separate survey URL, so that responses from particular site
>>> or organization could be separated out.
>>>
>>> Please can you open a channel for use by Ericsson?
>>
>> Sent (privately to David).
>
> Could you send me a channel for Autodesk, too?

Here it is:

  https://survs.com/survey/c51qiuw394

Please tell me if you want the channel anonymized in survey results
(after its closing).
-- 
Jakub Narębski


Re: [PATCH 01/16] t1007: factor out repeated setup

2016-09-13 Thread Stefan Beller
On Mon, Sep 12, 2016 at 8:23 PM, Jeff King  wrote:
> We have a series of 3 CRLF tests that do exactly the same
> (long) setup sequence. Let's pull it out into a common setup
> test, which is shorter, more efficient, and will make it
> easier to add new tests.
>
> Note that we don't have to worry about cleaning up any of
> the setup which was previously per-test; we call pop_repo
> after the CRLF tests, which cleans up everything.
>
> Signed-off-by: Jeff King 

This makes sense and looks good,
Thanks,

Reviewed-by: Stefan Beller 


Left with empty files after "git stash pop" when system hung

2016-09-13 Thread Daniel Hahler
I have used "git stash --include-untracked", checked out another branch,
went back, and "git stash pop"ed the changes.
Then my system crashed/hung (music that was playing was repeated in a
loop).  I have waited for some minutes, and then turned it off.

Afterwards, the repository in question was in a state where all files
contained in the stash were empty.
"git status" looked good on first sight: all the untracked and modified
files were listed there; but they were empty.

  % git fsck --lost-found
  error: object file .git/objects/04/1e659b5dbfd3f0be351a782b54743692875aec is 
empty
  error: object file .git/objects/04/1e659b5dbfd3f0be351a782b54743692875aec is 
empty
  fatal: loose object 041e659b5dbfd3f0be351a782b54743692875aec (stored in 
.git/objects/04/1e659b5dbfd3f0be351a782b54743692875aec) is corrupt
  % find .git/objects -size 0|wc -l
  12

I would have assumed that the "stash pop" operation would be "atomic",
i.e. it should not remove the stash object before other objects have
been written successfully.

The filesystem in question is ext4, and I am using Arch Linux.

I have removed all empty files in .git/objects and tried to find the
previous stash with `gitk --all $( git fsck | awk '{print $3}' )` then,
but it appears to have disappeared.

Please CC me in replies.


Cheers,
Daniel.

-- 
http://daniel.hahler.de/



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 06/16] diff: always try to set up the repository

2016-09-13 Thread Stefan Beller
On Mon, Sep 12, 2016 at 8:23 PM, Jeff King  wrote:

>   2. If you're in a subdirectory of a repository, then we
>  still try to read ".git/config", but it generally
>  doesn't exist. So "diff --no-index" there does not
>  respect repo config.

Nit:
So IIUC your cover letter even this /used/ to work but
broke only recently? So I feel like the message is a bit
misleading (i.e. you argue for a change in behavior instead of
calling it a bug fix for a regression. I think a bug fix for a regression
is harder to revert as compared to a "new" behavior)

I agree on the code, though.


Re: [PATCH v2 09/14] i18n: notes: mark error messages for translation

2016-09-13 Thread Junio C Hamano
Jean-Noël AVILA  writes:

>> Yes, these verbs are git notes subcommands
>
> Having one sentence per action is cumbersome, but avoiding sentence lego is 
> mandatory for proper i18n. How about  just adding quotes around the 
> subcommand 
> and warn translators ? 

I think that is a sensible way to go.  I do not think it adds value
to "translate" the action names that needs to be typed verbatim in
the message.



Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-13 Thread Lars Schneider

> On 10 Sep 2016, at 17:40, Torsten Bögershausen  wrote:
> 
> []
> 
> One general question up here, more comments inline.
> The current order for a clean-filter is like this, I removed the error 
> handling:
> 
> int convert_to_git()
> {
>   ret |= apply_filter(path, src, len, -1, dst, filter);
>   if (ret && dst) {
>   src = dst->buf;
>   len = dst->len;
>   }
>   ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
>   return ret | ident_to_git(path, src, len, dst, ca.ident);
> }
> 
> The first step is the clean filter, the CRLF-LF conversion (if needed),
> then ident.
> The current implementation streams the whole file content to the filter,
> (STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
> This is to use the UNIX-like STDIN--STDOUT method for writing a filter.
> 
> However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
> offer a sort of short-cut:
> The filter reads from the file directly, and the output of the filter is
> read into a STRBUF.

Are you sure? As far as I understand the code the filter does not read from 
the file in any case today.  The functions would_convert_to_git_filter_fd() and 
convert_to_git_filter_fd() just avoid avoid mapping the file in Git. The 
content 
is still streamed via pipes:
https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4


> It looks as if the multi-filter approach can use this in a similar way:
> Give the pathname to the filter, the filter opens the file for reading
> and stream the result via the pkt-line protocol into Git.
> This needs some more changes, and may be very well go into a separate patch
> series. (and should).
> 
> What I am asking for:
> When a multi-filter is used, the content is handled to the filter via 
> pkt-line,
> and the result is given to Git via pkt-line ?
> Nothing wrong with it, I just wonder, if it should be mentioned somewhere.

That is most certainly a good idea and the main reason I added "capabilities"
to the protocol. Joey Hess worked on this topic (not merged, yet) and I would 
like to make this available to the long-running filter protocol as soon as the
feature is available:
http://public-inbox.org/git/1468277112-9909-1-git-send-email-jo...@joeyh.name/


>> +sub packet_read {
>> +my $buffer;
>> +my $bytes_read = read STDIN, $buffer, 4;
>> +if ( $bytes_read == 0 ) {
>> +
>> +# EOF - Git stopped talking to us!
>> +exit();
>> +}
>> +elsif ( $bytes_read != 4 ) {
>> +die "invalid packet size '$bytes_read' field";
>> +}
> 
> This is half-kosher, I would say,
> (And I really. really would like to see an implementation in C ;-)

Would you be willing to contribute a patch? :-)


> A read function may look like this:
> 
>   ret = read(0, &buffer, 4);
>   if (ret < 0) {
> /* Error, nothing we can do */
> exit(1);
>   } else if (ret == 0) {
> /* EOF  */
> exit(0);
>   } else if (ret < 4) {
> /* 
>  * Go and read more, until we have 4 bytes or EOF or Error */
>   } else {
> /* Good case, see below */
>   }

I see. However, my intention was to provide an absolute minimal
example to teach a reader how the protocol works. I consider
all proper error handling an exercise for the reader ;-)


>> +#define CAP_CLEAN(1u<<0)
>> +#define CAP_SMUDGE   (1u<<1)
> 
> Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ?

I had something like that but Junio suggested these names in V4:
http://public-inbox.org/git/xmqq8twd8uld@gitster.mtv.corp.google.com/


>> +
>> +err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);
> 
> Extra () needed ?
> More () in the code...

I thought it might improve readability, but I will remove them
if you think this would be more consistent with existing Git code.


Thanks,
Lars

Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()

2016-09-13 Thread Lars Schneider

> On 13 Sep 2016, at 00:30, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider 
>> 
>> packet_flush() would die in case of a write error even though for some
>> callers an error would be acceptable. Add packet_flush_gently() which
>> writes a pkt-line flush packet and returns `0` for success and `-1` for
>> failure.
>> ...
>> +int packet_flush_gently(int fd)
>> +{
>> +packet_trace("", 4, 1);
>> +if (write_in_full(fd, "", 4) == 4)
>> +return 0;
>> +error("flush packet write failed");
>> +return -1;
> 
> It is more idiomatic to do
> 
>   return error(...);
> 
> but more importantly, does the caller even want an error message
> unconditionally printed here?
> 
> I suspect that it is a strong sign that the caller wants to be in
> control of when and what error message is produced; otherwise it
> wouldn't be calling the _gently() variant, no?

Agreed!

Thanks,
Lars


Re: [PATCH 16/16] init: reset cached config when entering new repo

2016-09-13 Thread Stefan Beller
I have reviewed all patches though I am no expert on
the init routines. They all look good except for the one
nit I noted for the commit message in patch 6.
With that, the whole series is:
Reviewed-by: Stefan Beller 

Thanks!


Re: [PATCH 06/16] diff: always try to set up the repository

2016-09-13 Thread Jeff King
On Tue, Sep 13, 2016 at 03:00:17PM -0700, Stefan Beller wrote:

> On Mon, Sep 12, 2016 at 8:23 PM, Jeff King  wrote:
> 
> >   2. If you're in a subdirectory of a repository, then we
> >  still try to read ".git/config", but it generally
> >  doesn't exist. So "diff --no-index" there does not
> >  respect repo config.
> 
> Nit:
> So IIUC your cover letter even this /used/ to work but
> broke only recently? So I feel like the message is a bit
> misleading (i.e. you argue for a change in behavior instead of
> calling it a bug fix for a regression. I think a bug fix for a regression
> is harder to revert as compared to a "new" behavior)

No, this has always been broken. What broke recently is the tests
covered in patch 14 related to git-init.

IOW, we have always done this "blind read" of .git/config, ever since
the early days. It was always wrong, but was mostly overlooked because
it worked often enough and usually didn't cause other problems. But
because of the caching and lazy-reading done by git_config() these days
(and get_shared_repository() which builds on it), you can get quite
confusing and buggy effects if you lazy-read at the wrong time.

-Peff


Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-13 Thread Junio C Hamano
Ben Peart  writes:

> +static int needs_working_tree_merge(const struct checkout_opts *opts,
> + const struct branch_info *old,
> + const struct branch_info *new)
> +{
> +...
> +}

I do not think I need to repeat the same remarks on the conditions
in this helper, which hasn't changed since v2.  Many "comments" in
the code do not explain why skipping is justified, or what they
claim to check looks to me just plain wrong.

For example, there is

   /*
* If we're not creating a new branch, by definition we're changing
* the existing one so need to do the merge
*/
   if (!opts->new_branch)
   return 1;

but "git checkout" (no other argument) hits this condition.  It
disables the most trivial optimization opportunity, because we are
not "creating".

"By definition, we're changing"?  Really?  Not quite.

If you disable this bogus check, "git checkout" (no other argument)
would be allowed to skip the merge_working_tree(), and that in turn
reveals another case that the helper is not checking when
unpack_trees() MUST be called.

Note: namely, when sparse checkout is in effect, switching from
HEAD to HEAD can nuke existing working tree files outside the
sparse pattern -- YUCK!  See penultimate test in t1011 for
an example.

This yuckiness is not your fault, but needs_working_tree_merge()
logic you added needs to refrain from skipping unpack_trees() call
when sparse thing is in effect.  I'd expect "git checkout -b foo"
instead of "git checkout" (no other argument) would fail to honor
the sparse thing and reveal this bug, because the above bogus
"!opts->new_branch" check will not protect you for that case.

In other words, these random series of "if (...) return 1" are bugs
hiding other real bugs and we need to reason about which ones are
bugs that are hiding what other bugs that are not covered by this
function.  As Peff said earlier for v1, this is still an unreadable
mess.  We need to figure out a way to make sure we are skipping on
the right condition and not accidentally hiding a bug of failing to
check the right condition.  I offhand do not have a good suggestion
on this; sorry.

>  static int merge_working_tree(const struct checkout_opts *opts,
> struct branch_info *old,
> struct branch_info *new,
> int *writeout_error)
>  {
> + /*
> +  * Optimize the performance of "git checkout -b foo" by avoiding
> +  * the expensive merge, index and working directory updates if they
> +  * are not needed.
> +  */
> + if (!needs_working_tree_merge(opts, old, new))
> + return 0;
> +
>   int ret;
>   struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));

With the change you made at the beginning of this function, it no
longer compiles with -Wdecl-after-stmt, but that is the smallest of
the problems.

It is a small step in the right direction to move the call to the
helper from the caller to this function, but it is a bit too small.

Notice that the lines after the above context look like this:

hold_locked_index(lock_file, 1);
if (read_cache_preload(NULL) < 0)
return error(_("index file corrupt"));

resolve_undo_clear();
if (opts->force) {
ret = reset_tree(new->commit->tree, opts, 1, writeout_error);
if (ret)
return ret;
} else {
struct tree_desc trees[2];
...

I would have expected that the check goes inside the "else" thing
that actually does a two-tree merge, and the helper loses the check
with opts->force, at least.  That would still be a change smaller
than desired, but at least a meaningful improvement compared to the
previous one.  As I have already pointed out, in the "else" clause
there is a check "is the index free of conflicted entries? if so
error out", and that must be honored in !opt->force case, no matter
what your needs_working_tree_merge() says.  I also was hoping that
you would notice, when you were told about the unmerged check, by
reading the remainder of the merge_working_tree(), that we need to
call show_local_changes() when we are not doing force and when we
are not quiet---returning early like the above patch will never be
able to call that one downstream in the function.

Regardless of what the actual checks end up to be, the right place
to do this "optimization" would look more like:

 builtin/checkout.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2b50a49..a6b9e17 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -508,14 +508,19 @@ static int merge_working_tree(const struct checkout_opts 
*opts,
topts.dir->flags |= DIR_SHOW_IGNORED;
setup_standard_excludes(topts.dir);
}
+
+   if ( we kno

Re: [RFC/PATCH 01/17] diff: move line ending check into emit_hunk_header

2016-09-13 Thread Stefan Beller
On Tue, Sep 13, 2016 at 7:42 AM, René Scharfe  wrote:

>>
>> strbuf_add(&msgbuf, line + len, org_len - len);
>> +   if (line[org_len - 1] != '\n')
>> +   strbuf_addch(&msgbuf, '\n');
>> +
>
>
> Using strbuf_complete_line() would be nicer.

That makes sense!

Thanks,
Stefan


Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()

2016-09-13 Thread Junio C Hamano
Lars Schneider  writes:

>> On 13 Sep 2016, at 00:30, Junio C Hamano  wrote:
>> 
>> larsxschnei...@gmail.com writes:
>> 
>>> From: Lars Schneider 
>>> 
>>> packet_flush() would die in case of a write error even though for some
>>> callers an error would be acceptable. Add packet_flush_gently() which
>>> writes a pkt-line flush packet and returns `0` for success and `-1` for
>>> failure.
>>> ...
>>> +int packet_flush_gently(int fd)
>>> +{
>>> +   packet_trace("", 4, 1);
>>> +   if (write_in_full(fd, "", 4) == 4)
>>> +   return 0;
>>> +   error("flush packet write failed");
>>> +   return -1;
>> 
>> It is more idiomatic to do
>> 
>>  return error(...);
>> 
>> but more importantly, does the caller even want an error message
>> unconditionally printed here?
>> 
>> I suspect that it is a strong sign that the caller wants to be in
>> control of when and what error message is produced; otherwise it
>> wouldn't be calling the _gently() variant, no?
>
> Agreed!

I am also OK with the current form, too.  Those who need to enhance
it to packet_flush_gently(int fd, int quiet) can come later.



Re: [RFC/PATCH 05/17] diff.c: emit_line_0 can handle no color setting

2016-09-13 Thread Junio C Hamano
Stefan Beller  writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> In later patches we may pass lines that are not colored to
> the central function emit_line_0, so we
> need to emit the color only when it is non-NULL.

Explained this way, a reader would find that this step is here
before the underlying code is ready--we are not even buffering
at this step yet.

But that is OK.  It used to be that passing "" as set/reset was the
way to get a --no-color output.  Now you can pass NULL instead of
empty strings.  That would be an alternative explanation why this is
an acceptable change (as your later step probably has a good reason
why it cannot pass "" instead of NULL).

>
> Signed-off-by: Stefan Beller 
> ---
>  diff.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index b6a40ae..5d57130 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -473,11 +473,13 @@ static void emit_line_0(struct diff_options *o, const 
> char *set, const char *res
>   }
>  
>   if (len || !nofirst) {
> - fputs(set, file);
> + if (set)
> + fputs(set, file);
>   if (!nofirst)
>   fputc(first, file);
>   fwrite(line, len, 1, file);
> - fputs(reset, file);
> + if (reset)
> + fputs(reset, file);
>   }
>   if (has_trailing_carriage_return)
>   fputc('\r', file);


Re: [RFC/PATCH 06/17] diff.c: convert fn_out_consume to use emit_line_*

2016-09-13 Thread Junio C Hamano
Stefan Beller  writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers the remaining parts of fn_out_consume.

name_x_tab are colored as before, which you are already aware of and
we'd need to find a way to handle, but other than that, this is a
no-op conversion, getting us closer to the goal of making everything
go through a single funnel.

>   name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
>   name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
> -
> - fprintf(o->file, "%s%s--- %s%s%s\n",
> - line_prefix, meta, ecbdata->label_path[0], reset, 
> name_a_tab);
> - fprintf(o->file, "%s%s+++ %s%s%s\n",
> - line_prefix, meta, ecbdata->label_path[1], reset, 
> name_b_tab);
> + emit_line_fmt(o, meta, reset, "--- %s%s\n",
> +   ecbdata->label_path[0], name_a_tab);
> + emit_line_fmt(o, meta, reset, "+++ %s%s\n",
> +   ecbdata->label_path[1], name_b_tab);
>   ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
>   }


Re: [RFC/PATCH 10/17] submodule.c: convert show_submodule_summary to use emit_line_fmt

2016-09-13 Thread Junio C Hamano
Stefan Beller  writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This prepares the code for submodules to go through the
> emit_line_0 function.
>
> Signed-off-by: Stefan Beller 
> ---

I wonder how this interacts with the jk/diff-submodule-diff-inline
topic by Jacob that has graduated recently to 'master'.  IIRC, it
just lets a separate "git diff" instance that is spawned in the
submodule directory emit its findings to the output of the driving
"git diff" in the superproject.



Re: [RFC/PATCH 16/17] diff: buffer output in emit_line_0

2016-09-13 Thread Junio C Hamano
Stefan Beller  writes:

> +struct line_emission {
> + const char *set;
> + const char *line;
> + const char *ws;
> + const char *reset;
> + int first;
> + int len;
> + int whitespace_check;
> + unsigned ws_rule;
> + int has_trailing_carriage_return;
> + int has_trailing_newline;
> +};

It is somewhat strange to see whitespace things are per-line here.
I'd understand it if it were per-path, though.


Re: [PATCH] pathspec: removed unnecessary function prototypes

2016-09-13 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Sep 13, 2016 at 09:52:51AM -0700, Brandon Williams wrote:
>
>> removed function prototypes from pathspec.h which don't have a
>> corresponding implementation.
>
> I'm always curious of the "why" in cases like this. Did we forget to add
> them? Did they get renamed? Did they go away?
>
> Looks like the latter; 5a76aff (add: convert to use parse_pathspec,
> 2013-07-14) just forgot to remove them.

Thanks for digging.


Re: [RFC/PATCH 10/17] submodule.c: convert show_submodule_summary to use emit_line_fmt

2016-09-13 Thread Stefan Beller
On Tue, Sep 13, 2016 at 4:02 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> In a later patch, I want to propose an option to detect&color
>> moved lines in a diff, which cannot be done in a one-pass over
>> the diff. Instead we need to go over the whole diff twice,
>> because we cannot detect the first line of the two corresponding
>> lines (+ and -) that got moved.
>>
>> So to prepare the diff machinery for two pass algorithms
>> (i.e. buffer it all up and then operate on the result),
>> move all emissions to places, such that the only emitting
>> function is emit_line_0.
>>
>> This prepares the code for submodules to go through the
>> emit_line_0 function.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>
> I wonder how this interacts with the jk/diff-submodule-diff-inline
> topic by Jacob that has graduated recently to 'master'.  IIRC, it
> just lets a separate "git diff" instance that is spawned in the
> submodule directory emit its findings to the output of the driving
> "git diff" in the superproject.
>

The easiest way to find out is to merge HEAD^ of this patch series
(i.e. "diff: buffer output in emit_line_0") with whatever we suspect can
cause breakage for the goal of channeling everything though emit_line_*
functions. Looking at that series, I think I'll have to redo
this (maybe even including sb/diff-cleanup, to have it all in one series)
to capture all output there.

I suspected a breakage, but as the patch series grew larger and larger,
I first wanted to get into a working state before paying attention to solving
conflicts as resolving conflicts is easier when I know where this series is
headed.

Thanks!
Stefan


Re: [PATCH] unpack-trees: pass checkout state explicitly to check_updates()

2016-09-13 Thread Junio C Hamano
René Scharfe  writes:

> Add a parameter for the struct checkout variable to check_updates()
> instead of using a static global variable.  Passing it explicitly makes
> object ownership and usage more easily apparent.  And we get rid of a
> static variable; those can be problematic in library-like code.
> ...
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 11c37fb..74d6dd4 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -218,8 +218,8 @@ static void unlink_entry(const struct cache_entry *ce)
>   schedule_dir_for_removal(ce->name, ce_namelen(ce));
>  }
>  
> -static struct checkout state;

> ...
> @@ -1094,6 +1094,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
>   int i, ret;
>   static struct cache_entry *dfc;
>   struct exclude_list el;
> + struct checkout state;

Does the distinction between this thing in BSS implicitly cleared
and the new one on stack that does not seem to have any
initialization matter?

... goes and looks ...

OK, after this hunk we clear and set up everything in state, so
there is no difference in behaviour.  Just we got rid of an
unnecessary file-scope global.

Nice.  Thanks.
  
>   if (len > MAX_UNPACK_TREES)
>   die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> @@ -1239,7 +1240,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
>   }
>  
>   o->src_index = NULL;
> - ret = check_updates(o) ? (-2) : 0;
> + ret = check_updates(o, &state) ? (-2) : 0;
>   if (o->dst_index) {
>   if (!ret) {
>   if (!o->result.cache_tree)


Re: [RFC/PATCH 16/17] diff: buffer output in emit_line_0

2016-09-13 Thread Stefan Beller
On Tue, Sep 13, 2016 at 4:06 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> +struct line_emission {
>> + const char *set;
>> + const char *line;
>> + const char *ws;
>> + const char *reset;
>> + int first;
>> + int len;
>> + int whitespace_check;
>> + unsigned ws_rule;
>> + int has_trailing_carriage_return;
>> + int has_trailing_newline;
>> +};
>
> It is somewhat strange to see whitespace things are per-line here.
> I'd understand it if it were per-path, though.

Yeah we have to have it at least per path as that is the granularity
the user can configure it.

So would we rather want to keep the ecbdata around for each file pair and
just reference that? I thought we deliberately want to avoid ecbdata, so maybe
we rather want to have another struct that keeps path related information
around (pointer to the blob and white space information).

Thanks,
Stefan


Re: [RFC/PATCH 16/17] diff: buffer output in emit_line_0

2016-09-13 Thread Junio C Hamano
Stefan Beller  writes:

> So would we rather want to keep the ecbdata around for each file pair and
> just reference that? I thought we deliberately want to avoid ecbdata, so maybe
> we rather want to have another struct that keeps path related information
> around (pointer to the blob and white space information).

I would expect that there would be two structs, one per path
"struct buffered_patch" that has the per-path thing, and another per
line "struct buffered_patch_line" that describes what each line is,
and has a pointer to the former.





[RFC 0/1] de-quote quoted-strings in mailinfo

2016-09-13 Thread Kevin Daudt
This is my first 'big' C patch, so first an RFC. 

This patch implements RFC2822 dequoting of quoted-pairs in quoted
strings, which was not done yet. This means removing the "\" as escape
character from header fields, but only quoted strings, and comments
(text between braces).

According to the RFC, comments can also appear in square brackets in the
e-mail domain, but that has not been implemented. In fact, just like
other functions, it just looks at the whole header line.

Please let me know what you think.

Kevin Daudt (1):
  mailinfo: de-quote quoted-pair in header fields

 mailinfo.c | 46 ++
 t/t5100-mailinfo.sh|  5 +
 t/t5100/quoted-pair.expect |  5 +
 t/t5100/quoted-pair.in |  9 +
 t/t5100/quoted-pair.info   |  5 +
 5 files changed, 70 insertions(+)
 create mode 100644 t/t5100/quoted-pair.expect
 create mode 100644 t/t5100/quoted-pair.in
 create mode 100644 t/t5100/quoted-pair.info

-- 
2.10.0.rc2



Re: [RFC/PATCH 16/17] diff: buffer output in emit_line_0

2016-09-13 Thread Stefan Beller
On Tue, Sep 13, 2016 at 4:32 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> So would we rather want to keep the ecbdata around for each file pair and
>> just reference that? I thought we deliberately want to avoid ecbdata, so 
>> maybe
>> we rather want to have another struct that keeps path related information
>> around (pointer to the blob and white space information).
>
> I would expect that there would be two structs, one per path
> "struct buffered_patch" that has the per-path thing, and another per
> line "struct buffered_patch_line" that describes what each line is,
> and has a pointer to the former.
>

Heh, I was trying to come up with a clever thing to save that pointer,
as we would need to have that pointer once per line, so in large patches
that would save a bit of space, but probably I should not try to be too
smart about it.

So I'd split up the struct line_emission into the two proposed
buffered_patch_line as well as buffered_patch.

However the naming is a bit off than I would expect. Historically you
had one patch per file, so it was natural to name a change of multiple
files a "patchset" (c.f. a commit in Gerrit is called "patchset"/revision)

Today as Git is quite successful, one "patch" is easily understood
as the equivalent of one patch, i.e. what format-patch produced.

So I'd prefer to go with buffer_filepair and buffer_line maybe?


[RFC 0/1] mailinfo: de-quote quoted-pair in header fields

2016-09-13 Thread Kevin Daudt
rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.

git currently does not do anything with this at all, and verbatim takes
over the field body.

Make sure to properly dequote these quoted-strings and comments.

Signed-off-by: Kevin Daudt 
---
 mailinfo.c | 46 ++
 t/t5100-mailinfo.sh|  5 +
 t/t5100/quoted-pair.expect |  5 +
 t/t5100/quoted-pair.in |  9 +
 t/t5100/quoted-pair.info   |  5 +
 5 files changed, 70 insertions(+)
 create mode 100644 t/t5100/quoted-pair.expect
 create mode 100644 t/t5100/quoted-pair.in
 create mode 100644 t/t5100/quoted-pair.info

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..3b7ae8a 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -445,6 +445,51 @@ static void decode_header(struct mailinfo *mi, struct 
strbuf *it)
mi->input_error = -1;
 }
 
+static int unescape_quoted_pair(struct mailinfo *mi, struct strbuf *line)
+{
+   struct strbuf outbuf = STRBUF_INIT;
+   const char *in = line->buf;
+   int c, skip=0;
+   char escape_context=0;
+
+   while ((c = *in++) != 0) {
+   if (!skip) {
+   switch (c) {
+   case '"':
+   if (!escape_context)
+   escape_context = '"';
+   else if (escape_context == '"')
+   escape_context = 0;
+   break;
+   case '\\':
+   if (escape_context) {
+   skip = 1;
+   continue;
+   }
+   break;
+   case '(':
+   if (!escape_context)
+   escape_context = '(';
+   break;
+   case ')':
+   if (escape_context == '(')
+   escape_context = 0;
+   break;
+   }
+   } else {
+   skip = 0;
+   }
+
+   strbuf_addch(&outbuf, c);
+   }
+
+   strbuf_reset(line);
+   strbuf_addbuf(line, &outbuf);
+
+   return 0;
+
+}
+
 static int check_header(struct mailinfo *mi,
const struct strbuf *line,
struct strbuf *hdr_data[], int overwrite)
@@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
 */
strbuf_add(&sb, line->buf + len + 2, line->len - len - 
2);
decode_header(mi, &sb);
+   unescape_quoted_pair(mi, &sb);
handle_header(&hdr_data[i], &sb);
ret = 1;
goto check_header_out;
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 1a5a546..2be61bf 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -142,4 +142,9 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
test_cmp expect mboxrd/msg
 '
 
+test_expect_success 'mailinfo unescapes rfc2822 quoted-pair' '
+git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in 
>"$TEST_DIRECTORY"/t5100/quoted-pair.info &&
+test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect 
"$TEST_DIRECTORY"/t5100/quoted-pair.info
+'
+
 test_done
diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect
new file mode 100644
index 000..9fe72e9
--- /dev/null
+++ b/t/t5100/quoted-pair.expect
@@ -0,0 +1,5 @@
+Author: "Author "The Author" Name"
+Email: someb...@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/quoted-pair.in b/t/t5100/quoted-pair.in
new file mode 100644
index 000..e2e627a
--- /dev/null
+++ b/t/t5100/quoted-pair.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "Author \"The Author\" Name" 
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted-pair
+
+
+
+---
+patch
diff --git a/t/t5100/quoted-pair.info b/t/t5100/quoted-pair.info
new file mode 100644
index 000..9fe72e9
--- /dev/null
+++ b/t/t5100/quoted-pair.info
@@ -0,0 +1,5 @@
+Author: "Author "The Author" Name"
+Email: someb...@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
-- 
2.10.0.rc2



[RFC 0/1] de-quote quoted-strings in mailinfo

2016-09-13 Thread Kevin Daudt
This is my first 'big' C patch, so first an RFC. 

This patch implements RFC2822 dequoting of quoted-pairs in quoted
strings, which was not done yet. This means removing the "\" as escape
character from header fields, but only quoted strings, and comments
(text between braces).

According to the RFC, comments can also appear in square brackets in the
e-mail domain, but that has not been implemented. In fact, just like
other functions, it just looks at the whole header line.

Please let me know what you think.

Kevin Daudt (1):
  mailinfo: de-quote quoted-pair in header fields

 mailinfo.c | 46 ++
 t/t5100-mailinfo.sh|  5 +
 t/t5100/quoted-pair.expect |  5 +
 t/t5100/quoted-pair.in |  9 +
 t/t5100/quoted-pair.info   |  5 +
 5 files changed, 70 insertions(+)
 create mode 100644 t/t5100/quoted-pair.expect
 create mode 100644 t/t5100/quoted-pair.in
 create mode 100644 t/t5100/quoted-pair.info

-- 
2.10.0.rc2



Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields

2016-09-13 Thread Junio C Hamano
Kevin Daudt  writes:

> +static int unescape_quoted_pair(struct mailinfo *mi, struct strbuf *line)
> +{
> + struct strbuf outbuf = STRBUF_INIT;
> + const char *in = line->buf;
> + int c, skip=0;
> + char escape_context=0;

Have SP around '=', i.e.

int c, skip = 0;
char escape_context = 0;

> + while ((c = *in++) != 0) {
> + if (!skip) {
> + switch (c) {
> + case '"':
> +...
> + break;
> + }
> + } else {
> + skip = 0;
> + }
> +
> + strbuf_addch(&outbuf, c);
> + }

It often is easier to read if smaller of the two are in the if part
and the larger in else part.  Also your switch/case is indented one
level too deep.  I.e.

while (...) {
if (skip) {
skip = 0;
} else {
switch (c) {
case '"':
do this;
...
}
}
strbuf_addch(...);
}

I found the variable name "skip" a bit hard to reason about.  What
it does is to signal the next round of the processing that we have
seen a single-byte quote and it should keep the byte it will get, no
matter what its value is.  It is "skipping" the conditional
processing, but I'd imagine most people would consider it is
"keeping the byte".

> @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
>*/
>   strbuf_add(&sb, line->buf + len + 2, line->len - len - 
> 2);
>   decode_header(mi, &sb);
> + unescape_quoted_pair(mi, &sb);
>   handle_header(&hdr_data[i], &sb);
>   ret = 1;
>   goto check_header_out;

I wonder why this call is only in here, not on other headers that
all call decode_header().  For that matter, I wonder if the call (or
the logic of the helper function itself) should go at the end of
decode_header().  After all, this is different kind of decoding; the
current one knows how to do b/q encoding but forgot about the more
traditional quoting done with backslash, and you are teaching the
code that the current decoding it does is insufficient and how to
handle the one that the original implementors forgot about.

Thanks.


Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields

2016-09-13 Thread Kevin Daudt
On Tue, Sep 13, 2016 at 05:04:45PM -0700, Junio C Hamano wrote:
> Kevin Daudt  writes:
> 
> 
> It often is easier to read if smaller of the two are in the if part
> and the larger in else part.  Also your switch/case is indented one
> level too deep.  I.e.
> 

Thanks, I've switched the order and fixed indentation.

> 
> I found the variable name "skip" a bit hard to reason about.  What
> it does is to signal the next round of the processing that we have
> seen a single-byte quote and it should keep the byte it will get, no
> matter what its value is.  It is "skipping" the conditional
> processing, but I'd imagine most people would consider it is
> "keeping the byte".

Yes, I agree and was trying to find a better name. I have renamed it to
"take_next_literally", which indicates better what it means.

> 
> > @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
> >  */
> > strbuf_add(&sb, line->buf + len + 2, line->len - len - 
> > 2);
> > decode_header(mi, &sb);
> > +   unescape_quoted_pair(mi, &sb);
> > handle_header(&hdr_data[i], &sb);
> > ret = 1;
> > goto check_header_out;
> 
> I wonder why this call is only in here, not on other headers that
> all call decode_header().  For that matter, I wonder if the call (or
> the logic of the helper function itself) should go at the end of
> decode_header().  After all, this is different kind of decoding; the
> current one knows how to do b/q encoding but forgot about the more
> traditional quoting done with backslash, and you are teaching the
> code that the current decoding it does is insufficient and how to
> handle the one that the original implementors forgot about.

Makes sense, it should be applied to all headers (I missed the other
decode_header calls).

I will send a new version later.


Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields

2016-09-13 Thread Jeff King
On Tue, Sep 13, 2016 at 05:04:45PM -0700, Junio C Hamano wrote:

> > @@ -461,6 +506,7 @@ static int check_header(struct mailinfo *mi,
> >  */
> > strbuf_add(&sb, line->buf + len + 2, line->len - len - 
> > 2);
> > decode_header(mi, &sb);
> > +   unescape_quoted_pair(mi, &sb);
> > handle_header(&hdr_data[i], &sb);
> > ret = 1;
> > goto check_header_out;
> 
> I wonder why this call is only in here, not on other headers that
> all call decode_header().  For that matter, I wonder if the call (or
> the logic of the helper function itself) should go at the end of
> decode_header().  After all, this is different kind of decoding; the
> current one knows how to do b/q encoding but forgot about the more
> traditional quoting done with backslash, and you are teaching the
> code that the current decoding it does is insufficient and how to
> handle the one that the original implementors forgot about.

It has been a while since I looked at rfc2822, but aren't the quoting
and syntax rules different for addresses versus other headers? We would
not want to dequote a Subject header, I think.

-Peff


Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields

2016-09-13 Thread Jeff King
On Wed, Sep 14, 2016 at 01:46:12AM +0200, Kevin Daudt wrote:

> diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect
> new file mode 100644
> index 000..9fe72e9
> --- /dev/null
> +++ b/t/t5100/quoted-pair.expect
> @@ -0,0 +1,5 @@
> +Author: "Author "The Author" Name"
> +Email: someb...@example.com
> +Subject: testing quoted-pair
> +Date: Sun, 25 May 2008 00:38:18 -0700

So obviously this is much better than including the backslashed quotes.
But I have to wonder why the first line is not:

  Author: Author "The Author" Name

Who is responsible for stripping out the other quotes? I know that they
_do_ get stripped out even in the current code, but it is not clear to
me if that is intentional or an accident.

In Git's world-view (e.g., in commit headers), an ident name continues
until we get to the "<" of the email (or a "\n" terminates the header
line completely). So if mailinfo is converting rfc2822 headers into Git
ident, I'd expect it to fully remove any quotes that are not intended to
be in the name, and everything after "Author: " up to the newline would
become the name.

It's entirely possible I'm missing something subtle about the design of
mailinfo, though.

-Peff


Re: [RFC 0/1] mailinfo: de-quote quoted-pair in header fields

2016-09-13 Thread Junio C Hamano
Jeff King  writes:

> It has been a while since I looked at rfc2822, but aren't the quoting
> and syntax rules different for addresses versus other headers? We would
> not want to dequote a Subject header, I think.

You're absolutely right.  RFC2822 does not quite _want_ to dequote
anything.  As you pointed out in a separate message, we are the one
who want to strip out "" quoting when mailinfo says

Author: "Jeff King"

to its standard output (aka "info"), and turn it into

GIT_AUTHOR_NAME='Jeff King'

and do so ONLY for the author name.

So I would think it is the responsibility of the one that reads the
"info" file that is produced by mailinfo to dequote the backslash
thing if the mailinfo gave us

Author: "Jeff \"Peff\" King"



Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

2016-09-13 Thread Oleg Taranenko
Sorry for bothering, why not introduce a brand new option like git
checkout -b foo --skip-worktree-merge for such rare optimization use
case?

On Wed, Sep 14, 2016 at 12:34 AM, Junio C Hamano  wrote:
> Ben Peart  writes:
>
>> +static int needs_working_tree_merge(const struct checkout_opts *opts,
>> + const struct branch_info *old,
>> + const struct branch_info *new)
>> +{
>> +...
>> +}
>
> I do not think I need to repeat the same remarks on the conditions
> in this helper, which hasn't changed since v2.  Many "comments" in
> the code do not explain why skipping is justified, or what they
> claim to check looks to me just plain wrong.
>
> For example, there is
>
>/*
> * If we're not creating a new branch, by definition we're changing
> * the existing one so need to do the merge
> */
>if (!opts->new_branch)
>return 1;
>
> but "git checkout" (no other argument) hits this condition.  It
> disables the most trivial optimization opportunity, because we are
> not "creating".
>
> "By definition, we're changing"?  Really?  Not quite.
>
> If you disable this bogus check, "git checkout" (no other argument)
> would be allowed to skip the merge_working_tree(), and that in turn
> reveals another case that the helper is not checking when
> unpack_trees() MUST be called.
>
> Note: namely, when sparse checkout is in effect, switching from
> HEAD to HEAD can nuke existing working tree files outside the
> sparse pattern -- YUCK!  See penultimate test in t1011 for
> an example.
>
> This yuckiness is not your fault, but needs_working_tree_merge()
> logic you added needs to refrain from skipping unpack_trees() call
> when sparse thing is in effect.  I'd expect "git checkout -b foo"
> instead of "git checkout" (no other argument) would fail to honor
> the sparse thing and reveal this bug, because the above bogus
> "!opts->new_branch" check will not protect you for that case.
>
> In other words, these random series of "if (...) return 1" are bugs
> hiding other real bugs and we need to reason about which ones are
> bugs that are hiding what other bugs that are not covered by this
> function.  As Peff said earlier for v1, this is still an unreadable
> mess.  We need to figure out a way to make sure we are skipping on
> the right condition and not accidentally hiding a bug of failing to
> check the right condition.  I offhand do not have a good suggestion
> on this; sorry.
>
>>  static int merge_working_tree(const struct checkout_opts *opts,
>> struct branch_info *old,
>> struct branch_info *new,
>> int *writeout_error)
>>  {
>> + /*
>> +  * Optimize the performance of "git checkout -b foo" by avoiding
>> +  * the expensive merge, index and working directory updates if they
>> +  * are not needed.
>> +  */
>> + if (!needs_working_tree_merge(opts, old, new))
>> + return 0;
>> +
>>   int ret;
>>   struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
>
> With the change you made at the beginning of this function, it no
> longer compiles with -Wdecl-after-stmt, but that is the smallest of
> the problems.
>
> It is a small step in the right direction to move the call to the
> helper from the caller to this function, but it is a bit too small.
>
> Notice that the lines after the above context look like this:
>
> hold_locked_index(lock_file, 1);
> if (read_cache_preload(NULL) < 0)
> return error(_("index file corrupt"));
>
> resolve_undo_clear();
> if (opts->force) {
> ret = reset_tree(new->commit->tree, opts, 1, writeout_error);
> if (ret)
> return ret;
> } else {
> struct tree_desc trees[2];
> ...
>
> I would have expected that the check goes inside the "else" thing
> that actually does a two-tree merge, and the helper loses the check
> with opts->force, at least.  That would still be a change smaller
> than desired, but at least a meaningful improvement compared to the
> previous one.  As I have already pointed out, in the "else" clause
> there is a check "is the index free of conflicted entries? if so
> error out", and that must be honored in !opt->force case, no matter
> what your needs_working_tree_merge() says.  I also was hoping that
> you would notice, when you were told about the unmerged check, by
> reading the remainder of the merge_working_tree(), that we need to
> call show_local_changes() when we are not doing force and when we
> are not quiet---returning early like the above patch will never be
> able to call that one downstream in the function.
>
> Regardless of what the actual checks end up to be, the right place
> to do this "optimization" would look more like:
>
>  builtin/checkout.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git

[PATCH] vcs-svn/fast_export: fix timestamp fmt specifiers

2016-09-13 Thread Mike Ralphson
Two instances of %ld being used for unsigned longs

Signed-off-by: Mike Ralphson 
---
 vcs-svn/fast_export.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index bd0f2c2..97cba39 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -73,7 +73,7 @@ void fast_export_begin_note(uint32_t revision, const char 
*author,
static int firstnote = 1;
size_t loglen = strlen(log);
printf("commit %s\n", note_ref);
-   printf("committer %s <%s@%s> %ld +\n", author, author, "local", 
timestamp);
+   printf("committer %s <%s@%s> %lu +\n", author, author, "local", 
timestamp);
printf("data %"PRIuMAX"\n", (uintmax_t)loglen);
fwrite(log, loglen, 1, stdout);
if (firstnote) {
@@ -107,7 +107,7 @@ void fast_export_begin_commit(uint32_t revision, const char 
*author,
}
printf("commit %s\n", local_ref);
printf("mark :%"PRIu32"\n", revision);
-   printf("committer %s <%s@%s> %ld +\n",
+   printf("committer %s <%s@%s> %lu +\n",
   *author ? author : "nobody",
   *author ? author : "nobody",
   *uuid ? uuid : "local", timestamp);

--
https://github.com/git/git/pull/293