Re: [BUG] Symbolic links break "git fast-export"?

2019-06-30 Thread Lars Schneider
> On Jun 24, 2019, at 5:33 AM, Elijah Newren wrote: > > On Mon, Jun 24, 2019 at 5:05 AM Lars Schneider > wrote: >> >> Hi folks, >> >> Is my understanding correct, that `git fast-export | git fast-import` >> should not modify the repository? If yes

Re: [BUG] Symbolic links break "git fast-export"?

2019-06-30 Thread Lars Schneider
> On Jun 24, 2019, at 11:58 AM, Jeff King wrote: > > On Mon, Jun 24, 2019 at 06:33:38AM -0600, Elijah Newren wrote: > >> We should probably also make a corresponding improvement to >> fast-import; it also makes some attempts to be smart about handling >> order of modifies and deletes, but mis

[BUG] Symbolic links break "git fast-export"?

2019-06-24 Thread Lars Schneider
Hi folks, Is my understanding correct, that `git fast-export | git fast-import` should not modify the repository? If yes, then we might have a bug in `git fast-export` if symbolic directory links are removed and converted to a real directory. Consider this test case: # Create test repo

Re: Import/Export as a fast way to purge files from Git?

2018-10-31 Thread Lars Schneider
> On Sep 24, 2018, at 7:24 PM, Elijah Newren wrote: > > On Sun, Sep 23, 2018 at 6:08 AM Lars Schneider > wrote: >> >> Hi, >> >> I recently had to purge files from large Git repos (many files, many >> commits). >> The usual recommendation

Re: Import/Export as a fast way to purge files from Git?

2018-09-23 Thread Lars Schneider
> On Sep 23, 2018, at 4:55 PM, Eric Sunshine wrote: > > On Sun, Sep 23, 2018 at 9:05 AM Lars Schneider > wrote: >> I recently had to purge files from large Git repos (many files, many >> commits). >> The usual recommendation is to use `git filter-branc

Import/Export as a fast way to purge files from Git?

2018-09-23 Thread Lars Schneider
Hi, I recently had to purge files from large Git repos (many files, many commits). The usual recommendation is to use `git filter-branch --index-filter` to purge files. However, this is *very* slow for large repos (e.g. it takes 45min to remove the `builtin` directory from git core). I realized

Re: Find commit that referenced a blob first

2018-07-19 Thread Lars Schneider
> On Jul 19, 2018, at 11:19 PM, Stefan Beller wrote: > > On Thu, Jul 19, 2018 at 2:02 PM Lars Schneider > wrote: >> >> Hi, >> >> I have a blob hash and I would like to know what commit referenced >> this blob first in a given Git repo. > >

Find commit that referenced a blob first

2018-07-19 Thread Lars Schneider
Hi, I have a blob hash and I would like to know what commit referenced this blob first in a given Git repo. I could iterate through all commits sorted by date (or generation number) and then recursively search in the referenced trees until I find my blob. I wonder, is this the most efficient w
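
The brute-force search described in this message can be sketched roughly as follows. This is an in-memory Python model, not Git's API: `tree_contains`, `first_commit_with_blob`, the `(commit_id, tree)` pairs and the nested-dict tree layout are all made up for illustration, and the commits are assumed to arrive sorted oldest first.

```python
def tree_contains(tree, blob_hash):
    """Recursively look for blob_hash in a tree modeled as a nested dict.

    Hypothetical layout: keys are path components; a value is either a
    blob hash (str) or a subtree (dict).
    """
    for entry in tree.values():
        if isinstance(entry, dict):
            if tree_contains(entry, blob_hash):
                return True
        elif entry == blob_hash:
            return True
    return False


def first_commit_with_blob(commits, blob_hash):
    """Return the id of the first commit whose tree references blob_hash.

    `commits` is assumed to be an iterable of (commit_id, tree) pairs,
    already sorted oldest first. Returns None if no commit matches.
    """
    for commit_id, tree in commits:
        if tree_contains(tree, blob_hash):
            return commit_id
    return None
```

The cost is one full tree walk per commit in the worst case, which is the inefficiency the question is asking about.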

Re: [PATCH v1 2/2] convert: add alias support for 'working-tree-encoding' attributes

2018-07-08 Thread Lars Schneider
> On Jul 8, 2018, at 8:30 PM, larsxschnei...@gmail.com wrote: > > From: Lars Schneider > > In 107642fe26 ("convert: add 'working-tree-encoding' attribute", > 2018-04-15) we added an attribute which defines the working tree > encoding of a file. >

Re: [PATCH v1 1/2] convert: refactor conversion driver config parsing

2018-07-08 Thread Lars Schneider
> On Jul 8, 2018, at 8:30 PM, larsxschnei...@gmail.com wrote: > > From: Lars Schneider > > Refactor conversion driver config parsing to ease the parsing of new > configs in a subsequent patch. > > No functional change intended. > > Signed-off-by: Lars Sch

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-07-02 Thread Lars Schneider
> -----Lars Schneider wrote: - > To: Jeff King > From: Lars Schneider > Date: 06/28/2018 18:21 > Cc: "brian m. carlson" , Steve Groeger > , git@vger.kernel.org > Subject: Re: Use of new .gitattributes working-tree-encoding attribute across > different

Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

2018-06-28 Thread Lars Schneider
> On Jun 28, 2018, at 4:34 PM, Jeff King wrote: > > On Thu, Jun 28, 2018 at 02:44:47AM +, brian m. carlson wrote: > >> On Wed, Jun 27, 2018 at 07:54:52AM +, Steve Groeger wrote: >>> We have common code that is supposed to be usable across different >>> platforms and hence different f

Re: [RFC PATCH v1] http: add http.keepRejectedCredentials config

2018-06-07 Thread Lars Schneider
> On 04 Jun 2018, at 11:55, Jeff King wrote: > > On Mon, Jun 04, 2018 at 12:18:59PM -0400, Martin-Louis Bright wrote: > >> Why must the credentials must be deleted after receiving the 401 (or >> any) error? What's the rationale for this? > > Because Git only tries a single credential per invo

Re: [ANNOUNCE] Git v2.18.0-rc1

2018-06-04 Thread Lars Schneider
> On 04 Jun 2018, at 06:53, Junio C Hamano wrote: > > A release candidate Git v2.18.0-rc1 is now available for testing > at the usual places. It is comprised of 842 non-merge commits > since v2.17.0, contributed by 65 people, 20 of which are new faces. > > ... > > * The new "checkout-encodin

[RFC PATCH v1] http: add http.keepRejectedCredentials config

2018-06-04 Thread lars . schneider
From: Lars Schneider If a Git HTTP server responds with 401 or 407, then Git tells the credential helper to reject and delete the credentials. In general this is good. However, in certain automation environments it is not desired to remove credentials automatically. This is in particular the

Re: worktrees vs. alternates

2018-05-16 Thread Lars Schneider
> On 16 May 2018, at 11:29, Ævar Arnfjörð Bjarmason wrote: > > > On Wed, May 16 2018, Lars Schneider wrote: > >> I am looking into different options to cache Git repositories on build >> machines. The two most promising ways seem to be git-worktree [1] and >

worktrees vs. alternates

2018-05-16 Thread Lars Schneider
Hi, I am looking into different options to cache Git repositories on build machines. The two most promising ways seem to be git-worktree [1] and git-alternates [2]. I wonder if you see an advantage of one over the other? My impression is that git-worktree supersedes git-alternates. Would that b

Re: Optimizing writes to unchanged files during merges?

2018-04-17 Thread Lars Schneider
> On 16 Apr 2018, at 19:45, Jacob Keller wrote: > > On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller wrote: >> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider >> wrote: >>> What if Git kept a LRU list that contains file path, content hash, and >>> mtime

Re: Optimizing writes to unchanged files during merges?

2018-04-17 Thread Lars Schneider
> On 16 Apr 2018, at 19:04, Ævar Arnfjörð Bjarmason wrote: > > > On Mon, Apr 16 2018, Lars Schneider wrote: > >>> On 16 Apr 2018, at 04:03, Linus Torvalds >>> wrote: >>> >>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano wrote: >

Re: Optimizing writes to unchanged files during merges?

2018-04-16 Thread Lars Schneider
> On 16 Apr 2018, at 04:03, Linus Torvalds > wrote: > > On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano wrote: >> >> I think Elijah's corrected was_tracked() also does not care "has >> this been renamed". > > I'm perfectly happy with the slightly smarter patches. My patch was > really just a

[PATCH v13 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-04-15 Thread lars . schneider
From: Lars Schneider UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are known to have round trip issues [1
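
A minimal sketch of what a round trip check means, assuming Python's built-in codecs stand in for iconv. The function name `round_trips` is hypothetical, and the sketch does not model the 'core.checkRoundtripEncoding' setting itself: it only re-encodes the content to UTF-8 and back and compares the result with the original bytes.

```python
def round_trips(data, enc):
    """Return True if `data` survives enc -> UTF-8 -> enc unchanged.

    `data` is the raw working-tree content (bytes) and `enc` is a codec
    name known to Python. Content that cannot be decoded or re-encoded
    is treated as not round-trip safe.
    """
    try:
        utf8 = data.decode(enc).encode("utf-8")
        return utf8.decode("utf-8").encode(enc) == data
    except UnicodeError:
        return False
```

Which encodings actually fail such a check depends on the conversion library, so the sketch makes no claim about specific ones.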

[PATCH v13 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-04-15 Thread lars . schneider
From: Lars Schneider Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider --- convert.c

[PATCH v13 07/10] convert: add 'working-tree-encoding' attribute

2018-04-15 Thread lars . schneider
From: Lars Schneider Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as most Git web front e

[PATCH v13 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-04-15 Thread lars . schneider
From: Lars Schneider The function same_encoding() could only recognize alternative names for UTF-8 encodings. Teach it to recognize all kinds of alternative UTF encoding names (e.g. utf16). While we are at it, fix a crash that would occur if same_encoding() was called with a NULL argument and a
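
One way to picture the name matching described here, as a Python sketch rather than the C code from the patch: two names are treated as the same UTF encoding if they match case-insensitively after dropping the "UTF" prefix and one optional dash. The helper names and the exact rules are assumptions made for illustration.

```python
def _utf_suffix(name):
    """Return the part after 'utf' and one optional dash, or None."""
    lowered = name.lower()
    if not lowered.startswith("utf"):
        return None
    rest = lowered[3:]
    return rest[1:] if rest.startswith("-") else rest


def same_encoding(a, b):
    """Case-insensitive comparison that also accepts UTF name variants."""
    suffix_a, suffix_b = _utf_suffix(a), _utf_suffix(b)
    if suffix_a is not None and suffix_b is not None:
        return suffix_a == suffix_b
    return a.lower() == b.lower()
```

With these rules "utf16" and "UTF-16" compare equal, while "UTF-16" and "UTF-16LE" do not.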

[PATCH v13 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-04-15 Thread lars . schneider
From: Lars Schneider Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Signed-off-by: Lars Schneider

[PATCH v13 05/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-04-15 Thread lars . schneider
From: Lars Schneider Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars
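
The rule in this message can be illustrated with a short Python sketch. It is not the function from the patch: `has_prohibited_bom` and its table are hypothetical, and the BOM constants come from Python's `codecs` module.

```python
import codecs

# For encodings whose name already fixes the byte order, any BOM of the
# same width at the start of the data is considered prohibited.
_PROHIBITED_BOMS = {
    "UTF-16BE": (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE),
    "UTF-16LE": (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE),
    "UTF-32BE": (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE),
    "UTF-32LE": (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE),
}


def has_prohibited_bom(enc, data):
    """Return True if `data` (bytes) starts with a BOM although `enc`
    already declares the byte order."""
    boms = _PROHIBITED_BOMS.get(enc.upper())
    return boms is not None and data.startswith(boms)
```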

[PATCH v13 06/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-04-15 Thread lars . schneider
From: Lars Schneider If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if
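
The strict behaviour described here is the mirror image of the prohibited-BOM case, and could look like this in Python. The function below is only a sketch that borrows its idea from is_missing_required_utf_bom(); its name, signature and lookup table are assumptions, not the C implementation.

```python
import codecs

# Encodings that do not state a byte order in their name, mapped to the
# BOMs that would resolve the ambiguity.
_REQUIRED_BOMS = {
    "UTF-16": (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE),
    "UTF-32": (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE),
}


def is_missing_required_bom(enc, data):
    """Return True if `enc` leaves the byte order open and `data`
    (bytes) does not start with a BOM that settles it."""
    boms = _REQUIRED_BOMS.get(enc.upper())
    return boms is not None and not data.startswith(boms)
```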

[PATCH v13 03/10] strbuf: add a case insensitive starts_with()

2018-04-15 Thread lars . schneider
From: Lars Schneider Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- git-compat-util.h | 1 + strbuf.c | 9 + 2 files changed, 10 insertions(+) diff --git a/git

[PATCH v13 02/10] strbuf: add xstrdup_toupper()

2018-04-15 Thread lars . schneider
From: Lars Schneider Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- strbuf.c | 12 strbuf.h | 1 + 2 files changed, 13 insertions(+) diff --git a

[PATCH v13 00/10] convert: add support for different encodings

2018-04-15 Thread lars . schneider
From: Lars Schneider Hi, Patches 1-6,9 are preparation and helper functions. Patch 7,8,10 are the actual change. This series is based on v2.16.0 and Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13). The series can be rebased without conflic

[PATCH v13 08/10] convert: check for detectable errors in UTF encodings

2018-04-15 Thread lars . schneider
From: Lars Schneider Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider --- convert.c| 61 +++ t/t0028-working-tree-encodi

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-04-15 Thread Lars Schneider
> On 05 Apr 2018, at 18:41, Torsten Bögershausen wrote: > > On 01.04.18 15:24, Lars Schneider wrote: >>> TRUE or false are values, but just wrong ones. >>> If this test is removed, the user will see "failed to encode "TRUE" to >>> "UT

Re: [PATCH v2 1/5] core.aheadbehind: add new config setting

2018-04-03 Thread Lars Schneider
> On 04 Jan 2018, at 20:26, Jeff King wrote: > > On Wed, Dec 27, 2017 at 09:41:30AM -0800, Junio C Hamano wrote: > >> Jeff King writes: >> >>> I, too, had a funny feeling about calling this "core". But I didn't have >>> a better name, as I'm not sure what other place we have for config >>> op

Re: [PATCH v12 00/10] convert: add support for different encodings

2018-04-03 Thread Lars Schneider
> On 02 Apr 2018, at 20:31, Lars Schneider wrote: > > >> On 29 Mar 2018, at 20:37, Junio C Hamano wrote: >> >> lars.schnei...@autodesk.com writes: >> >>> From: Lars Schneider >>> >>> Patches 1-6,9 are preparation and helper funct

Re: [PATCH v12 00/10] convert: add support for different encodings

2018-04-02 Thread Lars Schneider
> On 29 Mar 2018, at 20:37, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> From: Lars Schneider >> >> Patches 1-6,9 are preparation and helper functions. Patch 4 is new. >> Patch 7,8,10 are the actual change. >> >

Re: [GSoC] [PATCH] travis-ci: added clang static analysis

2018-04-01 Thread Lars Schneider
> On 13 Mar 2018, at 18:45, Siddhartha Mishra wrote: > > On Mon, Mar 12, 2018 at 3:49 PM, Lars Schneider > wrote: >> Hi, >> >> That looks interesting but I agree with Dscho that we should not limit >> this to master/maint. >> >> I assume you d

Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-04-01 Thread Lars Schneider
> On 16 Mar 2018, at 19:19, Eric Sunshine wrote: > > On Fri, Mar 16, 2018 at 1:50 PM, Junio C Hamano wrote: >> Eric Sunshine writes: >>> However, I'm having a tough time imagining cases in which callers >>> would want same_encoding() to return true if both arguments are NULL, >>> but outright

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-04-01 Thread Lars Schneider
> On 18 Mar 2018, at 08:24, Torsten Bögershausen wrote: > > Some comments inline > > On Fri, Mar 09, 2018 at 06:35:32PM +0100, lars.schnei...@autodesk.com wrote: >> From: Lars Schneider >> >> Git recognizes files encoded with ASCII or one of its supersets

Re: What's cooking in git.git (Mar 2018, #05; Wed, 28)

2018-04-01 Thread Lars Schneider
> On 30 Mar 2018, at 12:32, Lars Schneider wrote: > > >> On 30 Mar 2018, at 11:24, Ævar Arnfjörð Bjarmason wrote: >> >> >> On Wed, Mar 28 2018, Junio C. Hamano wrote: >> >>> * ls/checkout-encoding (2018-03-16) 10 comm

Re: What's cooking in git.git (Mar 2018, #05; Wed, 28)

2018-03-30 Thread Lars Schneider
> On 30 Mar 2018, at 11:24, Ævar Arnfjörð Bjarmason wrote: > > > On Wed, Mar 28 2018, Junio C. Hamano wrote: > >> * ls/checkout-encoding (2018-03-16) 10 commits >> - convert: add round trip check based on 'core.checkRoundtripEncoding' >> - convert: add tracing for 'working-tree-encoding' attri

Re: [PATCH v2] travis-ci: enable more warnings on travis linux-gcc job

2018-03-17 Thread Lars Schneider
> On 17 Mar 2018, at 09:01, Duy Nguyen wrote: > > On Fri, Mar 16, 2018 at 10:22 PM, Jeff King wrote: >>> diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh >>> index 3735ce413f..f6f346c468 100755 >>> --- a/ci/run-build-and-tests.sh >>> +++ b/ci/run-build-and-tests.sh >>> @@ -7,6

Re: [PATCH v6 00/14] Serialized Git Commit Graph

2018-03-16 Thread Lars Schneider
> On 14 Mar 2018, at 21:43, Junio C Hamano wrote: > > Derrick Stolee writes: > >> This v6 includes feedback around csum-file.c and the rename of hashclose() >> to finalize_hashfile(). These are the first two commits of the series, so >> they could be pulled out independently. >> >> The only o

Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread Lars Schneider
> On 16 Mar 2018, at 00:25, Eric Sunshine wrote: > > On Thu, Mar 15, 2018 at 6:57 PM, wrote: >> The function same_encoding() checked only for alternative UTF-8 encoding >> names. Teach it to check for all kinds of alternative UTF encoding >> names. >>

Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Lars Schneider
> On 15 Mar 2018, at 20:18, Lars Schneider wrote: > > >> On 15 Mar 2018, at 02:34, Junio C Hamano wrote: >> >> ... >> >> * ls/checkout-encoding (2018-03-09) 10 commits >> - convert: add round trip check based on 'core.checkRoundtripEnc

[PATCH v12 08/10] convert: check for detectable errors in UTF encodings

2018-03-15 Thread lars . schneider
From: Lars Schneider Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider --- convert.c| 61 +++ t/t0028-working-tree-encodi

[PATCH v12 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-03-15 Thread lars . schneider
From: Lars Schneider Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider --- convert.c

[PATCH v12 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-15 Thread lars . schneider
From: Lars Schneider UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are known to have round trip issues [1

[PATCH v12 07/10] convert: add 'working-tree-encoding' attribute

2018-03-15 Thread lars . schneider
From: Lars Schneider Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as most Git web front e

[PATCH v12 06/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-15 Thread lars . schneider
From: Lars Schneider If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if

[PATCH v12 02/10] strbuf: add xstrdup_toupper()

2018-03-15 Thread lars . schneider
From: Lars Schneider Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- strbuf.c | 12 strbuf.h | 1 + 2 files changed, 13 insertions(+) diff --git a

[PATCH v12 00/10] convert: add support for different encodings

2018-03-15 Thread lars . schneider
From: Lars Schneider Hi, Patches 1-6,9 are preparation and helper functions. Patch 4 is new. Patch 7,8,10 are the actual change. This series depends on Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is already in master. Changes sinc

[PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread lars . schneider
From: Lars Schneider The function same_encoding() checked only for alternative UTF-8 encoding names. Teach it to check for all kinds of alternative UTF encoding names. This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- utf8.c | 20 +++- 1 file

[PATCH v12 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-15 Thread lars . schneider
From: Lars Schneider Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Signed-off-by: Lars Schneider

[PATCH v12 03/10] strbuf: add a case insensitive starts_with()

2018-03-15 Thread lars . schneider
From: Lars Schneider Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- git-compat-util.h | 1 + strbuf.c | 9 + 2 files changed, 10 insertions(+) diff --git a/git

[PATCH v12 05/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-15 Thread lars . schneider
From: Lars Schneider Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars

Re: [PATCH v11 08/10] convert: advise canonical UTF encoding names

2018-03-15 Thread Lars Schneider
> On 09 Mar 2018, at 20:11, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> From: Lars Schneider >> >> The canonical name of an UTF encoding has the format UTF, dash, number, >> and an optionally byte order in upper case (e.g. UTF-8

Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-15 Thread Lars Schneider
> On 09 Mar 2018, at 20:10, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +static const char *default_encoding = "UTF-8"; >> + >> ... >> +static const char *git_path_check_encoding(struct attr_check_item *check) >> +{ >> +const char *value = check->value; >> + >> +i

Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Lars Schneider
> On 15 Mar 2018, at 02:34, Junio C Hamano wrote: > > ... > > * ls/checkout-encoding (2018-03-09) 10 commits > - convert: add round trip check based on 'core.checkRoundtripEncoding' > - convert: add tracing for 'working-tree-encoding' attribute > - convert: advise canonical UTF encoding names >

Re: How to debug a "git merge"?

2018-03-15 Thread Lars Schneider
> On 14 Mar 2018, at 23:20, Jeff King wrote: > > On Wed, Mar 14, 2018 at 05:56:04PM +0100, Lars Schneider wrote: > >> I am investigating a Git merge (a86dd40fe) in which an older version of >> a file won over the newer version. I try to understand why this is the >

Re: How to debug a "git merge"?

2018-03-14 Thread Lars Schneider
> On 14 Mar 2018, at 18:02, Derrick Stolee wrote: > > On 3/14/2018 12:56 PM, Lars Schneider wrote: >> Hi, >> >> I am investigating a Git merge (a86dd40fe) in which an older version of >> a file won over the newer version. I try to understand why this is the

How to debug a "git merge"?

2018-03-14 Thread Lars Schneider
Hi, I am investigating a Git merge (a86dd40fe) in which an older version of a file won over the newer version. I try to understand why this is the case. I can reproduce the merge with the following commands: $ git checkout -b test a02fa3303 $ GIT_MERGE_VERBOSITY=5 git merge --verbose c1b82995c

Re: [git-sizer] Implications of a large commit object

2018-03-14 Thread Lars Schneider
> On 14 Mar 2018, at 09:33, Michael Haggerty wrote: > > On Wed, Mar 14, 2018 at 9:14 AM, Lars Schneider > wrote: >> I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*] >> and it detected a very large commit of 7.33 MiB in my repo (see ch

[git-sizer] Implications of a large commit object

2018-03-14 Thread Lars Schneider
Hi, I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*] and it detected a very large commit of 7.33 MiB in my repo (see chart below). This large commit is expected. I've imported that repo from another version control system but excluded all binary files (e.g. images) and some

Re: [GSoC] [PATCH] travis-ci: added clang static analysis

2018-03-12 Thread Lars Schneider
Hi, That looks interesting but I agree with Dscho that we should not limit this to master/maint. I assume you did run this on TravisCI already? Can you share a link? I assume you did find errors? Can we fix them or are there too many? If there are existing errors, how do we define a "successful"

Re: [GSoC][PATCH] git-ci: use pylint to analyze the git-p4 code

2018-03-12 Thread Lars Schneider
Hi Viet, > On 12 Mar 2018, at 03:20, Viet Hung Tran wrote: > > This is my submission as a microproject for the Google Summer of code. > I apologize for not setting the [GSoC] in my previous email > at <20180312020855.7950-1-viethtran1...@gmail.com>. > Please ignore it. No worries :-) > Add a n

Re: [PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Lars Schneider
> On 09 Mar 2018, at 20:00, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +const char *advise_msg = _( >> +"The file '%s' contains a byte order " >> +"mark (BOM). Please use %.6s as " >> +

[PATCH v11 04/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-09 Thread lars . schneider
From: Lars Schneider Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars

[PATCH v11 07/10] convert: check for detectable errors in UTF encodings

2018-03-09 Thread lars . schneider
From: Lars Schneider Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider --- convert.c| 48 ++ t/t0028-working-tree-encodi

[PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-09 Thread lars . schneider
From: Lars Schneider Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as most Git web front e

[PATCH v11 03/10] strbuf: add a case insensitive starts_with()

2018-03-09 Thread lars . schneider
From: Lars Schneider Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- git-compat-util.h | 1 + strbuf.c | 9 + 2 files changed, 10 insertions(+) diff --git a/git

[PATCH v11 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-03-09 Thread lars . schneider
From: Lars Schneider Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider --- convert.c

[PATCH v11 08/10] convert: advise canonical UTF encoding names

2018-03-09 Thread lars . schneider
From: Lars Schneider The canonical name of a UTF encoding has the format UTF, dash, number, and an optional byte order in upper case (e.g. UTF-8 or UTF-16BE). Some iconv versions support alternative names without a dash or with lower case characters. To avoid problems between different iconv
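
The naming format spelled out in this message (UTF, dash, number, optional byte order, all upper case) can be turned into a small normalizer. The following Python sketch is an illustration only: `canonical_utf_name` and its regular expression are hypothetical and not part of the patch.

```python
import re

# "utf", an optional dash, the bit width, and an optional byte order.
_UTF_NAME = re.compile(r"utf-?(\d+)(be|le)?", re.IGNORECASE)


def canonical_utf_name(name):
    """Return the canonical spelling of a UTF encoding name, or None if
    `name` does not look like a UTF encoding at all."""
    match = _UTF_NAME.fullmatch(name)
    if match is None:
        return None
    width, byte_order = match.group(1), match.group(2) or ""
    return "UTF-" + width + byte_order.upper()
```

A tool could use such a helper to suggest the canonical spelling whenever the configured name differs from it.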

[PATCH v11 05/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-09 Thread lars . schneider
From: Lars Schneider If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if

[PATCH v11 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-09 Thread lars . schneider
From: Lars Schneider Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Signed-off-by: Lars Schneider

[PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-09 Thread lars . schneider
From: Lars Schneider UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are known to have round trip issues [1

[PATCH v11 02/10] strbuf: add xstrdup_toupper()

2018-03-09 Thread lars . schneider
From: Lars Schneider Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- strbuf.c | 12 strbuf.h | 1 + 2 files changed, 13 insertions(+) diff --git a

[PATCH v11 00/10] convert: add support for different encodings

2018-03-09 Thread lars . schneider
From: Lars Schneider Hi, Patches 1-5,9 are preparation and helper functions. Patch 6-8,10 are the actual change. Patch 8 is new. This series depends on Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is already in master. Changes sinc

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-09 Thread Lars Schneider
> On 07 Mar 2018, at 19:04, Eric Sunshine wrote: > > On Wed, Mar 7, 2018 at 12:30 PM, wrote: >> Check that new content is valid with respect to the user defined >> 'working-tree-encoding' attribute. >> >> Signed-off-by: Lars Schneider

Re: [PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-09 Thread Lars Schneider
> On 09 Mar 2018, at 00:12, Junio C Hamano wrote: > > Duy Nguyen writes: > >>> extern int starts_with(const char *str, const char *prefix); >>> +extern int startscase_with(const char *str, const char *prefix); >> >> This name is a bit hard to read. Boost [1] goes with istarts_with. I >> wonde

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 23:57, Junio C Hamano wrote: > > Lars Schneider writes: > >> At this point I thought it would make sense to make the advised >> encoding name uppercase in both situations. OK with you? > > In the endgame, if upcased and properly dashed for

Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 23:52, Junio C Hamano wrote: > > Lars Schneider writes: > >> I don't think HT makes too much sense. However, isspace() is nice >> and I will use it. Being more permissive on the inputs should hurt. > > You are being incoherent in the

Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread Lars Schneider
> content to a canonical UTF-8 representation. On checkout Git will >> reverse the conversion. >> >> Signed-off-by: Lars Schneider >> --- >> Documentation/gitattributes.txt | 80 +++ >> diff --git a/convert.c b/convert.c >> @@ -265,6 +266,78 @@

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 23:32, Junio C Hamano wrote: > > Lars Schneider writes: > >> I also would have liked to advise "UTF-16" instead of "UTF16" as >> you suggested. However, that required a few more lines and I wanted >> to keep the ch

Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 20:59, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +static int check_roundtrip(const char* enc_name) > > The asterisk sticks to the variable, not type. Argh. I need to put this check into Travis CI ;-) >> +{ >> +/* >> + * check_roundtrip_e

Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider
> On 07 Mar 2018, at 20:49, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +static int validate_encoding(const char *path, const char *enc, >> + const char *data, size_t len, int die_on_error) >> +{ >> +/* We only check for UTF here as UTF?? can be an al

[PATCH v10 8/9] convert: add tracing for 'working-tree-encoding' attribute

2018-03-07 Thread lars . schneider
From: Lars Schneider Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider --- convert.c

[PATCH v10 4/9] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-07 Thread lars . schneider
From: Lars Schneider Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars

[PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread lars . schneider
From: Lars Schneider Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as most Git web front e

[PATCH v10 5/9] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-07 Thread lars . schneider
From: Lars Schneider If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The is_missing_required_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if
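The two BOM rules described in patches 4/9 and 5/9 can be sketched together as follows. This is only a rough illustration, not the code from the series: the name `has_prohibited_utf_bom` is chosen here for illustration, the sketch assumes the encoding name has already been upper-cased, and it covers only the UTF-16/32 spellings shown.

```c
#include <stddef.h>
#include <string.h>

/* Byte order marks for the UTF-16 and UTF-32 families. */
static const char utf16_be_bom[] = { '\xFE', '\xFF' };
static const char utf16_le_bom[] = { '\xFF', '\xFE' };
static const char utf32_be_bom[] = { '\0', '\0', '\xFE', '\xFF' };
static const char utf32_le_bom[] = { '\xFF', '\xFE', '\0', '\0' };

/* Return 1 if the first bom_len bytes of data equal bom. */
static int has_bom_prefix(const char *data, size_t len,
			  const char *bom, size_t bom_len)
{
	return data && bom && len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Hypothetical name. If the encoding name already fixes the endianness
 * (UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE), a BOM must not be present.
 * Assumption: enc is already upper case.
 */
static int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE"))
		return has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
		       has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom));
	if (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE"))
		return has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
		       has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom));
	return 0;
}

/*
 * If the encoding name leaves the endianness open (UTF-16, UTF-32),
 * be strict and require a BOM. Returns 1 if the required BOM is missing.
 */
static int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16"))
		return !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
			 has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)));
	if (!strcmp(enc, "UTF-32"))
		return !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
			 has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)));
	return 0;
}
```

With these definitions, content declared as "UTF-16LE" that starts with the bytes FF FE would be flagged as carrying a prohibited BOM, while content declared as plain "UTF-16" without any BOM would be flagged as missing a required one.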

[PATCH v10 0/9] convert: add support for different encodings

2018-03-07 Thread lars . schneider
From: Lars Schneider Hi, Patches 1-5 and 8 are preparation and helper functions. Patch 3 is new. Patches 6, 7, and 9 are the actual change. This series depends on Torsten's 8462ff43e4 (convert_to_git(): safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is already in master. Changes sin

[PATCH v10 1/9] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-07 Thread lars . schneider
From: Lars Schneider Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Signed-off-by: Lars Schneider

[PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread lars . schneider
From: Lars Schneider Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider --- convert.c| 55 +++ t/t0028-working-tree-encodi

[PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread lars . schneider
From: Lars Schneider UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are known to have round trip issues [1

[PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-07 Thread lars . schneider
From: Lars Schneider Check in a case insensitive manner if one string is a prefix of another string. This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- git-compat-util.h | 1 + strbuf.c | 9 + 2 files changed, 10 insertions(+) diff --git a/git
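A case-insensitive prefix check of the kind this patch describes might look roughly like the sketch below. The function name `istarts_with` is an assumption made for illustration (the snippet above does not show it), and the comparison relies on the C locale's `tolower()`.

```c
#include <ctype.h>

/*
 * Hypothetical name. Return 1 if prefix is a prefix of str when
 * compared case-insensitively, 0 otherwise.
 */
static int istarts_with(const char *str, const char *prefix)
{
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1; /* consumed the whole prefix: match */
		/* Also fails when str ends first, since NUL differs from any prefix byte. */
		if (tolower((unsigned char)*str) != tolower((unsigned char)*prefix))
			return 0;
	}
}
```

A caller could use this to recognise an encoding family regardless of how the user spelled it, for example `istarts_with(enc, "UTF")`.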

[PATCH v10 2/9] strbuf: add xstrdup_toupper()

2018-03-07 Thread lars . schneider
From: Lars Schneider Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider --- strbuf.c | 12 strbuf.h | 1 + 2 files changed, 13 insertions(+) diff --git a
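For readers without the patch at hand, an upper-casing duplicate can be sketched in plain C as below. This is an approximation under stated assumptions: the name `dup_toupper` is hypothetical, and plain `malloc()` with `abort()` on failure stands in for Git's own allocation wrappers.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical name. Return a newly allocated upper-case copy of string.
 * The caller owns the result and must free() it.
 */
static char *dup_toupper(const char *string)
{
	size_t len = strlen(string);
	char *result = malloc(len + 1);
	size_t i;

	if (!result)
		abort(); /* assumption: allocation failure is treated as fatal */
	for (i = 0; i < len; i++)
		result[i] = (char)toupper((unsigned char)string[i]);
	result[len] = '\0';
	return result;
}
```

Normalising the encoding name once in this way lets later checks use exact comparisons such as `strcmp(enc, "UTF-16")`.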

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Thread Lars Schneider
> On 07 Mar 2018, at 00:07, Junio C Hamano wrote: > > Junio C Hamano writes: > >> Lars Schneider writes: >> >>>> Also "UTF16" or other spelling >>>> the platform may support but this code fails to recognise will go >>>

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Thread Lars Schneider
> On 06 Mar 2018, at 23:53, Junio C Hamano wrote: > > Lars Schneider writes: > >>> Also "UTF16" or other spelling >>> the platform may support but this code fails to recognise will go >>> unchecked. >> >> That is true. Howe

Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-06 Thread Lars Schneider
> On 06 Mar 2018, at 21:50, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t >> len) >> +{ >> +return ( >> + !strcmp(enc, "UTF-16") && >> + !(has_bom_prefix(data, len, utf16_be_bom, siz

Re: [PATCH v9 5/8] convert: add 'working-tree-encoding' attribute

2018-03-06 Thread Lars Schneider
d consequently built-in Git text processing >> tools (e.g. 'git diff') as well as most Git web front ends do not >> visualize the content. >> [...] >> Signed-off-by: Lars Schneider >> --- >> diff --git a/convert.c b/convert.c >> @@ -978,6 +1051,25 @@

Re: [PATCH v9 6/8] convert: check for detectable errors in UTF encodings

2018-03-06 Thread Lars Schneider
> On 06 Mar 2018, at 02:23, Junio C Hamano wrote: > > Lars Schneider writes: > >>> On 05 Mar 2018, at 22:50, Junio C Hamano wrote: >>> >>> lars.schnei...@autodesk.com writes: >>> >>>> +static int validate_encoding(const char *p
