> On Jun 24, 2019, at 5:33 AM, Elijah Newren wrote:
>
> On Mon, Jun 24, 2019 at 5:05 AM Lars Schneider
> wrote:
>>
>> Hi folks,
>>
>> Is my understanding correct, that `git fast-export | git fast-import`
>> should not modify the repository? If yes
> On Jun 24, 2019, at 11:58 AM, Jeff King wrote:
>
> On Mon, Jun 24, 2019 at 06:33:38AM -0600, Elijah Newren wrote:
>
>> We should probably also make a corresponding improvement to
>> fast-import; it also makes some attempts to be smart about handling
>> order of modifies and deletes, but mis
Hi folks,
Is my understanding correct, that `git fast-export | git fast-import`
should not modify the repository? If yes, then we might have a bug in
`git fast-export` if symbolic directory links are removed and converted
to a real directory.
Consider this test case:
# Create test repo
> On Sep 24, 2018, at 7:24 PM, Elijah Newren wrote:
>
> On Sun, Sep 23, 2018 at 6:08 AM Lars Schneider
> wrote:
>>
>> Hi,
>>
>> I recently had to purge files from large Git repos (many files, many
>> commits).
>> The usual recommendation
> On Sep 23, 2018, at 4:55 PM, Eric Sunshine wrote:
>
> On Sun, Sep 23, 2018 at 9:05 AM Lars Schneider
> wrote:
>> I recently had to purge files from large Git repos (many files, many
>> commits).
>> The usual recommendation is to use `git filter-branc
Hi,
I recently had to purge files from large Git repos (many files, many commits).
The usual recommendation is to use `git filter-branch --index-filter` to purge
files. However, this is *very* slow for large repos (e.g. it takes 45min to
remove the `builtin` directory from git core). I realized
> On Jul 19, 2018, at 11:19 PM, Stefan Beller wrote:
>
> On Thu, Jul 19, 2018 at 2:02 PM Lars Schneider
> wrote:
>>
>> Hi,
>>
>> I have a blob hash and I would like to know what commit referenced
>> this blob first in a given Git repo.
>
>
Hi,
I have a blob hash and I would like to know what commit referenced
this blob first in a given Git repo.
I could iterate through all commits sorted by date (or generation
number) and then recursively search in the referenced trees until
I find my blob. I wonder, is this the most efficient w
> On Jul 8, 2018, at 8:30 PM, larsxschnei...@gmail.com wrote:
>
> From: Lars Schneider
>
> In 107642fe26 ("convert: add 'working-tree-encoding' attribute",
> 2018-04-15) we added an attribute which defines the working tree
> encoding of a file.
>
> On Jul 8, 2018, at 8:30 PM, larsxschnei...@gmail.com wrote:
>
> From: Lars Schneider
>
> Refactor conversion driver config parsing to ease the parsing of new
> configs in a subsequent patch.
>
> No functional change intended.
>
> Signed-off-by: Lars Sch
> -----Lars Schneider wrote: -
> To: Jeff King
> From: Lars Schneider
> Date: 06/28/2018 18:21
> Cc: "brian m. carlson" , Steve Groeger
> , git@vger.kernel.org
> Subject: Re: Use of new .gitattributes working-tree-encoding attribute across
> different
> On Jun 28, 2018, at 4:34 PM, Jeff King wrote:
>
> On Thu, Jun 28, 2018 at 02:44:47AM +, brian m. carlson wrote:
>
>> On Wed, Jun 27, 2018 at 07:54:52AM +, Steve Groeger wrote:
>>> We have common code that is supposed to be usable across different
>>> platforms and hence different f
> On 04 Jun 2018, at 11:55, Jeff King wrote:
>
> On Mon, Jun 04, 2018 at 12:18:59PM -0400, Martin-Louis Bright wrote:
>
>> Why must the credentials be deleted after receiving the 401 (or
>> any) error? What's the rationale for this?
>
> Because Git only tries a single credential per invo
> On 04 Jun 2018, at 06:53, Junio C Hamano wrote:
>
> A release candidate Git v2.18.0-rc1 is now available for testing
> at the usual places. It is comprised of 842 non-merge commits
> since v2.17.0, contributed by 65 people, 20 of which are new faces.
>
> ...
>
> * The new "checkout-encodin
From: Lars Schneider
If a Git HTTP server responds with 401 or 407, then Git tells the
credential helper to reject and delete the credentials. In general
this is good.
However, in certain automation environments it is not desired to remove
credentials automatically. This is in particular the
> On 16 May 2018, at 11:29, Ævar Arnfjörð Bjarmason wrote:
>
>
> On Wed, May 16 2018, Lars Schneider wrote:
>
>> I am looking into different options to cache Git repositories on build
>> machines. The two most promising ways seem to be git-worktree [1] and
>
Hi,
I am looking into different options to cache Git repositories on build
machines. The two most promising ways seem to be git-worktree [1] and
git-alternates [2].
I wonder if you see an advantage of one over the other?
My impression is that git-worktree supersedes git-alternates. Would
that b
> On 16 Apr 2018, at 19:45, Jacob Keller wrote:
>
> On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller wrote:
>> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider
>> wrote:
>>> What if Git kept a LRU list that contains file path, content hash, and
>>> mtime
> On 16 Apr 2018, at 19:04, Ævar Arnfjörð Bjarmason wrote:
>
>
> On Mon, Apr 16 2018, Lars Schneider wrote:
>
>>> On 16 Apr 2018, at 04:03, Linus Torvalds
>>> wrote:
>>>
>>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano wrote:
>>
> On 16 Apr 2018, at 04:03, Linus Torvalds
> wrote:
>
> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano wrote:
>>
>> I think Elijah's corrected was_tracked() also does not care "has
>> this been renamed".
>
> I'm perfectly happy with the slightly smarter patches. My patch was
> really just a
From: Lars Schneider
UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1
From: Lars Schneider
Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.
Signed-off-by: Lars Schneider
---
convert.c
From: Lars Schneider
Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front e
From: Lars Schneider
The function same_encoding() could only recognize alternative names for
UTF-8 encodings. Teach it to recognize all kinds of alternative UTF
encoding names (e.g. utf16).
While we are at it, fix a crash that would occur if same_encoding() was
called with a NULL argument and a
From: Lars Schneider
Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we
allocate the buffer for the lower case string with xmallocz(). This
already ensures a NUL at the end of the allocated buffer.
Remove the unnecessary assignment.
Signed-off-by: Lars Schneider
From: Lars Schneider
Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.
This function is used in a subsequent commit.
[1] http://unicode.org/faq/utf_bom.html#bom10
Signed-off-by: Lars
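A minimal sketch of the rule described in the snippet above: if the encoding name already fixes the byte order (UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE), a leading BOM is not allowed. The names bom_is_prohibited() and has_bom() are hypothetical and chosen for illustration only; this is not the code from the patch, and it assumes the encoding name is already in canonical upper-case form.

```c
#include <stddef.h>
#include <string.h>

/* Byte order marks for the UTF-16 and UTF-32 encoding schemes. */
static const unsigned char utf16_be_bom[] = { 0xFE, 0xFF };
static const unsigned char utf16_le_bom[] = { 0xFF, 0xFE };
static const unsigned char utf32_be_bom[] = { 0x00, 0x00, 0xFE, 0xFF };
static const unsigned char utf32_le_bom[] = { 0xFF, 0xFE, 0x00, 0x00 };

/* Hypothetical helper: 1 if data starts with the given BOM bytes. */
static int has_bom(const char *data, size_t len,
		   const unsigned char *bom, size_t bom_len)
{
	return data && len >= bom_len && !memcmp(data, bom, bom_len);
}

/*
 * Hypothetical name. Returns 1 if the encoding name declares the
 * byte order explicitly and the data nevertheless starts with a BOM.
 */
int bom_is_prohibited(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE"))
		return has_bom(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
		       has_bom(data, len, utf16_le_bom, sizeof(utf16_le_bom));
	if (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE"))
		return has_bom(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
		       has_bom(data, len, utf32_le_bom, sizeof(utf32_le_bom));
	return 0;
}
```

A plain "UTF-16" or "UTF-32" name is never flagged here, since for those names a BOM is the only way to learn the byte order.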
From: Lars Schneider
If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.
The Unicode standard instructs to assume big-endian if
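The strict check described in the snippet above could be sketched as follows: when the encoding name is just "UTF-16" or "UTF-32", the data must begin with a BOM, otherwise the byte order is ambiguous. The names bom_is_missing() and starts_with_bytes() are hypothetical; the sketch only mirrors the idea and assumes canonical upper-case encoding names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if data starts with the given byte sequence. */
static int starts_with_bytes(const char *data, size_t len,
			     const unsigned char *bytes, size_t n)
{
	return data && len >= n && !memcmp(data, bytes, n);
}

/*
 * Hypothetical name. Returns 1 if the encoding name leaves the byte
 * order open and the data does not start with either BOM variant.
 */
int bom_is_missing(const char *enc, const char *data, size_t len)
{
	static const unsigned char be16[] = { 0xFE, 0xFF };
	static const unsigned char le16[] = { 0xFF, 0xFE };
	static const unsigned char be32[] = { 0x00, 0x00, 0xFE, 0xFF };
	static const unsigned char le32[] = { 0xFF, 0xFE, 0x00, 0x00 };

	if (!strcmp(enc, "UTF-16"))
		return !(starts_with_bytes(data, len, be16, sizeof(be16)) ||
			 starts_with_bytes(data, len, le16, sizeof(le16)));
	if (!strcmp(enc, "UTF-32"))
		return !(starts_with_bytes(data, len, be32, sizeof(be32)) ||
			 starts_with_bytes(data, len, le32, sizeof(le32)));
	return 0;
}
```

Encodings with an explicit byte order (e.g. "UTF-16LE") always return 0 in this sketch, because no BOM is required for them.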
From: Lars Schneider
Check in a case insensitive manner if one string is a prefix of another
string.
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
git-compat-util.h | 1 +
strbuf.c | 9 +
2 files changed, 10 insertions(+)
diff --git a/git
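For illustration, a case-insensitive prefix check like the one described in the snippet above might look roughly like this. The function name prefix_matches_icase() is hypothetical and not taken from the patch; the sketch assumes plain ASCII input and uses only the standard C library.

```c
#include <ctype.h>

/*
 * Hypothetical name. Returns 1 if `prefix` is a prefix of `str`,
 * ignoring ASCII case, and 0 otherwise. An empty prefix matches.
 */
int prefix_matches_icase(const char *str, const char *prefix)
{
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1; /* whole prefix consumed: match */
		if (tolower((unsigned char)*str) !=
		    tolower((unsigned char)*prefix))
			return 0; /* mismatch, or str ended early */
	}
}
```

The cast to unsigned char avoids undefined behavior in tolower() for bytes with the high bit set.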
From: Lars Schneider
Create a copy of an existing string and make all characters upper case.
Similar to xstrdup_tolower().
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
strbuf.c | 12
strbuf.h | 1 +
2 files changed, 13 insertions(+)
diff --git a
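A rough sketch of the upper-casing copy described in the snippet above, written against the standard C library only. The name dup_toupper() is hypothetical; unlike Git's x*() allocation wrappers, this version returns NULL on allocation failure instead of dying, which is an assumption made to keep the example self-contained.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical name. Returns a newly allocated copy of `string` with
 * all characters converted to upper case, or NULL if allocation
 * fails. The caller owns the result and must free() it.
 */
char *dup_toupper(const char *string)
{
	size_t len = strlen(string);
	char *result = malloc(len + 1);
	size_t i;

	if (!result)
		return NULL;
	for (i = 0; i < len; i++)
		result[i] = (char)toupper((unsigned char)string[i]);
	result[len] = '\0';
	return result;
}
```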
From: Lars Schneider
Hi,
Patches 1-6,9 are preparation and helper functions.
Patch 7,8,10 are the actual change.
This series is based on v2.16.0 and Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13).
The series can be rebased without conflic
From: Lars Schneider
Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.
Signed-off-by: Lars Schneider
---
convert.c| 61 +++
t/t0028-working-tree-encodi
> On 05 Apr 2018, at 18:41, Torsten Bögershausen wrote:
>
> On 01.04.18 15:24, Lars Schneider wrote:
>>> TRUE or false are values, but just wrong ones.
>>> If this test is removed, the user will see "failed to encode "TRUE" to
>>> "UT
> On 04 Jan 2018, at 20:26, Jeff King wrote:
>
> On Wed, Dec 27, 2017 at 09:41:30AM -0800, Junio C Hamano wrote:
>
>> Jeff King writes:
>>
>>> I, too, had a funny feeling about calling this "core". But I didn't have
>>> a better name, as I'm not sure what other place we have for config
>>> op
> On 02 Apr 2018, at 20:31, Lars Schneider wrote:
>
>
>> On 29 Mar 2018, at 20:37, Junio C Hamano wrote:
>>
>> lars.schnei...@autodesk.com writes:
>>
>>> From: Lars Schneider
>>>
>>> Patches 1-6,9 are preparation and helper funct
> On 29 Mar 2018, at 20:37, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> From: Lars Schneider
>>
>> Patches 1-6,9 are preparation and helper functions. Patch 4 is new.
>> Patch 7,8,10 are the actual change.
>>
>
> On 13 Mar 2018, at 18:45, Siddhartha Mishra wrote:
>
> On Mon, Mar 12, 2018 at 3:49 PM, Lars Schneider
> wrote:
>> Hi,
>>
>> That looks interesting but I agree with Dscho that we should not limit
>> this to master/maint.
>>
>> I assume you d
> On 16 Mar 2018, at 19:19, Eric Sunshine wrote:
>
> On Fri, Mar 16, 2018 at 1:50 PM, Junio C Hamano wrote:
>> Eric Sunshine writes:
>>> However, I'm having a tough time imagining cases in which callers
>>> would want same_encoding() to return true if both arguments are NULL,
>>> but outright
> On 18 Mar 2018, at 08:24, Torsten Bögershausen wrote:
>
> Some comments inline
>
> On Fri, Mar 09, 2018 at 06:35:32PM +0100, lars.schnei...@autodesk.com wrote:
>> From: Lars Schneider
>>
>> Git recognizes files encoded with ASCII or one of its supersets
> On 30 Mar 2018, at 12:32, Lars Schneider wrote:
>
>
>> On 30 Mar 2018, at 11:24, Ævar Arnfjörð Bjarmason wrote:
>>
>>
>> On Wed, Mar 28 2018, Junio C. Hamano wrote:
>>
>>> * ls/checkout-encoding (2018-03-16) 10 comm
> On 30 Mar 2018, at 11:24, Ævar Arnfjörð Bjarmason wrote:
>
>
> On Wed, Mar 28 2018, Junio C. Hamano wrote:
>
>> * ls/checkout-encoding (2018-03-16) 10 commits
>> - convert: add round trip check based on 'core.checkRoundtripEncoding'
>> - convert: add tracing for 'working-tree-encoding' attri
> On 17 Mar 2018, at 09:01, Duy Nguyen wrote:
>
> On Fri, Mar 16, 2018 at 10:22 PM, Jeff King wrote:
>>> diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
>>> index 3735ce413f..f6f346c468 100755
>>> --- a/ci/run-build-and-tests.sh
>>> +++ b/ci/run-build-and-tests.sh
>>> @@ -7,6
> On 14 Mar 2018, at 21:43, Junio C Hamano wrote:
>
> Derrick Stolee writes:
>
>> This v6 includes feedback around csum-file.c and the rename of hashclose()
>> to finalize_hashfile(). These are the first two commits of the series, so
>> they could be pulled out independently.
>>
>> The only o
> On 16 Mar 2018, at 00:25, Eric Sunshine wrote:
>
> On Thu, Mar 15, 2018 at 6:57 PM, wrote:
>> The function same_encoding() checked only for alternative UTF-8 encoding
>> names. Teach it to check for all kinds of alternative UTF encoding
>> names.
>>
> On 15 Mar 2018, at 20:18, Lars Schneider wrote:
>
>
>> On 15 Mar 2018, at 02:34, Junio C Hamano wrote:
>>
>> ...
>>
>> * ls/checkout-encoding (2018-03-09) 10 commits
>> - convert: add round trip check based on 'core.checkRoundtripEnc
From: Lars Schneider
Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.
Signed-off-by: Lars Schneider
---
convert.c| 61 +++
t/t0028-working-tree-encodi
From: Lars Schneider
Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.
Signed-off-by: Lars Schneider
---
convert.c
From: Lars Schneider
UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1
From: Lars Schneider
Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front e
From: Lars Schneider
If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.
The Unicode standard instructs to assume big-endian if
From: Lars Schneider
Create a copy of an existing string and make all characters upper case.
Similar to xstrdup_tolower().
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
strbuf.c | 12
strbuf.h | 1 +
2 files changed, 13 insertions(+)
diff --git a
From: Lars Schneider
Hi,
Patches 1-6,9 are preparation and helper functions. Patch 4 is new.
Patch 7,8,10 are the actual change.
This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is
already in master.
Changes sinc
From: Lars Schneider
The function same_encoding() checked only for alternative UTF-8 encoding
names. Teach it to check for all kinds of alternative UTF encoding
names.
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
utf8.c | 20 +++-
1 file
From: Lars Schneider
Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we
allocate the buffer for the lower case string with xmallocz(). This
already ensures a NUL at the end of the allocated buffer.
Remove the unnecessary assignment.
Signed-off-by: Lars Schneider
From: Lars Schneider
Check in a case insensitive manner if one string is a prefix of another
string.
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
git-compat-util.h | 1 +
strbuf.c | 9 +
2 files changed, 10 insertions(+)
diff --git a/git
From: Lars Schneider
Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.
This function is used in a subsequent commit.
[1] http://unicode.org/faq/utf_bom.html#bom10
Signed-off-by: Lars
> On 09 Mar 2018, at 20:11, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> From: Lars Schneider
>>
>> The canonical name of a UTF encoding has the format UTF, dash, number,
>> and an optional byte order in upper case (e.g. UTF-8
> On 09 Mar 2018, at 20:10, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> +static const char *default_encoding = "UTF-8";
>> +
>> ...
>> +static const char *git_path_check_encoding(struct attr_check_item *check)
>> +{
>> +const char *value = check->value;
>> +
>> +i
> On 15 Mar 2018, at 02:34, Junio C Hamano wrote:
>
> ...
>
> * ls/checkout-encoding (2018-03-09) 10 commits
> - convert: add round trip check based on 'core.checkRoundtripEncoding'
> - convert: add tracing for 'working-tree-encoding' attribute
> - convert: advise canonical UTF encoding names
>
> On 14 Mar 2018, at 23:20, Jeff King wrote:
>
> On Wed, Mar 14, 2018 at 05:56:04PM +0100, Lars Schneider wrote:
>
>> I am investigating a Git merge (a86dd40fe) in which an older version of
>> a file won over the newer version. I try to understand why this is the
>
> On 14 Mar 2018, at 18:02, Derrick Stolee wrote:
>
> On 3/14/2018 12:56 PM, Lars Schneider wrote:
>> Hi,
>>
>> I am investigating a Git merge (a86dd40fe) in which an older version of
>> a file won over the newer version. I try to understand why this is the
>
Hi,
I am investigating a Git merge (a86dd40fe) in which an older version of
a file won over the newer version. I am trying to understand why this is
the case. I can reproduce the merge with the following commands:
$ git checkout -b test a02fa3303
$ GIT_MERGE_VERBOSITY=5 git merge --verbose c1b82995c
> On 14 Mar 2018, at 09:33, Michael Haggerty wrote:
>
> On Wed, Mar 14, 2018 at 9:14 AM, Lars Schneider
> wrote:
>> I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*]
>> and it detected a very large commit of 7.33 MiB in my repo (see ch
Hi,
I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*]
and it detected a very large commit of 7.33 MiB in my repo (see chart
below).
This large commit is expected. I've imported that repo from another
version control system but excluded all binary files (e.g. images) and
some
Hi,
That looks interesting but I agree with Dscho that we should not limit
this to master/maint.
I assume you did run this on TravisCI already? Can you share a link?
I assume you did find errors? Can we fix them or are there too many?
If there are existing errors, how do we define a "successful"
Hi Viet,
> On 12 Mar 2018, at 03:20, Viet Hung Tran wrote:
>
> This is my submission as a microproject for the Google Summer of code.
> I apologize for not setting the [GSoC] in my previous email
> at <20180312020855.7950-1-viethtran1...@gmail.com>.
> Please ignore it.
No worries :-)
> Add a n
> On 09 Mar 2018, at 20:00, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> +const char *advise_msg = _(
>> +"The file '%s' contains a byte order "
>> +"mark (BOM). Please use %.6s as "
>> +
From: Lars Schneider
Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.
This function is used in a subsequent commit.
[1] http://unicode.org/faq/utf_bom.html#bom10
Signed-off-by: Lars
From: Lars Schneider
Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.
Signed-off-by: Lars Schneider
---
convert.c| 48 ++
t/t0028-working-tree-encodi
From: Lars Schneider
Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front e
From: Lars Schneider
Check in a case insensitive manner if one string is a prefix of another
string.
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
git-compat-util.h | 1 +
strbuf.c | 9 +
2 files changed, 10 insertions(+)
diff --git a/git
From: Lars Schneider
Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.
Signed-off-by: Lars Schneider
---
convert.c
From: Lars Schneider
The canonical name of a UTF encoding has the format UTF, dash, number,
and an optional byte order in upper case (e.g. UTF-8 or UTF-16BE).
Some iconv versions support alternative names without a dash or with
lower case characters.
To avoid problems between different iconv
From: Lars Schneider
If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.
The Unicode standard instructs to assume big-endian if
From: Lars Schneider
Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we
allocate the buffer for the lower case string with xmallocz(). This
already ensures a NUL at the end of the allocated buffer.
Remove the unnecessary assignment.
Signed-off-by: Lars Schneider
From: Lars Schneider
UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1
From: Lars Schneider
Create a copy of an existing string and make all characters upper case.
Similar to xstrdup_tolower().
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
strbuf.c | 12
strbuf.h | 1 +
2 files changed, 13 insertions(+)
diff --git a
From: Lars Schneider
Hi,
Patches 1-5,9 are preparation and helper functions.
Patch 6-8,10 are the actual change. Patch 8 is new.
This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is
already in master.
Changes sinc
> On 07 Mar 2018, at 19:04, Eric Sunshine wrote:
>
> On Wed, Mar 7, 2018 at 12:30 PM, wrote:
>> Check that new content is valid with respect to the user defined
>> 'working-tree-encoding' attribute.
>>
>> Signed-off-by: Lars Schneider
>
> On 09 Mar 2018, at 00:12, Junio C Hamano wrote:
>
> Duy Nguyen writes:
>
>>> extern int starts_with(const char *str, const char *prefix);
>>> +extern int startscase_with(const char *str, const char *prefix);
>>
>> This name is a bit hard to read. Boost [1] goes with istarts_with. I
>> wonde
> On 07 Mar 2018, at 23:57, Junio C Hamano wrote:
>
> Lars Schneider writes:
>
>> At this point I thought it would make sense to make the advised
>> encoding name uppercase in both situations. OK with you?
>
> In the endgame, if upcased and properly dashed for
> On 07 Mar 2018, at 23:52, Junio C Hamano wrote:
>
> Lars Schneider writes:
>
>> I don't think HT makes too much sense. However, isspace() is nice
>> and I will use it. Being more permissive on the inputs should hurt.
>
> You are being incoherent in the
> content to a canonical UTF-8 representation. On checkout Git will
>> reverse the conversion.
>>
>> Signed-off-by: Lars Schneider
>> ---
>> Documentation/gitattributes.txt | 80 +++
>> diff --git a/convert.c b/convert.c
>> @@ -265,6 +266,78 @@
> On 07 Mar 2018, at 23:32, Junio C Hamano wrote:
>
> Lars Schneider writes:
>
>> I also would have liked to advise "UTF-16" instead of "UTF16" as
>> you suggested. However, that required a few more lines and I wanted
>> to keep the ch
> On 07 Mar 2018, at 20:59, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> +static int check_roundtrip(const char* enc_name)
>
> The asterisk sticks to the variable, not type.
Argh. I need to put this check into Travis CI ;-)
>> +{
>> +/*
>> + * check_roundtrip_e
> On 07 Mar 2018, at 20:49, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> +static int validate_encoding(const char *path, const char *enc,
>> + const char *data, size_t len, int die_on_error)
>> +{
>> +/* We only check for UTF here as UTF?? can be an al
From: Lars Schneider
Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.
Signed-off-by: Lars Schneider
---
convert.c
From: Lars Schneider
Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.
This function is used in a subsequent commit.
[1] http://unicode.org/faq/utf_bom.html#bom10
Signed-off-by: Lars
From: Lars Schneider
Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front e
From: Lars Schneider
If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.
The Unicode standard instructs to assume big-endian if
From: Lars Schneider
Hi,
Patches 1-5,8 are preparation and helper functions. Patch 3 is new.
Patch 6,7,9 are the actual change.
This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is
already in master.
Changes sin
From: Lars Schneider
Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we
allocate the buffer for the lower case string with xmallocz(). This
already ensures a NUL at the end of the allocated buffer.
Remove the unnecessary assignment.
Signed-off-by: Lars Schneider
From: Lars Schneider
Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.
Signed-off-by: Lars Schneider
---
convert.c| 55 +++
t/t0028-working-tree-encodi
From: Lars Schneider
UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1
From: Lars Schneider
Check in a case insensitive manner if one string is a prefix of another
string.
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
git-compat-util.h | 1 +
strbuf.c | 9 +
2 files changed, 10 insertions(+)
diff --git a/git
From: Lars Schneider
Create a copy of an existing string and make all characters upper case.
Similar to xstrdup_tolower().
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider
---
strbuf.c | 12
strbuf.h | 1 +
2 files changed, 13 insertions(+)
diff --git a
> On 07 Mar 2018, at 00:07, Junio C Hamano wrote:
>
> Junio C Hamano writes:
>
>> Lars Schneider writes:
>>
>>>> Also "UTF16" or other spelling
>>>> the platform may support but this code fails to recognise will go
>>>>
> On 06 Mar 2018, at 23:53, Junio C Hamano wrote:
>
> Lars Schneider writes:
>
>>> Also "UTF16" or other spelling
>>> the platform may support but this code fails to recognise will go
>>> unchecked.
>>
>> That is true. Howe
> On 06 Mar 2018, at 21:50, Junio C Hamano wrote:
>
> lars.schnei...@autodesk.com writes:
>
>> +int is_missing_required_utf_bom(const char *enc, const char *data, size_t
>> len)
>> +{
>> +return (
>> + !strcmp(enc, "UTF-16") &&
>> + !(has_bom_prefix(data, len, utf16_be_bom, siz
d consequently built-in Git text processing
>> tools (e.g. 'git diff') as well as most Git web front ends do not
>> visualize the content.
>> [...]
>> Signed-off-by: Lars Schneider
>> ---
>> diff --git a/convert.c b/convert.c
>> @@ -978,6 +1051,25 @@
> On 06 Mar 2018, at 02:23, Junio C Hamano wrote:
>
> Lars Schneider writes:
>
>>> On 05 Mar 2018, at 22:50, Junio C Hamano wrote:
>>>
>>> lars.schnei...@autodesk.com writes:
>>>
>>>> +static int validate_encoding(const char *p