On Sat, Jan 06, 2018 at 01:48:01AM +0100, lars.schnei...@autodesk.com wrote:
> From: Lars Schneider <larsxschnei...@gmail.com>
> 
> Hi,
> 
> Patches 1-5 and 6 are helper functions and preparation.
> Patch 6 is the actual change.
> 
> I am still torn between "checkout-encoding" and "working-tree-encoding"
> as attribute name. I am happy to hear arguments for/against one or the
> other.

checkout-encoding is probably misleading, as it is even the checkin-encoding.

What is wrong with working-tree-encoding ?
I think the 2 "-".

What was wrong with workingtree-encoding ?
Or
workdir-encoding ?



> 
> Changes since v2:
> 
> * Added Torsten's crlfsave refactoring patch (patch 5)
>   @Torsten: I tried to make the commit message more clean, added
>             some comments to and renamed conv_flags_eol to
>             global_conv_flags_eol.
> 
> * Improved documentation and commit message (Torsten)

Good, thanks.
> 
> * Removed unnecessary NUL assignment in xstrdup_tolower() (Torsten)
> 
> * Set "git config core.eol lf" to made the test run on Windows (Dscho)
> 
> * Made BOM arrays static (Ramsay)


Some comments:

I would like to have the CRLF conversion a little bit more strict -
many users tend to set core.autocrlf=true or write "* text=auto"
in the .gitattributes.
Reading all the effort about BOM markers and UTF-16LE, I think there
should ne some effort to make the line endings round trip.
Therefore I changed convert.c to demand that the "text" attribute
is set to enable CRLF conversions.
(If I had submitted the patch, I would have demanded
"text eol=lf" or "text eol=crlf", but the test case t0028 indicates
that there is a demand to produce line endings as configured in core.eol)

Anyway, I rebased it onto git.git/master, changed the docu, and pushed it to
https://github.com/tboegi/git/tree/180107-0935-For-lars-schneider-encode-V3B

Here is a inter-diff against your version:

 diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
 index 1bc03e69c..b8d9f91c8 100644
 --- a/Documentation/gitattributes.txt
 +++ b/Documentation/gitattributes.txt
 @@ -281,7 +281,7 @@ interpreted as binary and consequently built-in Git text 
processing
  tools (e.g. 'git diff') as well as most Git web front ends do not
  visualize the content.
  
 -In these cases you can teach Git the encoding of a file in the working
 +In these cases you can tell Git the encoding of a file in the working
  directory with the `checkout-encoding` attribute. If a file with this
  attributes is added to Git, then Git reencodes the content from the
  specified encoding to UTF-8 and stores the result in its internal data
 @@ -308,17 +308,20 @@ Use the `checkout-encoding` attribute only if you cannot 
store a file in
  UTF-8 encoding and if you want Git to be able to process the content as
  text.
  
 +Note that when `checkout-encoding` is defined, by default the line
 +endings are not converted. `text=auto` and core.autocrlf are ignored.
 +Set the `text` attribute to enable CRLF conversions.
 +
  Use the following attributes if your '*.txt' files are UTF-16 encoded
 -with byte order mark (BOM) and you want Git to perform automatic line
 -ending conversion based on your platform.
 +with byte order mark (BOM).
  
  ------------------------
 -*.txt         text checkout-encoding=UTF-16
 +*.txt         checkout-encoding=UTF-16
  ------------------------
  
  Use the following attributes if your '*.txt' files are UTF-16 little
 -endian encoded without BOM and you want Git to use Windows line endings
 -in the working directory.
 +endian encoded without BOM and you want Git to use LF in the repo and
 +CRLF in the working directory.
  
  ------------------------
  *.txt                 checkout-encoding=UTF-16LE text eol=CRLF
 diff --git a/convert.c b/convert.c
 index 13f766d2a..1e29f515e 100644
 --- a/convert.c
 +++ b/convert.c
 @@ -221,18 +221,27 @@ static void check_global_conv_flags_eol(const char 
*path, enum crlf_action crlf_
        }
  }
  
  
  static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
 @@ -432,7 +441,7 @@ static int crlf_to_git(const struct index_state *istate,
                 * cherry-pick.
                 */
                if ((!(conv_flags & CONV_EOL_RENORMALIZE)) &&
 -                  has_cr_in_index(istate, path))
 +                  has_crlf_in_index(istate, path))
                        convert_crlf_into_lf = 0;
        }
        if (((conv_flags & CONV_EOL_RNDTRP_WARN) ||
 @@ -1214,9 +1223,28 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
                        ca->crlf_action = git_path_check_crlf(ccheck + 0);
                ca->ident = git_path_check_ident(ccheck + 1);
                ca->drv = git_path_check_convert(ccheck + 2);
 +              ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
                if (ca->crlf_action != CRLF_BINARY) {
                        enum eol eol_attr = git_path_check_eol(ccheck + 3);
 -                      if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
 +                      if (ca->checkout_encoding) {
 +                              enum crlf_action crlf_action = CRLF_BINARY;
 +                              /*
 +                               * encoded files don't use auto.
 +                               * 'text' must be specified to
 +                               * do crlf conversions
 +                               */
 +                              if (ca->crlf_action == CRLF_TEXT) {
 +                                      if (eol_attr == EOL_LF)
 +                                              crlf_action = CRLF_TEXT_INPUT;
 +                                      else if (eol_attr == EOL_CRLF)
 +                                              crlf_action = CRLF_TEXT_CRLF;
 +                                      else if (text_eol_is_crlf())
 +                                              crlf_action = CRLF_TEXT_CRLF;
 +                                      else
 +                                              crlf_action = CRLF_TEXT_INPUT;
 +                              }
 +                              ca->crlf_action = crlf_action;
 +                      } else if (ca->crlf_action == CRLF_AUTO && eol_attr == 
EOL_LF)
                                ca->crlf_action = CRLF_AUTO_INPUT;
                        else if (ca->crlf_action == CRLF_AUTO && eol_attr == 
EOL_CRLF)
                                ca->crlf_action = CRLF_AUTO_CRLF;
 @@ -1225,11 +1253,11 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
                        else if (eol_attr == EOL_CRLF)
                                ca->crlf_action = CRLF_TEXT_CRLF;
                }
 -              ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
        } else {
                ca->drv = NULL;
                ca->crlf_action = CRLF_UNDEFINED;
                ca->ident = 0;
 +              ca->checkout_encoding = NULL;
        }
  
        /* Save attr and make a decision for action */


Reply via email to