Sven Strickroth <sven.strickr...@tu-clausthal.de> writes:

> when I create a git repository, add a file containing utf-8 characters
> or umlauts (like öäü.txt), commit and then export the HEAD revision to a
> zip archive using "git archive --format zip -o 1.zip HEAD", the zip file
> contains incorrect filenames:

My reading of archive-zip.c suggests that we write out whatever
pathname you have in the tree verbatim, so a pathname encoded in
UTF-8 ends up byte-for-byte in the resulting zip archive.

Do you know in what encoding the pathnames are _expected_ to be
stored in zip archives?  Random documentation seems to suggest that
there is no standard encoding, e.g. http://docs.python.org/library/zipfile.html
says:

    There is no official file name encoding for ZIP files. If you
    have unicode file names, you must convert them to byte strings
    in your desired encoding before passing them to write(). WinZip
    interprets all file names as encoded in CP437, also known as DOS
    Latin.

which may explain it.
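
To illustrate the suspected mismatch, here is a small Python sketch (not
anything taken from git itself). It assumes the archive carries the raw
UTF-8 bytes and that the reader decodes them as CP437, as the zipfile
documentation says WinZip does; the variable names are only for
illustration.

```python
# Sketch: what a CP437-assuming reader would display for UTF-8 bytes.
name = "öäü.txt"

# Bytes as they would be stored if the pathname is copied verbatim.
stored = name.encode("utf-8")

# A reader that assumes CP437 decodes each byte as one character,
# so every two-byte UTF-8 sequence turns into two unrelated glyphs.
shown = stored.decode("cp437")

print(repr(stored))
print(shown)
```

Each umlaut occupies two bytes in UTF-8, so the name shown by such a
reader is longer than the original and contains no umlauts at all.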

It may not be a bad idea for "git archive --format=zip" to

 (1) check if the pathname is valid UTF-8; and
 (2) check if it can be re-encoded to latin-1

and if (and only if) both are true, automatically re-encode the path
to latin-1.
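
The two checks could look roughly like this. It is a Python sketch of
the logic only, not a patch for archive-zip.c, and the function name
reencode_path_for_zip is made up for the illustration.

```python
def reencode_path_for_zip(raw: bytes) -> bytes:
    """Return the path re-encoded to latin-1 if both checks pass,
    otherwise return the original bytes untouched."""
    # (1) Is the pathname valid UTF-8?
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw
    # (2) Can every character be represented in latin-1?
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return raw
```

With this, the UTF-8 bytes for "öäü.txt" would come out as the three
single bytes 0xF6 0xE4 0xFC followed by ".txt", while a pathname that
is not valid UTF-8, or that contains characters outside latin-1, would
be written out unchanged, as it is today.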

Of course, an explicit option along the lines of "git archive
--format=zip --path-reencode=utf8-to-latin1" would be the most
generic way to do this.