On Thu, May 10, 2018 at 04:43:37PM +0100, Chris Lamb wrote:
> > Do you think this would be fine?
>
> Whilst this works, would it not be better if we could use bytes for
> filenames throughout? I mean, AIUI there is no assumption that
> filesystems need to have any form of valid encoding whatsoever, let
> alone UTF-8.

That was my initial idea as well, but apparently the Python developers
are of a different opinion. Check out the PEP I linked in my previous
email:
https://www.python.org/dev/peps/pep-0383/
Together with the argparse bug I also linked:
https://bugs.python.org/issue21416 - apparently it's "hard" (more like
impossible?) to get bytes from the CLI...

I believe that, as that bug shows, we should just specify
    type=os.fsencode  # https://docs.python.org/3/library/os.html#os.fsencode
in the parser.add_argument() calls that take a filename (to make sure
argparse doesn't change the output), and then re-encode the values
before passing them to functions that can't handle surrogate-escaped
strings, like this magic module.

> However, somewhat happy to see this in diffoscope as it certainly
> improves the current state of affairs. If you do commit it, please
> include my testcase (or something based on it) that I added in:
>
>   https://bugs.debian.org/898022#5

Of course.

-- 
regards,
                        Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540      .''`.
more about me:  https://mapreri.org                             : :'  :
Launchpad user: https://launchpad.net/~mapreri                  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-
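
A minimal sketch of the approach described in the reply above, assuming
a POSIX system where os.fsencode() uses the surrogateescape error
handler (PEP 383). The names build_parser and to_display_str are
hypothetical and are not taken from diffoscope. How the value gets
re-encoded for the magic module is also an assumption: here undecodable
bytes are simply replaced with U+FFFD.

```python
import argparse
import os


def build_parser():
    """Build a parser whose filename argument is returned as bytes."""
    parser = argparse.ArgumentParser()
    # argparse receives a str in which undecodable bytes were smuggled
    # in as lone surrogates; os.fsencode() restores the original bytes.
    parser.add_argument("path", type=os.fsencode)
    return parser


def to_display_str(raw):
    """Hypothetical helper: turn raw filename bytes into a str that
    contains no lone surrogates, for callers that cannot handle them.
    Undecodable bytes become U+FFFD, so this is lossy by design."""
    return raw.decode("utf-8", errors="replace")
```

With this, build_parser().parse_args() yields args.path as bytes, which
can be passed to os.* functions directly, while to_display_str(args.path)
gives a string that is safe to print or hand to stricter libraries.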
_______________________________________________
Reproducible-builds mailing list
Reproducible-builds@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/reproducible-builds