Johannes Berg added the comment:
I've also filed https://sourceware.org/bugzilla/show_bug.cgi?id=26034 for
glibc, because that's really where the issue seems to be.
But perhaps python should be forgiving of glibc errors here.
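A minimal sketch of what "forgiving" could mean, purely as an illustration and not CPython's actual code: the names `is_valid_scalar` and `decode_arg_forgiving` are hypothetical, and so is the idea of receiving the locale decoder's output as a list of integer wide-character values. If any value is outside the Unicode range (above U+10FFFF) or is a lone surrogate, the sketch ignores the locale result and falls back to strict UTF-8 with surrogateescape on the raw bytes.

```python
from __future__ import annotations

# Hypothetical sketch; not CPython's implementation.
MAX_CODE_POINT = 0x10FFFF


def is_valid_scalar(value: int) -> bool:
    """True if value is a Unicode scalar value (in range and not a surrogate)."""
    return 0 <= value <= MAX_CODE_POINT and not (0xD800 <= value <= 0xDFFF)


def decode_arg_forgiving(raw: bytes, wide: list[int]) -> str:
    """Trust the locale decoder's output only if every value is a valid
    Unicode scalar; otherwise decode the raw bytes as UTF-8 with
    surrogateescape so they can still be round-tripped back to bytes."""
    if all(is_valid_scalar(v) for v in wide):
        return "".join(chr(v) for v in wide)
    return raw.decode("utf-8", "surrogateescape")


# Assumed, for illustration only: the locale decoder returned one
# out-of-range value for these five bytes.
print(ascii(decode_arg_forgiving(b"\xfa\xbd\x83\x96\x80", [0x2F43580])))
```

The fallback keeps the existing surrogateescape contract: whatever Python hands back in sys.argv can be encoded back to the exact original bytes.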
--
Johannes Berg added the comment:
Like I said above, it could be argued that the bug is in glibc, and then
https://p.sipsolutions.net/6a4e9fce82dbbfa0.txt
could be used as a simple LD_PRELOAD wrapper to work around this, just to
illustrate the problem from that side.
Arguably, that makes
Johannes Berg added the comment:
As for _Py_DecodeUTF8Ex(): it doesn't seem to help. That's probably
because I'm on neither __ANDROID__ nor __APPLE__, so regardless of whether
current_locale is non-zero, we end up in decode_current_locale(),
where the impedance m
Johannes Berg added the comment:
In fact, that Python one-liner works with just about anything else you
can throw at it, just not with something that "looks like UTF-8 but isn't".
And of course adding LC_CTYPE=ascii or something like that fixes it, as you'
Johannes Berg added the comment:
A simple test case is something like:
./python -c 'import sys;
print(sys.argv[1].encode(sys.getfilesystemencoding(), "surrogateescape"))'
"$(echo -ne '\xfa\xbd\x83\x96\x80')"
Which you'd probably expect to pr
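Assuming the filesystem encoding is UTF-8 (as under a UTF-8 locale on Linux), the natural expectation is that these invalid bytes survive the round trip: strict UTF-8 rejects them, surrogateescape maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF, and encoding with the same handler restores the original bytes. The sketch below checks that with Python's own codec only, bypassing the argv/locale decoding path where the problem shows up.

```python
raw = b"\xfa\xbd\x83\x96\x80"  # the bytes from the test case above

# Strict UTF-8 decoding rejects the sequence: 0xfa is not a valid lead byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xXY to U+DCXY ...
escaped = raw.decode("utf-8", "surrogateescape")

# ... and encoding with the same handler restores the original bytes.
roundtrip = escaped.encode("utf-8", "surrogateescape")

print(strict_ok)         # False
print(ascii(escaped))    # '\udcfa\udcbd\udc83\udc96\udc80'
print(roundtrip == raw)  # True
```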
Johannes Berg added the comment:
I'm pretty sure this is still an issue; I see it on current git master.
This seems to work around it?
https://p.sipsolutions.net/603927f1537226b3.txt
Basically, it seems that with a UTF-8 locale, glibc's mbstowcs() and mbrtowc() just
blindly decode even invalid UTF-8 to
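To see why such bytes can "look like UTF-8", it helps to decode them under the original RFC 2279 rules, which allowed 5- and 6-byte sequences for values up to 0x7FFFFFFF; RFC 3629 later limited UTF-8 to 4 bytes and U+10FFFF. The helper `decode_rfc2279` below is a hypothetical illustration of that older scheme, not glibc's code, and it is only an assumption here that glibc's result for the test bytes looks like this.

```python
def decode_rfc2279(seq: bytes) -> int:
    """Decode a single multi-byte sequence using the pre-2003 (RFC 2279) rules.

    Hypothetical helper for illustration: it accepts 5- and 6-byte forms and
    does no overlong or U+10FFFF range checks, which is the laxity at issue.
    """
    lead = seq[0]
    # Count the leading 1 bits of the lead byte to get the sequence length.
    length = 0
    while length < 8 and lead & (0x80 >> length):
        length += 1
    if length < 2 or length > 6 or len(seq) != length:
        raise ValueError("not a single 2..6 byte sequence")
    value = lead & (0xFF >> (length + 1))  # payload bits of the lead byte
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("bad continuation byte")
        value = (value << 6) | (b & 0x3F)  # append 6 payload bits
    return value


value = decode_rfc2279(b"\xfa\xbd\x83\x96\x80")
print(hex(value))          # 0x2f43580
print(value > 0x10FFFF)    # True: not a Unicode code point, so chr(value) raises ValueError
```

A value like this fits in a 32-bit wchar_t, so a lax decoder can report success even though no valid Python str character corresponds to it.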