[issue34403] test_utf8_mode.test_cmd_line() fails on HP-UX due to false assumptions

Michael Felt Mon, 27 Aug 2018 13:59:22 -0700

Michael Felt <[email protected]> added the comment:

On 27/08/2018 15:22, Michael Osipov wrote:
> Michael Osipov <[email protected]> added the comment:
>
> So I changed the test code to:
>
> diff --git a/Lib/test/test_utf8_mode.py b/Lib/test/test_utf8_mode.py
> index 26e2e13ec5..d9f8a3ed8b 100644
> --- a/Lib/test/test_utf8_mode.py
> +++ b/Lib/test/test_utf8_mode.py
> @@ -208,7 +208,7 @@ class UTF8ModeTests(unittest.TestCase):
>      def test_cmd_line(self):
>          arg = 'h\xe9\u20ac'.encode('utf-8')
>          arg_utf8 = arg.decode('utf-8')
> -        arg_ascii = arg.decode('ascii', 'surrogateescape')
> +        arg_ascii = arg.decode('roman8', 'surrogateescape')
>          code = 'import locale, sys; print("%s:%s" % 
> (locale.getpreferredencoding(), ascii(sys.argv[1:])))'
>
>          def check(utf8_opt, expected, **kw):
>
> and the output is:
> ======================================================================
> FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 224, in 
> test_cmd_line
>     check('utf8=0', [c_arg], LC_ALL='C')
>   File "/var/osipovmi/cpython/Lib/test/test_utf8_mode.py", line 217, in check
>     self.assertEqual(args, ascii(expected), out)
> AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
> "['h\\xfb\\u02cb\\xe3\\x82\\u02dc']"
> - ['h\xc3\xa9\xe2\x82\xac']
> + ['h\xfb\u02cb\xe3\x82\u02dc']
>  : roman8:['h\xc3\xa9\xe2\x82\xac']
>
> I still don't understand that.
Something I found helpful was to change:


        check('utf8=0', [c_arg], LC_ALL='C')

to
        check('utf8=0', [c_arg], LC_ALL='C', failure=True )

This also fails, but it shows what is being executed.

Further, my 'understanding' is that ascii(whatever) is much smarter than
whatever.decode('ascii', ...) does. Also, ascii() tends to use the \x
shorthand, while decode('ascii', 'surrogateescape') uses the \udc prefix.

And, while you might still consider it a 'bug', did you try using c_arg
= arg.decode('iso-88859-1') ?

Michael (F)

>
> I believe that surrogate escape only works for ASCII and nothing else. If so, 
> this test must be skipped on HP-UX and AIX.
>
> ----------
>
> _______________________________________
> Python tracker <[email protected]>
> <https://bugs.python.org/issue34403>
> _______________________________________
>

----------

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34403>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue34403] test_utf8_mode.test_cmd_line() fails on HP-UX due to false assumptions

Reply via email to