Bugs item #1293741, was opened at 2005-09-17 14:41
Message generated for change (Comment added) made by akaihola
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1293741&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Extension Modules
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: GRISEL (ogrisel)
Assigned to: Nobody/Anonymous (nobody)
Summary: doctest runner cannot handle non-ascii characters

Initial Comment:
The doctest module fails when the expected result string has non-ascii characters, even if the # -*- coding: XXX -*- line is properly set.

The enclosed code sample produces the following error:

Traceback (most recent call last):
  File "test_iso-8859-15.py", line 41, in ?
    _test()
  File "test_iso-8859-15.py", line 26, in _test
    tried, failed = runner.run(t)
  File "/usr/lib/python2.4/doctest.py", line 1376, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.4/doctest.py", line 1259, in __run
    if check(example.want, got, self.optionflags):
  File "/usr/lib/python2.4/doctest.py", line 1475, in check_output
    if got == want:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 8: ordinal not in range(128)

----------------------------------------------------------------------

Comment By: akaihola (akaihola)
Date: 2007-05-09 11:19

Message:
Logged In: YES
user_id=1432932
Originator: NO

I made some tests with Python 2.5 on an Ubuntu Edgy system with a UTF-8 terminal. Here's the basic test, which works correctly:

# -*- encoding: utf-8 -*-
__doc__ = u"""
>>> print u'ä'
ä
""" ; import doctest ; doctest.testmod()

If I start to vary the "ä" (a with umlaut) characters in "print u'ä'" (the test) and the "ä" below it (expected result), I get a UnicodeEncodeError whenever doctest tries to print a message about non-matching test output.

Here's a summary of my results in the format of

test | expected result | success/failure

Note that \u00e4 is the Unicode escape for the "ä" character.

ä      | ä      | success
\u00e4 | ä      | success
ä      | \u00e4 | success
\u00e4 | \u00e4 | success
ä      | x      | fails to display message
x      | ä      | fails to display message
\u00e4 | x      | fails to display message
x      | \u00e4 | fails to display message

Conclusion: test running and output checking do work correctly, but there's a problem displaying messages about non-matching output whenever either the expected output or the output produced by the test contains any non-ASCII characters.

The doctest documentation doesn't give any hints about workarounds.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-04-24 04:21

Message:
Logged In: YES
user_id=31435

Unassigned myself -- don't know enough about encodings.

----------------------------------------------------------------------

Comment By: Bjorn Tillenius (bjoti)
Date: 2006-02-16 13:41

Message:
Logged In: YES
user_id=1032069

I'm quite sure that you can use non-ASCII characters in your doctest, given that it's a unicode string. So if you make your docstring a unicode string, it should work. That is:

u"""Docstring containing non-ASCII characters.

...
"""

----------------------------------------------------------------------

Comment By: GRISEL (ogrisel)
Date: 2005-09-18 13:25

Message:
Logged In: YES
user_id=795041

Unfortunately, that patch does not fix my problem. The patch at bug #1080727 fixes the problem for doctests written in external reST files (testfile and DocFileTest functions).

My problem is related to internal docstring encoding (testmod, for instance). However, Bjorn Tillenius says:

"""
If one writes doctests within documentation strings of classes and functions, it's possible to use non-ASCII characters since one can specify the encoding used in the source file.
"""

So according to him, docstrings' doctests with non-ascii characters should work by default. So maybe my system setup is somewhat broken.

Could somebody please confirm or refute this by running the attached sample script on their system?

My system config:

[EMAIL PROTECTED] (on linux)
python 2.4.1 with:
sys.getdefaultencoding() == 'ascii'
and
locale.getpreferredencoding() == 'ISO-8859-15'

$ file test_iso-8859-15.py
test_iso-8859-15.py: ISO-8859 English text

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2005-09-17 20:42

Message:
Logged In: YES
user_id=31435

Please try the patch at

http://www.python.org/sf/1080727

and report back on whether it solves your problem (attaching comments to the patch report would be most useful).

----------------------------------------------------------------------
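
The two error types reported in this thread can be reproduced in isolation. The sketch below makes several assumptions. It is written for Python 3, whereas the thread concerns Python 2.4 and 2.5, where the ASCII conversions happened implicitly (presumably when a byte string was compared with, or printed alongside, a unicode string). The sample string and the helper names decode_as_ascii and encode_as_ascii are hypothetical and are not part of doctest. It only shows where the two exceptions come from; it is not a fix or a workaround.

```python
# Python 3 sketch (assumption: the thread itself is about Python 2.4/2.5,
# where these ASCII conversions were implicit rather than explicit).

# Hypothetical expected-output string: eight ASCII characters followed by
# "é" (U+00E9), stored as ISO-8859-15 bytes, the way a non-unicode
# docstring in an ISO-8859-15 source file would hold it.
want_bytes = "12345678\u00e9".encode("iso-8859-15")


def decode_as_ascii(raw):
    """Return the error message from decoding raw bytes as ASCII, or None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def encode_as_ascii(text):
    """Return the error message from encoding text as ASCII, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# Same kind of failure as in the initial comment: byte 0xe9 is not ASCII.
decode_error = decode_as_ascii(want_bytes)

# Same kind of failure as in akaihola's tests: "ä" (U+00E4) cannot be
# written through an ASCII-only output path.
encode_error = encode_as_ascii("\u00e4")

print(decode_error)
print(encode_error)
```

With the sample string chosen here, the first printed message matches the UnicodeDecodeError line in the traceback from the initial comment (byte 0xe9 in position 8). The second message is a UnicodeEncodeError of the kind akaihola reports when doctest tries to display non-matching output.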