Re: unicode and manage.py test

Karen Tracey Fri, 02 Oct 2009 10:34:41 -0700

On Thu, Oct 1, 2009 at 8:11 AM, gentlestone <tibor.b...@hotmail.com> wrote:


>
> [snip]
> My question is. Can anybody explain, what does it mean? How should I
> rewrite my doctests in above way? How this piece of code should be?
>
> def slugify(name):
>    u"""
>    >>> slugify(u'Žabovitá zmiešaná kaša s.r.o')
>    u'zabovita-zmiesana-kasa-sro'
>    """
>    for key, value in _MAP.iteritems():
>        name = name.replace(key, value)
>    return defaultfilters.slugify(name)
>

The doctest runner has an open problem with unicode literal docstrings: if
they contain non-ASCII characters, attempting to output a failure message
runs into trouble.  So instead of getting a message saying this was expected
but that was received, you get a message saying that the AssertionError
object is unprintable.  So you know the test failed, but you have no idea
why.

This problem is logged in the Python issue tracker:
http://bugs.python.org/issue1293741

There's a patch on that issue that fixes the problem, at least for
environments where stdout has an encoding that is capable of representing
the characters that need to be output.  With the last patch attached to that
Python bug applied to Django's copy of _doctest.py, your test above
(modified to fail by changing the expected output) successfully reports the
failure on my Linux box:

----------------------------------------------------------------------
File "/home/kmt/software/web/playground/ttt/models.py", line 133, in ttt.models.slugify
Failed example:
    slugify(u'Žabovitá zmiešaná kaša s.r.o')
Expected:
    u'!!!zabovita-zmiesana-kasa-sro'
Got:
    u'zabovita-zmiesana-kasa-sro'

----------------------------------------------------------------------

However, with exactly the same code on a Windows box, the failure still
isn't reported properly, because the stdout encoding there (cp437, per the
traceback below) cannot represent the characters in the unicode literal
docstring:

======================================================================
ERROR: Doctest: ttt.models.slugify
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\u\kmt\django\trunk\django\test\_doctest.py", line 2187, in runTest
    clear_globs=False)
  File "d:\u\kmt\django\trunk\django\test\_doctest.py", line 1409, in run
    return self.__run(test, compileflags, out)
  File "d:\u\kmt\django\trunk\django\test\_doctest.py", line 1316, in __run
    self.report_failure(out, test, example, got)
  File "d:\u\kmt\django\trunk\django\test\_doctest.py", line 1184, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
  File "d:\u\kmt\django\trunk\django\test\_doctest.py", line 2186, in <lambda>
    test, out=lambda x: new.write(x.encode(output_encoding)),
  File "d:\bin\Python2.5.2\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u017d' in position 182: character maps to <undefined>
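
To make the Windows failure concrete, here is a minimal sketch (written for
Python 3, not the Python 2.5 shown in the traceback) of why cp437 rejects the
message and how output could degrade to escapes instead of raising. The
helper name encode_for_console is hypothetical, and this is not what the
patch on the Python issue does; it only illustrates the codec behaviour.

```python
def encode_for_console(text, encoding):
    """Encode text for a console (hypothetical helper, for illustration only)."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # The codec has no slot for some character: emit \uXXXX escapes
        # rather than letting the error mask the real test failure.
        return text.encode(encoding, 'backslashreplace')


z_caron = u'\u017d'  # the first character of the doctest example string

print(encode_for_console(z_caron, 'utf-8'))   # representable in UTF-8
print(encode_for_console(z_caron, 'cp437'))   # not in cp437, so escaped
```

With a fallback like this the message stays readable, at the cost of showing
escapes rather than the original characters.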

So, that fix doesn't make it possible to write doctests that will correctly
report failures cross-platform.  (It may also cause other problems that I've
seen while experimenting with it.  I don't have time to track them down at
the moment, but I'm unconvinced that fix alone will cure all problems with
docstrings and non-ASCII chars.)

What you can do to avoid the problem is not use unicode literal docstrings.
So get rid of the u in front of the docstring literal for slugify.  Then
doctest won't run into trouble attempting to auto-convert from unicode to
bytestring for output.  But then you'll have another problem, because the
embedded unicode literal in the docstring won't be built using the proper
encoding, causing a failure on what should be success:

----------------------------------------------------------------------
File "/home/kmt/software/web/playground/ttt/models.py", line 133, in ttt.models.slugify
Failed example:
    slugify(u'Žabovitá zmiešaná kaša s.r.o')
Expected:
    u'zabovita-zmiesana-kasa-sro'
Got:
    u'a12abovita-zmieaana-kaaa-sro'
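
The garbled result above is consistent with the UTF-8 bytes of the literal
being read one byte per character (as Latin-1). The following Python 3 sketch
reproduces it under that assumption. simple_slugify is a hypothetical
stand-in that only approximates Django's slugify filter of the time (NFKD
normalise, drop non-ASCII, hyphenate); it does not use the _MAP from the
original function.

```python
import re
import unicodedata


def simple_slugify(value):
    # Hypothetical approximation of defaultfilters.slugify, not Django's code.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)


# The example string from the doctest, written with escapes.
intended = u'\u017dabovit\u00e1 zmie\u0161an\u00e1 ka\u0161a s.r.o'

# Assumption: each UTF-8 byte of the literal is taken as one Latin-1 character.
misread = intended.encode('utf-8').decode('latin-1')

print(simple_slugify(intended))  # zabovita-zmiesana-kasa-sro
print(simple_slugify(misread))   # a12abovita-zmieaana-kaaa-sro
```

The stray "12" comes from the second byte of the first character being read
as the one-half sign, which NFKD normalisation expands to "1", a fraction
slash, and "2".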

To fix that, remove the dependence on an embedded unicode literal in the
docstring.  That is, create a unicode object by explicitly decoding a
bytestring using the proper codec:

   """
   >>> slugify('Žabovitá zmiešaná kaša s.r.o'.decode('utf-8'))
   u'zabovita-zmiesana-kasa-sro'
   """

A bit ugly, but that version will pass when it is supposed to, and will be
able to report a descriptive failure message across different platforms.
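
As a small illustration of why the explicit decode is dependable, this Python
3 sketch decodes the same bytes a UTF-8 encoded source file would hold for
the example string. Python 3 has no str.decode, so a bytes literal with
escapes stands in for the Python 2 bytestring; the variable names are
illustrative only.

```python
# Bytes as they would sit in a UTF-8 encoded source file.
raw = b'\xc5\xbdabovit\xc3\xa1 zmie\xc5\xa1an\xc3\xa1 ka\xc5\xa1a s.r.o'

# Naming the codec leaves nothing for the doctest machinery to guess.
name = raw.decode('utf-8')

print(name == u'\u017dabovit\u00e1 zmie\u0161an\u00e1 ka\u0161a s.r.o')  # True
```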

Karen
