Devin Jeanpierre <jeanpierr...@gmail.com> added the comment:

You're right, and good catch. If a doctest starts with a "#coding:XXX" line, 
this should break.

One option is to replace the call to tokenize.tokenize with a call to 
tokenize._tokenize and pass 'utf-8' as a parameter. Downside: that's a private 
and undocumented API. The alternative is to manually add a coding line that 
specifies UTF-8, so that any coding line in the doctest would be ignored. 

My preferred option would be to add the ability to read unicode to the tokenize 
API, and then use that. I can file a separate ticket if that sounds good, since 
it's probably useful to others too.

One other thing to be worried about -- I'm not sure how doctest would treat 
tests with leading "coding:XXX" lines. I'd hope it ignores them, if it doesn't 
then this is more complicated and the above stuff wouldn't work.

I'll see if I have the time to play around with this (and add more test cases 
to the patch, correspondingly) this weekend.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11909>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to