Devin Jeanpierre <jeanpierr...@gmail.com> added the comment: You're right, and good catch. If a doctest starts with a "#coding:XXX" line, this should break.
One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize and pass 'utf-8' as a parameter. Downside: that's a private and undocumented API. The alternative is to manually add a coding line that specifies UTF-8, so that any coding line in the doctest would be ignored. My preferred option would be to add the ability to read unicode to the tokenize API, and then use that. I can file a separate ticket if that sounds good, since it's probably useful to others too. One other thing to be worried about -- I'm not sure how doctest would treat tests with leading "coding:XXX" lines. I'd hope it ignores them, if it doesn't then this is more complicated and the above stuff wouldn't work. I'll see if I have the time to play around with this (and add more test cases to the patch, correspondingly) this weekend. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue11909> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com