New submission from Tyler Crompton:

Line 402 in lib/python3.3/tokenize.py contains the following line:

    if first.startswith(BOM_UTF8):

BOM_UTF8 is a bytes object. str.startswith does not accept bytes objects. I was able to use tokenize.tokenize only after making the following changes:

Change line 402 to the following:

    if first.startswith(BOM_UTF8.decode()):

Add these two lines at line 374:

    except AttributeError:
        line_string = line

Change line 485 to the following:

    try:
        line = line.decode(encoding)
    except AttributeError:
        pass

I do not know if these changes are correct, as I have not fully tested this module after making them, but it started working for me.

This is the meat of my invocation of tokenize.tokenize:

    import tokenize

    with open('example.py') as file: # opening a file encoded as UTF-8
        for token in tokenize.tokenize(file.readline):
            print(token)

I am not suggesting that these changes are correct, but I do believe that the current implementation is incorrect. I am also unsure which other versions of Python are affected by this.

----------
components: Library (Lib)
messages: 181349
nosy: Tyler.Crompton
priority: normal
severity: normal
status: open
title: tokenizer.tokenize passes a bytes object to str.startswith
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17125>
_______________________________________
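
For comparison with the invocation in the report: the tokenize module documentation describes the readline argument of tokenize.tokenize as a callable that returns each line as bytes, whereas a file opened in the default text mode returns str. The following is only a minimal sketch under that assumption, not a statement about how the issue was or should be resolved. The source string is a hypothetical stand-in for the contents of example.py, and io.BytesIO plays the role of a file opened with mode 'rb'.

```python
import io
import tokenize

# Hypothetical stand-in for the contents of example.py, encoded as UTF-8.
source = "x = 1\n".encode("utf-8")

# tokenize.tokenize is documented to take a readline callable that returns
# bytes, so a binary stream is used here. With a real file, the equivalent
# would be open('example.py', 'rb').
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

for token in tokens:
    print(token)
```

With a bytes-returning readline, the first token produced is the ENCODING token that tokenize derives from the BOM or coding cookie, so no decoding has to happen before the call.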