[issue17125] tokenizer.tokenize passes a bytes object to str.startswith

Tyler Crompton Mon, 04 Feb 2013 08:51:34 -0800

New submission from Tyler Crompton:

Line 402 of lib/python3.3/tokenize.py contains the following:


    if first.startswith(BOM_UTF8):

BOM_UTF8 is a bytes object. str.startswith does not accept bytes objects. I was 
able to use tokenize.tokenize only after making the following changes:

Change line 402 to the following:

    if first.startswith(BOM_UTF8.decode()):

Add these two lines at line 374:

        except AttributeError:
            line_string = line

Change line 485 to the following:

            try:
                line = line.decode(encoding)
            except AttributeError:
                pass

I do not know whether these changes are correct, as I have not fully tested the 
module with them applied, but it started working for me. This is the core of 
my invocation of tokenize.tokenize:

    import tokenize

    with open('example.py') as file:  # opening a file encoded as UTF-8
        for token in tokenize.tokenize(file.readline):
            print(token)

I am not suggesting that these changes are correct, but I do believe that the 
current implementation is incorrect. I am also unsure as to what other versions 
of Python are affected by this.
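
For comparison, here is a minimal sketch of the behaviour described above. It 
assumes the documented contract that tokenize.tokenize takes a readline 
callable returning bytes, as a file opened with mode 'rb' provides. The 
in-memory source string and the variable names are hypothetical stand-ins for 
example.py, and this is not a proposed patch to tokenize.py:

```python
import io
import tokenize

# Hypothetical in-memory source standing in for example.py.
source = "x = 1\n"

# A readline that returns bytes (what open('example.py', 'rb') would give).
byte_stream = io.BytesIO(source.encode("utf-8"))
byte_tokens = list(tokenize.tokenize(byte_stream.readline))

# A readline that returns str (what text-mode open() gives). The BOM check
# then calls str.startswith with a bytes argument and raises TypeError.
try:
    list(tokenize.tokenize(io.StringIO(source).readline))
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

# generate_tokens() is the str-based counterpart and accepts a text readline.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
```

With a bytes readline the first token reports the detected encoding. With a 
str readline the call fails in the BOM check, which matches the error 
reported here.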

----------
components: Library (Lib)
messages: 181349
nosy: Tyler.Crompton
priority: normal
severity: normal
status: open
title: tokenizer.tokenize passes a bytes object to str.startswith
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17125>
_______________________________________