Harrison Chudleigh wrote:

> While working on a program, I ran into an error with the usage of the
> module tokenize. The following message was displayed.
>  File
> line 467, in tokenize
>     encoding, consumed = detect_encoding(readline)
>   File
> line 409, in detect_encoding
>     if first.startswith(BOM_UTF8):
> TypeError: startswith first arg must be str or a tuple of str, not bytes
> Undaunted, I changed the error on line 409. The line then read:
> if first.startswith(BOM_UTF8):

As Steven says -- don't change the standard library.

Your problem is most likely that you are opening the file you want to
tokenize in text mode. tokenize.tokenize() expects a readline callable that
returns bytes, so the file has to be opened in binary mode. Compare:

$ cat 42.py
42
$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize

First with the file opened in text mode:

>>> with open("42.py", "r") as f:
...     for t in tokenize.tokenize(f.readline): print(t)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.4/tokenize.py", line 468, in tokenize
    encoding, consumed = detect_encoding(readline)
  File "/usr/lib/python3.4/tokenize.py", line 408, in detect_encoding
    if first.startswith(BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes

Now let's switch to binary mode:

>>> with open("42.py", "rb") as f:
...     for t in tokenize.tokenize(f.readline): print(t)
... 
TokenInfo(type=56 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), 
line='')
TokenInfo(type=2 (NUMBER), string='42', start=(1, 0), end=(1, 2), 
line='42\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 2), end=(1, 3), 
line='42\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
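
If you want the same comparison as a script rather than an interactive
session, here is a minimal sketch. It makes a few assumptions: an in-memory
io.BytesIO buffer stands in for 42.py so the example runs without any file on
disk, and the names source, byte_tokens and text_tokens are made up for
illustration. It also shows tokenize.generate_tokens(), which accepts a
readline that returns str, in case you already have the source as text.

```python
import io
import tokenize

# Stand-in for the contents of 42.py (assumption: the file holds just "42\n").
source = b"42\n"

# tokenize.tokenize() wants a readline callable that returns bytes,
# which is what a file opened with mode "rb" provides.
byte_tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
for tok in byte_tokens:
    print(tok)

# If the source is already a str, generate_tokens() takes a readline
# that returns str. It does not emit the leading ENCODING token.
text = source.decode("utf-8")
text_tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))
for tok in text_tokens:
    print(tok)
```

The bytes-based call detects the encoding itself and reports it as the first
token, which is why it cannot work on a file that has already been decoded.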

