New submission from Ben Finney:

In `tokenize.detect_encoding` is the following code::

    first = read_or_stop()
    if first.startswith(BOM_UTF8):
        # …

The `read_or_stop` function is defined as::

    def read_or_stop():
        try:
            return readline()
        except StopIteration:
            return b''

So, on catching ``StopIteration``, the return value will be a byte string. The `detect_encoding` code then immediately calls `startswith`, which fails::

      File "/usr/lib/python3.4/tokenize.py", line 409, in detect_encoding
        if first.startswith(BOM_UTF8):
    TypeError: startswith first arg must be str or a tuple of str, not bytes

One or both of those locations in the code is wrong. Either `read_or_stop` should never return a byte string, or `detect_encoding` should not assume it can call `startswith` on the result.

----------
components: Library (Lib)
messages: 234471
nosy: bignose
priority: normal
severity: normal
status: open
title: ‘tokenize.detect_encoding’ is confused between text and bytes: no ‘startswith’ method on a byte string
type: crash
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23297>
_______________________________________
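A minimal sketch for reproducing the reported ``TypeError`` follows. The error message says the argument to `startswith` must be ``str``, which suggests that `first` was itself a ``str``, as it would be if the `readline` callable returned text rather than bytes. That is an assumption about how the traceback arose; the report does not say how `detect_encoding` was called. The helper name `try_detect` is hypothetical and exists only for this illustration.

```python
import io
import tokenize


def try_detect(readline):
    """Call tokenize.detect_encoding and return its result or the TypeError raised."""
    try:
        return tokenize.detect_encoding(readline)
    except TypeError as exc:
        return exc


source = "x = 1\n"

# A readline that yields bytes: detect_encoding returns (encoding, lines_read).
bytes_result = try_detect(io.BytesIO(source.encode("utf-8")).readline)

# A readline that yields str (assumed trigger): str.startswith(BOM_UTF8)
# is called with a bytes argument and raises TypeError.
text_result = try_detect(io.StringIO(source).readline)

print("bytes readline:", bytes_result)
print("text readline: ", repr(text_result))
```

If the assumption holds, the two calls differ only in the type returned by `readline`, which would place the mismatch in the caller's stream rather than in the ``b''`` fallback of `read_or_stop`.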