[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError
New submission from Rob Malouf:

Calling TextIOWrapper.tell() while reading the attached gb2312-encoded file like this:

    with open('udhr-gb2312.txt', encoding='GB2312') as f:
        while True:
            line = f.readline()
            t = f.tell()
            if not line:
                break

gives this result:

    Traceback (most recent call last):
      File "test.py", line 4, in <module>
        t = f.tell()
    UnicodeDecodeError: 'gb2312' codec can't decode byte 0xb5 in position 0: illegal multibyte sequence

The file seems to be well-formed and can be read without any problem. It's only the call to tell() that raises an issue.

--
components: IO, Unicode
files: udhr-gb2312.txt
messages: 367494
nosy: ezio.melotti, rmalouf, vstinner
priority: normal
severity: normal
status: open
title: Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError
type: crash
versions: Python 3.7
Added file: https://bugs.python.org/file49096/udhr-gb2312.txt

___
Python tracker <https://bugs.python.org/issue40416>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError
Rob Malouf added the comment:

Same results on macOS 10.15.4 (both the system python and the intel/anaconda version) and on CentOS 7.8.

Here's the output with print(...):

    13 71 72 392 393 399 536 537 761 762 879 880
    933 934 1146 1147 1254 1255 1359 1360 1760 1761 1772 1895
    1897 1906 2105 2107 2338 2339 2348 2398 2399 2408 2509 2510
    2519 2612 2614 2622 2682 2684 2693 2898 2900 2909 3050 3052
    3061 3113 3115 3124 3295 3297 3309 3445 3632 3644 3814 3816
    3828 3882 3967 3979 4048 4184 4196 4226 4308 4320 4492 4559
    4641 4653 4728 4770 4782 4999 5001 5013 5202 5204 5216 5270
    5318 5333 5411 5465 5672 5687 5953 5954 5969 6082 6137 6307
    6373 6388 6494 6496 6511 6786 6913 6928 7148 7371 7447 7462
    7569 7704 7719 7847 7848 7863 7972 8238 8342
    Traceback (most recent call last):
      File "test.py", line 4, in <module>
        print(f.tell())
    UnicodeDecodeError: 'gb2312' codec can't decode byte 0xb5 in position 0: illegal multibyte sequence

--

___
Python tracker <https://bugs.python.org/issue40416>
___
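
Editorial note: the following is a possible workaround sketch, not something proposed in the report. It assumes byte offsets are acceptable in place of the opaque positions returned by TextIOWrapper.tell(). The file is opened in binary mode and each line is decoded separately, so tell() never has to reconstruct decoder state. The helper name read_lines_with_offsets and the sample text are hypothetical; the sample stands in for udhr-gb2312.txt, which is not reproduced here.

```python
import os
import tempfile


def read_lines_with_offsets(path, encoding="gb2312"):
    """Yield (byte_offset_after_line, decoded_line) pairs.

    Hypothetical helper: the file is read in binary mode, so tell()
    returns a plain byte offset. Splitting on b"\n" is safe for GB2312
    because both bytes of a two-byte character are >= 0xA1.
    """
    with open(path, "rb") as f:
        while True:
            raw = f.readline()
            if not raw:
                break
            yield f.tell(), raw.decode(encoding)


def demo():
    # Assumed sample data, standing in for the attached file.
    text = "世界人权宣言\n序言\n"
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(text.encode("gb2312"))
        return list(read_lines_with_offsets(path))
    finally:
        os.remove(path)


if __name__ == "__main__":
    for offset, line in demo():
        print(offset, repr(line))
```

The offsets produced this way can be passed to seek() on a binary file object, but they are not interchangeable with values returned by tell() on a text-mode file.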
[issue28340] [py2] TextIOWrapper.tell extremely slow
Changes by Rob Malouf:

--
pull_requests: +1832

___
Python tracker <http://bugs.python.org/issue28340>
___
[issue25535] collections.Counter methods return Counter objects
New submission from Rob Malouf:

Several collections.Counter methods return Counter objects, which leads to wrong or at least confusing behavior when Counter is subclassed. For example, nltk.FreqDist is a subclass of Counter:

    >>> x = nltk.FreqDist(['a','a','b','b','b'])
    >>> y = nltk.FreqDist(['b','b','b','b','b'])
    >>> z = x + y
    >>> z.__class__

This applies to __add__(), __sub__(), __or__(), __and__(), __pos__(), and __neg__().

In contrast, the copy() method does (what I think is) the right thing:

    >>> x.copy().__class__

--
components: Library (Lib)
messages: 253930
nosy: rmalouf
priority: normal
severity: normal
status: open
title: collections.Counter methods return Counter objects
type: behavior
versions: Python 3.5

___
Python tracker <http://bugs.python.org/issue25535>
___
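
Editorial note: the behavior described above can be illustrated without nltk. In this minimal sketch, FreqDistLike is a hypothetical stand-in for a Counter subclass such as nltk.FreqDist, and Rewrapping shows one possible way a subclass might restore its own type; neither class comes from the report.

```python
from collections import Counter


class FreqDistLike(Counter):
    """Hypothetical stand-in for a Counter subclass such as nltk.FreqDist."""


class Rewrapping(Counter):
    """Hypothetical subclass that re-wraps the result of + in its own type."""

    def __add__(self, other):
        result = super().__add__(other)
        if result is NotImplemented:
            return result
        return type(self)(result)


x = FreqDistLike(["a", "a", "b", "b", "b"])
y = FreqDistLike(["b", "b", "b", "b", "b"])

z = x + y      # the binary operator builds a plain Counter
c = x.copy()   # copy() constructs the result through self.__class__

print(type(z).__name__)   # Counter
print(type(c).__name__)   # FreqDistLike

r = Rewrapping("aab") + Rewrapping("b")
print(type(r).__name__)   # Rewrapping
```

Overriding each operator by hand, as Rewrapping does for __add__(), is only one option and would need repeating for the other methods listed in the report.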
[issue28340] TextIOWrapper.tell extremely slow
New submission from Rob Malouf:

io.TextIOWrapper.tell() is unusably slow in Python 2.7. This same problem was introduced in Python 3 and fixed in Python 3.3 (see Issue # 4). Any chance of getting the fix backported into the Python 2.7 library? It would make it much easier to modernize Unicode handling in libraries that have to support both 2 and 3 using the same codebase.

--
components: IO
messages: 277898
nosy: rmalouf
priority: normal
severity: normal
status: open
title: TextIOWrapper.tell extremely slow
type: performance
versions: Python 2.7

___
Python tracker <http://bugs.python.org/issue28340>
___
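
Editorial note: a rough way to measure the cost of tell() on a given interpreter is sketched below. This is not the benchmark used in the report; the function name time_tell, the sample data, and the line count are assumptions chosen for illustration. It uses only io and timeit.default_timer, so the same script should run on both Python 2.7 and Python 3 for comparison.

```python
import io
from timeit import default_timer


def time_tell(n_lines=2000, encoding="utf-8"):
    """Return (lines_read, seconds_spent_in_tell) for an in-memory text stream.

    Hypothetical micro-benchmark: tell() is called once after every
    readline(), mirroring the access pattern that makes a slow tell() painful.
    """
    data = (u"line of text \u00e9\n" * n_lines).encode(encoding)
    f = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    elapsed = 0.0
    count = 0
    while True:
        line = f.readline()
        if not line:
            break
        start = default_timer()
        f.tell()
        elapsed += default_timer() - start
        count += 1
    return count, elapsed


if __name__ == "__main__":
    lines, seconds = time_tell()
    print("%d tell() calls took %.4f s" % (lines, seconds))
```

Absolute timings will vary by machine and build, so the numbers are only meaningful when the same script is run under the interpreters being compared.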