Hi, I tried using seek to reverse a text file after reading about the
subject in the documentation:

https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects

https://docs.python.org/3/library/io.html#io.TextIOBase.seek

The script "reverse_text_by_seek3.py" produces the expected result on a
UTF-8 encoded text file "Moon-utf8.txt" (several lines of Chinese
characters):

    $ ./reverse_text_by_seek3.py Moon-utf8.txt
    [0, 10, 11, 27, 28, 44, 60, 76, 92]
    低头思故乡
    举头望明月
    疑似地上霜
    床前明月光
    
    李白(唐)
    
    静夜思

or

    $ ./reverse_text_by_seek3.py Moon-utf8.txt seek
    [0, 10, 11, 27, 28, 44, 60, 76, 92]
    低头思故乡
    举头望明月
    疑似地上霜
    床前明月光
    
    李白(唐)
    
    静夜思
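
Since the attachment is not reproduced inline, here is a minimal sketch of the kind of approach described: record `tell()` after each `readline()` in text mode, then `seek()` back to each recorded position in reverse order. The function name `reverse_lines_text` is hypothetical, and the attached script may well differ in detail.

```python
def reverse_lines_text(path, encoding):
    """Return (positions, lines in reverse order) using text-mode tell()/seek().

    Hypothetical reconstruction, not the attached script.
    """
    with open(path, encoding=encoding) as f:
        # Position of the first line, then the position after each line read.
        positions = [f.tell()]
        while f.readline():
            positions.append(f.tell())

        # The last entry is the end-of-file position, so skip it.
        lines = []
        for pos in reversed(positions[:-1]):
            f.seek(pos)
            lines.append(f.readline())
    return positions, lines


if __name__ == "__main__":
    import sys

    # Usage (assumed): script.py FILE ENCODING
    positions, lines = reverse_lines_text(sys.argv[1], sys.argv[2])
    print(positions)
    for line in lines:
        print(line, end="")
```

Note that, according to the `io` documentation linked above, in text mode the only offsets that are valid for `seek()` are zero and values previously returned by `tell()`, which is why the sketch only ever seeks to recorded positions.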

However, an exception is raised if a file with the same content encoded in
GBK is provided:

    $ ./reverse_text_by_seek3.py Moon-gbk.txt
    [0, 7, 8, 19, 21, 32, 42, 53, 64]
    低头思故乡
    举头望明月
    Traceback (most recent call last):
      File "./reverse_text_by_seek3.py", line 21, in <module>
        print(f.readline(), end="")
    UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 8: illegal multibyte sequence

Everything works fine again, however, when a seek operation is applied
after each readline invocation:

    $ ./reverse_text_by_seek3.py Moon-gbk.txt seek
    [0, 7, 8, 19, 20, 31, 42, 53, 64]
    低头思故乡
    举头望明月
    疑似地上霜
    床前明月光
    
    李白(唐)
    
    静夜思

The printed positions also differ between the two GBK runs: the fifth and
sixth entries are 21 and 32 without the extra seek, but 20 and 31 with it.
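
As a cross-check on the two position lists, the expected byte offset of each line can be computed directly from the text by summing encoded line lengths. This is only a sketch: `POEM` and `expected_offsets` are names made up for the illustration, and it assumes the parentheses in the author line are full-width, which is what the recorded UTF-8 offsets imply.

```python
from itertools import accumulate

# The file content line by line (full-width parentheses assumed).
POEM = [
    "静夜思\n",
    "\n",
    "李白（唐）\n",
    "\n",
    "床前明月光\n",
    "疑似地上霜\n",
    "举头望明月\n",
    "低头思故乡\n",
]


def expected_offsets(lines, encoding):
    """Byte offset at which each line starts, plus the end-of-file offset."""
    return [0] + list(accumulate(len(line.encode(encoding)) for line in lines))


print(expected_offsets(POEM, "utf-8"))
# [0, 10, 11, 27, 28, 44, 60, 76, 92]
print(expected_offsets(POEM, "gbk"))
# [0, 7, 8, 19, 20, 31, 42, 53, 64]
```

Under that assumption, the GBK list printed by the run with the extra seek matches the true byte offsets, while the 21 and 32 from the run without it are each one byte too high.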

I also wrote a Python 2 counterpart, "reverse_text_by_seek2.py", which
decodes the lines when printing rather than when reading; with it, no
exception occurs.
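
For comparison, a rough Python 3 analogue of that decode-on-print idea is to open the file in binary mode, so that seek and tell work on plain byte offsets, and decode each line only after reading it. This is a sketch, not the attached script; `reverse_lines_binary` is a hypothetical name. It relies on the newline byte 0x0A never occurring inside a GBK or UTF-8 multibyte sequence, so splitting the raw bytes on newlines is safe for these two encodings.

```python
def reverse_lines_binary(path, encoding):
    """Return (byte offsets, decoded lines in reverse order) via binary-mode seek."""
    with open(path, "rb") as f:
        # Offsets are plain byte counts, accumulated from the raw line lengths.
        offsets = [0]
        for raw in f:
            offsets.append(offsets[-1] + len(raw))

        # Skip the final entry (end of file) and walk the line starts backwards.
        lines = []
        for pos in reversed(offsets[:-1]):
            f.seek(pos)
            lines.append(f.readline().decode(encoding))
    return offsets, lines


if __name__ == "__main__":
    import sys

    # Usage (assumed): script.py FILE ENCODING
    offsets, lines = reverse_lines_binary(sys.argv[1], sys.argv[2])
    print(offsets)
    for line in lines:
        print(line, end="")
```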

I'm doing this just for fun, not for anything useful. Can anyone reproduce
the above results? What's really happening here? Is it a bug?

Other information:

    Distribution: Arch Linux
    Python3 package: 3.4.3-2 (official)
    Python2 package: 2.7.10-1 (official)

    $ uname -rvom
    4.1.2-2-ARCH #1 SMP PREEMPT Wed Jul 15 08:30:32 UTC 2015 x86_64 GNU/Linux

    $ env | grep -e LC -e LANG
    LC_ALL=en_US.UTF-8
    LC_COLLATE=C
    LANG=en_US.UTF-8

Attachment: reverse_text_by_seek3.py
Description: Binary data

静夜思

李白(唐)

床前明月光
疑似地上霜
举头望明月
低头思故乡

Attachment: reverse_text_by_seek2.py
Description: Binary data
