New submission from Lu jaymin <ljm51...@gmail.com>:

```
# demo.py
s = 
'测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试'
```
The file above is a test case. Its encoding is UTF-8, and `s` is 1020 bytes 
long (340 characters at 3 bytes each).
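
The byte count can be checked with a short sketch. It assumes `s` is exactly 
170 repetitions of the two-character string '测试', which is what the 
3 * 340 arithmetic implies:

```python
# Assumption: s is '测试' repeated 170 times (340 characters in total).
s = '测试' * 170
encoded = s.encode('utf-8')

print(len(s))        # number of characters
print(len(encoded))  # number of UTF-8 bytes
```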

Running `python3 demo.py` in a terminal raises the following error:

```
$ python3 -V
Python 3.6.4

$ python3 demo.py
  File "demo.py", line 2
SyntaxError: Non-UTF-8 code starting with '\xe8' in file demo.py on line 2, but 
no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
```

After reading the Python 3.6.6 source code, I found that this error is raised 
at about line 630 of `cpython/Parser/tokenizer.c`, at the bottom of the 
function `decoding_fgets`.

When Python executes a source file, it calls `decoding_fgets` to read one line 
of raw bytes from the file into a buffer whose initial size is 1024 bytes. 
`decoding_fgets` then uses the function `valid_utf8` to check the encoding of 
those raw bytes.

If a line is too long for the buffer (more than 1023 bytes), Python calls 
`decoding_fgets` multiple times and grows the buffer by 1024 bytes each time. 
The raw bytes read by a single `decoding_fgets` call may therefore be 
incomplete. For example, they may end with only some of the bytes of a 
multi-byte character, which causes `valid_utf8` to fail.
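
The effect can be reproduced in pure Python. This is only a sketch under 
stated assumptions: the line in demo.py is `s = '...'` with '测试' repeated 170 
times, a 1024-byte buffer holds 1023 bytes plus a terminator, and 
`is_valid_utf8` is a hypothetical stand-in for the C function `valid_utf8`, 
not its actual implementation:

```python
# Rebuild line 2 of demo.py as raw bytes (assumes '测试' repeated 170 times).
line = ("s = '" + '测试' * 170 + "'\n").encode('utf-8')

# Assumption: a 1024-byte buffer yields at most 1023 bytes per read.
CHUNK = 1023
first, rest = line[:CHUNK], line[CHUNK:]


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical stand-in for the tokenizer's valid_utf8 check."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(line))   # the complete line
print(is_valid_utf8(first))  # only the first 1023 bytes
print(first[-1:])            # the byte where the first chunk is cut off
```

Under these assumptions the first chunk ends with the lead byte of a 
three-byte character, the same `\xe8` that the SyntaxError reports.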

I suggest that we always use `fp_readl` to read source code from the file.
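
For comparison, here is how chunked input can be decoded without tripping on 
a split character. This is a Python-level illustration using the standard 
`codecs` incremental decoder, not the `fp_readl` code path itself, and the 
1023-byte chunk size is the same assumption as above:

```python
import codecs

# Same assumed source line as in demo.py, split into 1023-byte chunks.
line = ("s = '" + '测试' * 170 + "'\n").encode('utf-8')
chunks = [line[i:i + 1023] for i in range(0, len(line), 1023)]

# An incremental decoder keeps a trailing partial character until the
# remaining bytes arrive in the next chunk.
decoder = codecs.getincrementaldecoder('utf-8')()
parts = [decoder.decode(chunk) for chunk in chunks]
parts.append(decoder.decode(b'', final=True))  # flush; raises if bytes are left over
text = ''.join(parts)

print(text == line.decode('utf-8'))
```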

----------
components: Interpreter Core
messages: 327686
nosy: Lu jaymin
priority: normal
severity: normal
status: open
title: Python throws “SyntaxError: Non-UTF-8 code start with \xe8...” when 
parse source file
type: behavior
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34979>
_______________________________________