New submission from Zac Hatfield-Dodds <zac.hatfield.do...@gmail.com>:

I've been working on a tool called Hypothesmith - 
https://github.com/Zac-HD/hypothesmith - to generate arbitrary Python source 
code, inspired by CSmith's success in finding C compiler bugs.  It's based on 
the grammar but ultimately only generates strings which `compile` accepts; this 
is the only way I know to answer the question "is the string valid Python"!

I should be clear that I don't think the minimal examples are representative of 
real problems that users may encounter!  However, fuzzing is very effective at 
finding important bugs if we can get these apparently-trivial ones out of the 
way by changing either the code or the test :-)

```python
import io
import tokenize

import hypothesmith
from hypothesis import example, given

# `fixup` is a helper that is not shown in this message.
@example("#")
@example("\n\\\n")
@example("#\n\x0cpass#\n")
@given(source_code=hypothesmith.from_grammar().map(fixup).filter(str.strip))
def test_tokenize_round_trip_string(source_code):
    tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
    # untokenize may have changed whitespace from the source
    outstring = tokenize.untokenize(tokens)
    output = tokenize.generate_tokens(io.StringIO(outstring).readline)
    assert [(t.type, t.string) for t in tokens] == [(t.type, t.string) for t in output]
```

Each of the `@example` cases is accepted by `compile` but fails the test; the 
`@given` case describes how to generate more such strings.  You can read more 
details in the Hypothesmith repo if interested.
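
For anyone who wants to try the three explicit cases without installing Hypothesis or Hypothesmith, here is a minimal standard-library sketch of the same check. The names `round_trips` and `EXAMPLES` are made up for illustration and are not part of the original test. The sketch assumes that an exception from either tokenizing step should count as a failed round trip. The results may well differ between Python versions, so no expected output is shown for the three cases.

```python
import io
import tokenize

# The three explicit cases from the report above.
EXAMPLES = ["#", "\n\\\n", "#\n\x0cpass#\n"]


def round_trips(source_code):
    """Return True if tokenize -> untokenize -> tokenize preserves the
    (type, string) pairs, and False if they differ or a step raises.

    Hypothetical helper for illustration; treating exceptions as a
    failed round trip is an assumption, not something the report states.
    """
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
        outstring = tokenize.untokenize(tokens)
        output = list(tokenize.generate_tokens(io.StringIO(outstring).readline))
    except (tokenize.TokenError, SyntaxError, ValueError):
        return False
    return [(t.type, t.string) for t in tokens] == [(t.type, t.string) for t in output]


for source in EXAMPLES:
    print(repr(source), "round-trips:", round_trips(source))
```

A plain statement such as `round_trips("x = 1\n")` should return `True`, which makes a handy sanity check before looking at the edge cases.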

I think these are real and probably unimportant bugs, but I'd love to start a 
conversation about what properties should *always* hold for functions dealing 
with Python source code - and how best to report research results if I can 
demonstrate that they don't!

(for example, lib2to3 has many similar failures but I don't want to open a long 
list of low-value issues)

----------
components: Library (Lib)
messages: 357704
nosy: Zac Hatfield-Dodds, meador.inge
priority: normal
severity: normal
status: open
title: Untokenize and retokenize does not round-trip
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38953>
_______________________________________