New submission from Zac Hatfield-Dodds <zac.hatfield.do...@gmail.com>:
I've been working on a tool called Hypothesmith - https://github.com/Zac-HD/hypothesmith - to generate arbitrary Python source code, inspired by CSmith's success in finding C compiler bugs. It's based on the grammar but ultimately only generates strings which `compile` accepts; this is the only way I know to answer the question "is the string valid Python"!

I should be clear that I don't think the minimal examples are representative of real problems that users may encounter! However, fuzzing is very effective at finding important bugs if we can get these apparently-trivial ones out of the way by changing either the code or the test :-)

```python
import io
import tokenize

import hypothesmith
from hypothesis import example, given

# NOTE: `fixup` is a helper that is not defined in this report.

@example("#")
@example("\n\\\n")
@example("#\n\x0cpass#\n")
@given(source_code=hypothesmith.from_grammar().map(fixup).filter(str.strip))
def test_tokenize_round_trip_string(source_code):
    tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
    outstring = tokenize.untokenize(tokens)  # may have changed whitespace from source
    output = tokenize.generate_tokens(io.StringIO(outstring).readline)
    assert [(t.type, t.string) for t in tokens] == [(t.type, t.string) for t in output]
```

Each of the `@example` cases is accepted by `compile` but fails the test; the `@given` case describes how to generate more such strings. You can read more details in the Hypothesmith repo if interested.

I think these are real and probably unimportant bugs, but I'd love to start a conversation about what properties should *always* hold for functions dealing with Python source code - and how best to report research results if I can demonstrate that they don't!
(for example, lib2to3 has many similar failures but I don't want to open a long list of low-value issues)

----------
components: Library (Lib)
messages: 357704
nosy: Zac Hatfield-Dodds, meador.inge
priority: normal
severity: normal
status: open
title: Untokenize and retokenize does not round-trip
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38953>
_______________________________________
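
For anyone who wants to try the reported inputs without installing Hypothesis or Hypothesmith, here is a minimal sketch of the same round-trip check using only the standard library. The helper names `type_string_pairs` and `round_trips` are made up for illustration and are not part of `tokenize` or Hypothesmith. The outcome for the three reported inputs may differ between Python versions, so the loop only prints what it observes.

```python
import io
import tokenize


def type_string_pairs(source):
    """Tokenize `source` and return its (type, string) pairs."""
    readline = io.StringIO(source).readline
    return [(t.type, t.string) for t in tokenize.generate_tokens(readline)]


def round_trips(source):
    """Return True if tokenize -> untokenize -> tokenize preserves the pairs."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    rebuilt = tokenize.untokenize(tokens)  # whitespace may differ from `source`
    before = [(t.type, t.string) for t in tokens]
    return before == type_string_pairs(rebuilt)


if __name__ == "__main__":
    # One ordinary statement, followed by the three inputs from the report.
    for source in ["x = 1\n", "#", "\n\\\n", "#\n\x0cpass#\n"]:
        try:
            result = round_trips(source)
        except Exception as exc:  # some inputs may raise instead of mismatching
            result = f"raised {type(exc).__name__}"
        print(repr(source), "->", result)
```

Comparing only `(type, string)` pairs, as the test in the report does, deliberately ignores token positions, since `untokenize` is allowed to change whitespace.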