On Tue, 11 Oct 2022 at 09:18, Cameron Simpson <c...@cskk.id.au> wrote:
>
> On 11Oct2022 08:02, Chris Angelico <ros...@gmail.com> wrote:
> >There's a huge difference between non-fatal errors and syntactic
> >errors. The OP wants the parser to magically skip over a fundamental
> >syntactic error and still parse everything else correctly. That's
> >never going to work perfectly, and the OP is surprised at this.
>
> The OP is not surprised by this, and explicitly expressed awareness that
> resuming a parse had potential for "misparsing" further code.
>
> I remain of the opinion that one could resume a parse at the next
> unindented line and get reasonable results a lot of the time.

The next line at the same indentation level as the line with the
error, or the next flush-left line? Either way, there's a weird and
arbitrary gap before you start parsing again, and you still have no
indication of what could make sense. Consider:

if condition # no colon
    code
else:
    code

To actually "restart" parsing, you have to make a guess of some sort.
Maybe you can figure out what the user meant to do, and parse
accordingly; but if that's the case, keep going immediately, don't
wait for an unindented line. If you wait for a blank line followed by
an unindented line, that might help with a notion of "next logical
unit of code", but it's very much dependent on the coding style, and
if you have a codebase that's so full of syntax errors that you
actually want to see more than one, you probably don't have a codebase
with pristine and beautiful code layout.

> In fact, I expect that one could resume tokenising at almost any line
> which didn't seem to be inside a string and often get reasonable
> results.

"Seem to be"? On what basis?

> I grew up with C and Pascal compilers which would _happily_ produce many
> complaints, usually accurate, and all manner of syntactic errors. They
> didn't stop at the first syntax error.

Yes, because they work with a much simpler grammar. But even then,
most syntactic errors (again, this is not to be confused with semantic
errors - if you say "char *x = 1.234;" then there's no parsing
ambiguity but it's not going to compile) cause a fair degree of
nonsense afterwards.

The waters are a bit muddied by some things being called "syntax
errors" when they're actually nothing at all to do with the parser.
For instance:

>>> def f():
...     await q
...
  File "<stdin>", line 2
SyntaxError: 'await' outside async function

This is not what I'm talking about; there's no parsing ambiguity here,
and therefore no difficulty whatsoever in carrying on with the
parsing. You could ast.parse() this code without an error.
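
To make the distinction concrete, here's a minimal sketch that reports
which stage rejects a piece of source: the parser (ast.parse) or the
later compile step. The helper name stage_of_failure is made up for
illustration, and this assumes a reasonably recent CPython (3.8 or
later); other implementations may draw the line elsewhere.

```python
import ast

# Parses cleanly, but is rejected afterwards by the compiler.
AWAIT_SRC = "def f():\n    await q\n"

# Genuinely unparseable: the 'if' line is missing its colon.
NO_COLON_SRC = "if condition\n    code\nelse:\n    code\n"


def stage_of_failure(source):
    """Hypothetical helper: say where SyntaxError is raised, if anywhere.

    Returns "parse" if ast.parse() rejects the source, "compile" if the
    tree is built but compile() rejects it, and None if both succeed.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return "parse"
    try:
        compile(tree, "<example>", "exec")
    except SyntaxError:
        return "compile"
    return None


print(stage_of_failure(AWAIT_SRC))     # compile
print(stage_of_failure(NO_COLON_SRC))  # parse
```

Only the second kind leaves the parser without a tree, which is the
case where "carrying on" requires a guess.
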
But resuming after a parsing error is fundamentally difficult,
impossible without guesswork.

> All you need in principle is a parser which goes "report syntax error
> here, continue assuming <some state>". For Python that might mean
> "pretend a missing final colon" or "close open brackets" etc, depending
> on the context. If you make conservative implied corrections you can get
> a reasonable continued parse, enough to find further syntax errors.

And, more likely, you'll generate a lot of nonsense. Take something
like this:

items = [
    item[1],
    item2],
    item[3],
]

As a human, you can easily see what the problem is. Try teaching a
parser how to handle this. Most likely, you'll generate a spurious
error - maybe the indentation, maybe the intended end of the list -
but there's really only one error here. Reporting multiple errors
isn't actually going to be at all helpful.

> I remember the Pascal compiler in particular had a really good "you
> missed a semicolon _back there_" mode which was almost always correct, a
> nice boon when correcting mistakes.
>

Ahh yes. Design a language with strict syntactic requirements, and
it's not too hard to find where the programmer has omitted them. Thing
is.... Python just doesn't HAVE those semicolons. Let's say that a
variant Python required you to put a U+251C ├ at the start of every
statement, and U+2524 ┤ at the end of the statement. A whole lot of
classes of error would be extremely easy to notice and correct, and
thus you could resume parsing; but that isn't benefiting the
programmer any. When you don't have that kind of information
duplication, it's a lot harder to figure out how to cheat the fix and
go back to parsing.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list