[issue42974] tokenize reports incorrect end col offset and line string when input ends without explicit newline

2021-01-20 Thread Brian Romanowski
Brian Romanowski added the comment: I took a look at Parser/tokenizer.c. From what I can tell, the tokenizer fakes a newline character when the input buffer does not end with an actual newline character, and the returned NEWLINE token has an effective length of 1 because of this

[issue42974] tokenize reports incorrect end col offset and line string when input ends without explicit newline

2021-01-20 Thread Brian Romanowski
Brian Romanowski added the comment: Shoot, I just realized that consistency isn't the important thing here; the most important thing is that the tokenize module exactly matches the output of the Python tokenizer. It's possible that my changes violate that constraint, I'll

[issue42974] tokenize reports incorrect end col offset and line string when input ends without explicit newline

2021-01-19 Thread Brian Romanowski
Change by Brian Romanowski: keywords: +patch; pull_requests: +23085; stage: -> patch review; pull_request: https://github.com/python/cpython/pull/24260

[issue42974] tokenize reports incorrect end col offset and line string when input ends without explicit newline

2021-01-19 Thread Brian Romanowski
New submission from Brian Romanowski: The tokenize module's tokenizer functions output incorrect (or at least misleading) information when the content being tokenized does not end in a line-ending character. This is related to the fix for issue33899, which added the NEWLINE
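
For readers who want to look at the behavior described in the submission, here is a minimal sketch that prints the NEWLINE token the tokenize module produces for the same statement with and without a trailing newline. The helper name `newline_tokens` and the sample source `x = 1` are illustrative assumptions, not taken from the issue or the patch. The reported end column and line string depend on the Python version in use, so no expected output is shown.

```python
import io
import tokenize


def newline_tokens(source):
    """Return the NEWLINE tokens that tokenize yields for `source`.

    Hypothetical helper for illustration; not part of the tokenize module.
    """
    readline = io.StringIO(source).readline
    return [
        tok
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.NEWLINE
    ]


# Compare input that ends with "\n" against input that does not.
for source in ("x = 1\n", "x = 1"):
    for tok in newline_tokens(source):
        # start/end are (row, col) pairs; line is the physical line as reported.
        print(repr(source), tok.start, tok.end, repr(tok.string), repr(tok.line))
```

Comparing the `end` and `line` fields of the two printed rows shows whether the interpreter at hand reports the implicit NEWLINE token differently from an explicit one.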