New submission from Thomas Kluyver:

The docs describe the NL token as "Token value used to indicate a non-terminating newline. The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines."
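For reference, here is a minimal sketch of the documented case, where a logical line really is continued over two physical lines. It assumes Python 3 (io.StringIO, and tokens that expose a .type attribute); token_names is a hypothetical helper written for this illustration, not part of the tokenize module.

```python
import io
import tokenize


def token_names(source):
    """Return the names of the token types tokenize produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# One logical line spread over two physical lines by the open parenthesis.
names = token_names("x = (1,\n     2)\n")
print(names)
```

The line break inside the parentheses should show up as NL, and the break that ends the statement as NEWLINE, which matches the description quoted above.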
However, after a comment or a blank line, tokenize emits NL, even when it's not inside a multi-line statement. For example:

    In [15]: for tok in tokenize.generate_tokens(StringIO('#comment\n').readline):
                 print(tok)

    TokenInfo(type=54 (COMMENT), string='#comment', start=(1, 0), end=(1, 8), line='#comment\n')
    TokenInfo(type=55 (NL), string='\n', start=(1, 8), end=(1, 9), line='#comment\n')
    TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')

This makes it difficult to use tokenize to detect multi-line statements, as we want to do in IPython. In my tests so far, changing two instances of NL to NEWLINE in this block (lines 530 & 533) makes it behave as I expect:
http://hg.python.org/cpython/file/a375c3d88c7e/Lib/tokenize.py#l524

----------
messages: 180846
nosy: takluyver
priority: normal
severity: normal
status: open
title: tokenize unconditionally emits NL after comment lines & blank lines
versions: Python 2.6, Python 2.7, Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17061>
_______________________________________
_______________________________________________
Python-bugs-list mailing list