Nick Coghlan <ncogh...@gmail.com> added the comment: There are a *lot* of characters with semantic significance that are reported by the tokenize module as generic "OP" tokens:
token.LPAR token.RPAR token.LSQB token.RSQB token.COLON token.COMMA token.SEMI token.PLUS token.MINUS token.STAR token.SLASH token.VBAR token.AMPER token.LESS token.GREATER token.EQUAL token.DOT token.PERCENT token.BACKQUOTE token.LBRACE token.RBRACE token.EQEQUAL token.NOTEQUAL token.LESSEQUAL token.GREATEREQUAL token.TILDE token.CIRCUMFLEX token.LEFTSHIFT token.RIGHTSHIFT token.DOUBLESTAR token.PLUSEQUAL token.MINEQUAL token.STAREQUAL token.SLASHEQUAL token.PERCENTEQUAL token.AMPEREQUAL token.VBAREQUAL token.CIRCUMFLEXEQUAL token.LEFTSHIFTEQUAL token.RIGHTSHIFTEQUAL token.DOUBLESTAREQUAL¶ token.DOUBLESLASH token.DOUBLESLASHEQUAL token.AT However, I can't fault tokenize for deciding to treat all of those tokens the same way - for many source code manipulation purposes, these just need to be transcribed literally, and the "OP" token serves that purpose just fine. As the extensive test updates in the current patch suggest, AMK is also correct that changing this away from always returning "OP" tokens (even for characters with more specialised tokens available) would be a backwards incompatible change. I think there are two parts to this problem, one documentation related (affecting 2.7, 3.2, 3.3) and another that would be an actual change in 3.3: 1. First, I think 3.3 should add an "exact_type" attribute to TokenInfo instances (without making it part of the tuple-based API). For most tokens, this would be the same as "type", but for OP tokens, it would provide the appropriate more specific token ID. 2. Second, the tokenize module documentation should state *explicitly* which tokens it collapses down into the generic "OP" token, and explain how to use the "string" attribute to recover the more detailed information. ---------- assignee: -> docs@python components: +Documentation nosy: +docs@python, ncoghlan stage: -> needs patch title: function generate_tokens at tokenize.py yields wrong token for colon -> Add new attribute to TokenInfo to report specific token IDs versions: +Python 2.7, Python 3.3 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue2134> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com