On 2015-03-27 10:15, Steven D'Aprano wrote:
> If that's all it is, why don't you just run the tokenizer over it
> and see what it says?
>
> py> from cStringIO import StringIO
> py> code = StringIO('spam = "abcd" "efgh"\n')
> py> import tokenize
> py> for item in tokenize.generate_tokens(code.readline):
> ...     print item
> ...
> (1, 'spam', (1, 0), (1, 4), 'spam = "abcd" "efgh"\n')
> (51, '=', (1, 5), (1, 6), 'spam = "abcd" "efgh"\n')
> (3, '"abcd"', (1, 7), (1, 13), 'spam = "abcd" "efgh"\n')
> (3, '"efgh"', (1, 14), (1, 20), 'spam = "abcd" "efgh"\n')
> (4, '\n', (1, 20), (1, 21), 'spam = "abcd" "efgh"\n')
> (0, '', (2, 0), (2, 0), '')
>
>
> Looks to me that the two string literals each get their own token,
Nice.  I haven't played with the tokenize module before, but
resolving arguments on comp.lang.python is one of the best possible
uses.  It was interesting to try other feeders to generate_tokens(),
my favorite being

>>> import tokenize
>>> i = iter(["spam = 'abc' 'def'"])
>>> for item in tokenize.generate_tokens(lambda: next(i)):
...     print(item)
...
TokenInfo(type=1 (NAME), string='spam', start=(1, 0), end=(1, 4), line="spam = 'abc' 'def'")
TokenInfo(type=52 (OP), string='=', start=(1, 5), end=(1, 6), line="spam = 'abc' 'def'")
TokenInfo(type=3 (STRING), string="'abc'", start=(1, 7), end=(1, 12), line="spam = 'abc' 'def'")
TokenInfo(type=3 (STRING), string="'def'", start=(1, 13), end=(1, 18), line="spam = 'abc' 'def'")
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')

It's also nice to have the translation from token-type to
token-type-name in Py3

-tkc

--
https://mail.python.org/mailman/listinfo/python-list
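For reference, the same experiment can be sketched in Python 3, where cStringIO is gone and io.StringIO serves as the line source; this is a minimal illustration (the variable names are my own), showing that adjacent string literals really are two separate STRING tokens at the lexer level:

```python
# Minimal Python 3 sketch: tokenize a line containing implicit string
# concatenation and show that each literal gets its own STRING token.
import io
import tokenize
from token import STRING

source = 'spam = "abcd" "efgh"\n'
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Collect only the STRING tokens: each literal appears separately,
# so the concatenation must happen later, at compile time.
strings = [tok.string for tok in tokens if tok.type == STRING]
print(strings)  # → ['"abcd"', '"efgh"']
```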