Andrew Dalke wrote:
Bengt Richter:
But it does look ahead to recognize += (i.e., it doesn't generate two successive also-legal tokens of '+' and '=') so it seems it should be a simple fix.
But that works precisely because of the greedy nature of tokenization. Given "a+=2", the longest token it finds first is "a", because "a+" is not a valid token. The next token is "+=" rather than just "+", because "+=" is valid. And the last token is "2".
[...]
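
(For anyone reading along, a quick check with the standard tokenize module shows the greedy behaviour Andrew describes. The API below is the Python 3 one, which postdates this thread, so take it as an illustrative sketch rather than what the original posters were running:)

    import io
    import tokenize

    # Print each (token type, token text) pair the tokenizer produces for "a+=2".
    for tok in tokenize.generate_tokens(io.StringIO("a+=2\n").readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))

    # Prints:
    #   NAME 'a'
    #   OP '+='
    #   NUMBER '2'
    #   NEWLINE '\n'
    #   ENDMARKER ''

Note that "+=" comes out as a single OP token: the tokenizer never emits "+" followed by "=".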
You're absolutely right, of course, Andrew, and personally I don't think this is worth trying to fix. But the original post I responded to suggested that an LL(1) grammar couldn't disambiguate "1." and "1..3", an assertion that relied on a slight fuzzing of the line between lexical and syntactic analysis that I didn't want to leave unsharpened.
The fact that Python's existing tokenizer won't produce a multi-character token beginning with a dot immediately after a digit (roughly speaking) is what makes the proposed syntax so hard to accommodate.
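
The same kind of check makes the problem with "1..3" concrete (again using the Python 3 tokenize module, so this is an illustration rather than the tokenizer of the day):

    import io
    import tokenize

    def show(source):
        # Print the tokens for `source`, skipping the trailing NEWLINE/ENDMARKER.
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
                continue
            print(tokenize.tok_name[tok.type], repr(tok.string))

    show("1..3\n")
    # NUMBER '1.'   <- the first dot is greedily absorbed into the float literal
    # OP '.'
    # NUMBER '3'

Because the first dot is consumed as part of the number, a ".." token can never appear after a digit without changing the tokenizer itself.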
regards
  Steve
--
Steve Holden            http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/
Holden Web LLC          +1 703 861 4237  +1 800 494 3119