Yes, I'm using that document now, but I was wondering whether there is a
formal spec for the lexical grammar. It looks like some part of the doc
"http://docs.python.org/py3k/reference/grammar.html" is missing. A partial
replacement can be found in lexical_analysis.html, but that document seems
to be written for a Python user rather than for someone trying to implement
Python.

On 2011-09-22 00:55:45, "Thomas Jollans" <t...@jollybox.de> wrote:
>On 21/09/11 18:33, 程劭非 wrote:
>> Thanks Thomas.
>> I've read the document
>> http://docs.python.org/py3k/reference/lexical_analysis.html
>>
>> but I worried it might lack some language features like "tab magic".
>>
>> Since I'm working on a parser in JavaScript, I need a more strictly
>> defined spec.
>>
>> Currently I have a highlighter here:
>> http://shaofei.name/python/PyHighlighter.html
>> (and the lexer: http://shaofei.name/python/PyLexer.html)
>>
>> As you can see, I just make its behavior match CPython's, but I'm not
>> sure what the real Python lexical grammar looks like.
>>
>> Does anyone know whether there is a lexical grammar spec like those of
>> other languages (e.g. http://bclary.com/2004/11/07/#annex-a)?
>
>I believe the language documentation on docs.python.org is all the
>documentation of the language there is. It may not be completely formal,
>and in parts it concentrates not on the actual rules but on the original
>implementation, but, as far as I can tell, it tells you everything you
>need to know to write a new parser for the Python language, without any
>ambiguity.
>
>You appear to be anxious about implementing the indentation mechanism
>correctly. The language documentation describes a behaviour precisely.
>What is the problem?
>
>Thomas
>
>>
>> Please help me. Thanks a lot.
>> On 2011-09-21 19:41:33, "Thomas Jollans" <t...@jollybox.de> wrote:
>>> On 21/09/11 11:44, 程劭非 wrote:
>>>> Hi, everyone,
>>>> I've found that several tokens are used in Python's grammar
>>>> (http://docs.python.org/reference/grammar.html) but I didn't see
>>>> their definition anywhere. The tokens are listed here:
>>>
>>> They should be documented in
>>> http://docs.python.org/py3k/reference/lexical_analysis.html - though
>>> apparently not using these exact terms.
>>>
>>>> NEWLINE
>>> Trivial: U+000A
>>>
>>>> ENDMARKER
>>> End of file.
>>>
>>>> NAME
>>> Documented as "identifier" in 2.3.
>>>
>>>> INDENT
>>>> DEDENT
>>> Documented in 2.1.8.
>>>
>>>> NUMBER
>>> Documented in 2.4.3 - 2.4.6.
>>>
>>>> STRING
>>> Documented in 2.4.2.
>>>
>>>> I've got some information from the source code
>>>> (http://svn.python.org/projects/python/trunk/Parser/tokenizer.c) but
>>>> I'm not sure which features are specific to this particular
>>>> implementation. (I saw that the tab stop can be modified with
>>>> comments using "tab-width:", ":tabstop=", ":ts=" or "set tabsize=";
>>>> is this feature really in the spec?)
>>>
>>> That sounds like a legacy feature that is no longer used. Somebody
>>> familiar with the early history of Python might be able to shed more
>>> light on the situation. It is inconsistent with the spec (section 2.1.8):
>>>
>>> """
>>> Indentation is rejected as inconsistent if a source file mixes tabs and
>>> spaces in a way that makes the meaning dependent on the worth of a tab
>>> in spaces; a TabError is raised in that case.
>>> """
>>>
>>> - Thomas
>>> --
>>> http://mail.python.org/mailman/listinfo/python-list
>>
>
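For anyone trying to replicate the lexer, one practical cross-check is the standard-library tokenize module (Python 3), which replays the token stream CPython itself produces, including the NEWLINE, INDENT, DEDENT and ENDMARKER tokens discussed above. A minimal sketch (the sample source string is just an illustration):

```python
# Print the token stream CPython's own tokenizer produces for a small
# indented snippet; note the explicit INDENT/DEDENT/ENDMARKER tokens.
import io
import tokenize

src = "if x:\n    y = 1\n"

toks = list(tokenize.generate_tokens(io.StringIO(src).readline))
for tok in toks:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

Comparing a third-party lexer's output against this stream, token by token, is a reasonable way to verify alignment with CPython even in the absence of a fully formal lexical spec.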
-- http://mail.python.org/mailman/listinfo/python-list
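The indentation behaviour described in section 2.1.8 of the reference (the part the thread keeps circling back to) can be sketched roughly as follows. This is a non-authoritative sketch: the names indent_width and indent_tokens are invented here, tabs are assumed to advance to the next multiple of eight columns (CPython's default), and the tab/space consistency check that raises TabError is omitted.

```python
# Sketch of the INDENT/DEDENT stack algorithm from section 2.1.8:
# keep a stack of indentation widths; a wider line pushes and emits
# INDENT, a narrower line pops and emits DEDENT until a matching
# width is found, and end of file closes all open levels.

def indent_width(line, tabsize=8):
    """Column width of a line's leading whitespace (tabs assumed to
    advance to the next multiple of tabsize columns)."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += tabsize - (col % tabsize)
        else:
            break
    return col

def indent_tokens(lines, tabsize=8):
    """Yield 'INDENT'/'DEDENT' pseudo-tokens for a list of logical lines."""
    stack = [0]
    for line in lines:
        if not line.strip():          # blank lines carry no indentation
            continue
        width = indent_width(line, tabsize)
        if width > stack[-1]:
            stack.append(width)
            yield "INDENT"
        else:
            while width < stack[-1]:
                stack.pop()
                yield "DEDENT"
            if width != stack[-1]:
                raise IndentationError(
                    "unindent does not match any outer indentation level")
    while stack[-1] > 0:              # close everything at end of file
        stack.pop()
        yield "DEDENT"
```

For example, the lines ["if x:", "    y = 1", "        z = 2", "w = 3"] yield INDENT, INDENT, DEDENT, DEDENT, which matches what CPython's tokenizer emits for the same source.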