Hi all,

PHP's grammar is far from complex. Most of the syntax can be described with a simple explanation. For example:
* We can separate a program into several statements.
* A couple of items (namespace, use) can only be declared in specific places, so consider them top-statements.
* A namespace declaration may contain multiple statements if you define them inside braces.
* A UseStatement can only appear inside a namespace or in the global scope.
* Finally, we support classes.

Now we can describe a good portion of the PHP grammar:

/* Terminals */
identifier char string integer float boolean

/* Grammar Rules */
Literal   ::= string | char | integer | float | boolean
Qualifier ::= ("private" | "public" | "protected") ["static"]

/* Identifiers */
NamespaceIdentifier           ::= identifier {"\" identifier}
ClassIdentifier               ::= identifier
MethodIdentifier              ::= identifier
FullyQualifiedClassIdentifier ::= [NamespaceIdentifier] ClassIdentifier

/* Root grammar */
Program      ::= {TopStatement} {Statement}
TopStatement ::= NamespaceDeclaration | UseStatement | CommentStatement
Statement    ::= ClassDeclaration | FunctionDeclaration | ...

/* Namespace Declaration */
NamespaceDeclaration       ::= InlineNamespaceDeclaration | ScopeNamespaceDeclaration
InlineNamespaceDeclaration ::= SimpleNamespaceDeclaration ";" {UseStatement} {Statement}
ScopeNamespaceDeclaration  ::= SimpleNamespaceDeclaration "{" {UseStatement} {Statement} "}"
SimpleNamespaceDeclaration ::= "namespace" NamespaceIdentifier

/* Use Statement */
UseStatement                ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
SimpleUseStatement          ::= SimpleNamespaceUseStatement | SimpleClassUseStatement
SimpleNamespaceUseStatement ::= NamespaceIdentifier ["as" NamespaceIdentifier]
SimpleClassUseStatement     ::= FullyQualifiedClassIdentifier ["as" ClassIdentifier]

/* Comment Declaration */
CommentStatement                ::= InlineCommentStatement | MultilineCommentStatement
InlineCommentStatement          ::= ("//" | "#") string
MultilineCommentStatement       ::= SimpleMultilineCommentStatement | DocBlockStatement
SimpleMultilineCommentStatement ::= "/*" {"*" string} "*/"
DocBlockStatement               ::= "/**" {"*" string} "*/"

/* Class Declaration */
ClassDeclaration           ::= SimpleClassDeclaration "{" {ClassMemberDeclaration} "}"
SimpleClassDeclaration     ::= ["abstract"] "class" ClassIdentifier
                               ["extends" FullyQualifiedClassIdentifier]
                               ["implements" FullyQualifiedClassIdentifier {"," FullyQualifiedClassIdentifier}]
ClassMemberDeclaration     ::= ConstDeclaration | PropertyDeclaration | MethodDeclaration
ConstDeclaration           ::= [DocBlockStatement] "const" identifier "=" Literal ";"
PropertyDeclaration        ::= [DocBlockStatement] Qualifier Variable ["=" Literal] ";"
MethodDeclaration          ::= [DocBlockStatement] (PrototypeMethodDeclaration | ComplexMethodDeclaration)
PrototypeMethodDeclaration ::= "abstract" Qualifier "function" MethodIdentifier "(" [ArgumentDeclaration] ")" ";"
ComplexMethodDeclaration   ::= ["final"] Qualifier "function" MethodIdentifier "(" [ArgumentDeclaration] ")" "{" {Statement} "}"
ArgumentDeclaration        ::= SimpleArgumentDeclaration {"," SimpleArgumentDeclaration}
SimpleArgumentDeclaration  ::=
                               [TypeHint] Variable ["=" Literal]
TypeHint                   ::= ArrayTypeHint | FullyQualifiedClassIdentifier
ArrayTypeHint              ::= "array"

Now it is easy to continue the work and add the missing rules. =)

Cheers,

On Sat, Jan 1, 2011 at 12:46 PM, Rune Kaagaard <rumi...@gmail.com> wrote:
>> There has never been a language grammar, so there's been nothing to refer to
>> at all. As for why no one's made one more recently, for fun I snagged the .l
>> and .y files from trunk and W3C's version of EBNF from XML. In two hours of
>> hacking away, I managed to come up with this sort-of beginning to a grammar,
>> which I'm certain contains several errors, and only hints at a syntax:
>
> I wanted to take your EBNF for a spin so I converted it to a format
> that the python module "simpleparse" could read. I ironed out a couple
> of kinks and fixed a bug. You can see it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/php.ebnf
>
> Then I created a prettyprinter to output the parsetree of some very
> simple PHP code. See it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.py
>
> and the output is here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.output
>
>> Considering what it takes JUST to define namespaces, halt_compiler, basic
>> blocks, and the idea of a conditional statement... well, suffice to say the
>> "expr" production alone would be triple the size of this. It doesn't help
>> that there's no way I'm immediately aware of to check whether a grammar like
>> this is accurate.
>
> Thanks a lot for the example, that does not look so bad :) PHP syntax
> is not simple so of course the EBNF will not be either. But still any
> EBNF would be a lot better than none!
>
> Testability is a real issue and makes for a nice catch-22.
> A hypothetical roadmap could _maybe_ look like this:
>
> 1) Create the EBNF and reference implementation while comparing it to
>    a stable release.
> 2) Rewrite the Zend implementation to read from the EBNF.
> 3) Repeat for all current releases.
>
> It's tough to try to guess about things you don't really understand.
> Looks like major work though!
>
>> Nonetheless, it's a significant undertaking to deal with the complexity of
>> the language. There are dozens of tiny little edge cases in PHP's parsing
>> that require bunches of extra parser rules. An example from above is the
>> difference between using "statement" and "inner-statement" for the two
>> different forms of "if". Because "statement" includes basic blocks and
>> labels, the rule disallows writing "if: { xyz; } endif;", since apparently
>> Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on
>> the grammar. In its present form, it will never reduce down to something
>> nearly as small as Python's.
>
> Just to have a solid, complete maintained EBNF would be a _major_ leap
> forward!
>
> Thanks for your cool reply!
>
> Cheers
> Rune
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

--
Guilherme Blanco
Mobile: +55 (16) 9215-8480
MSN: guilhermebla...@hotmail.com
São Paulo - SP/Brazil
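
On the testability question raised in the thread: one rough way to check a single rule against real input is a small hand-written recursive-descent recognizer. The sketch below is only an illustration in Python, covering just the UseStatement and NamespaceIdentifier rules from the grammar above. It makes several assumptions that are not in the grammar: SimpleNamespaceUseStatement and SimpleClassUseStatement are collapsed into one rule because they cannot be told apart syntactically, identifiers are plain ASCII, keywords are matched case-sensitively, and the names tokenize, Parser, and parse_use are hypothetical helpers.

```python
import re

# Tokens: identifiers/keywords, the namespace separator "\", "," and ";".
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\\|,|;)")
# Simplification: only the keywords this sketch needs, matched case-sensitively.
KEYWORDS = {"use", "as", "namespace"}


def tokenize(source):
    """Split source into tokens; raise SyntaxError on anything unexpected."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal):
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        token = self.peek()
        is_name = token is not None and (token[0].isalpha() or token[0] == "_")
        if not is_name or token in KEYWORDS:
            raise SyntaxError(f"expected identifier, got {token!r}")
        self.pos += 1
        return token

    def namespace_identifier(self):
        # NamespaceIdentifier ::= identifier {"\" identifier}
        parts = [self.identifier()]
        while self.peek() == "\\":
            self.pos += 1
            parts.append(self.identifier())
        return "\\".join(parts)

    def simple_use_statement(self):
        # Assumed merged rule: NamespaceIdentifier ["as" identifier]
        name = self.namespace_identifier()
        alias = None
        if self.peek() == "as":
            self.pos += 1
            alias = self.identifier()
        return (name, alias)

    def use_statement(self):
        # UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
        self.expect("use")
        imports = [self.simple_use_statement()]
        while self.peek() == ",":
            self.pos += 1
            imports.append(self.simple_use_statement())
        self.expect(";")
        return imports


def parse_use(source):
    """Parse exactly one use statement and return (name, alias) pairs."""
    parser = Parser(tokenize(source))
    result = parser.use_statement()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return result
```

Calling parse_use on a string such as "use Foo\Bar as Baz, Qux;" yields one (name, alias) pair per imported name, with alias set to None when no "as" clause is present. Feeding the same recognizer snippets taken from real code, and comparing accept/reject decisions with what PHP itself does, is one modest way to test a rule before the rest of the grammar exists.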