On Dec 31, 2010, at 6:54 AM, Enrico Weigelt wrote:

>> After enviously looking at Python's grammar
>> (http://docs.python.org/dev/reference/grammar.html) I keep feeling
>> that PHP is missing out on a lot of interesting meta projects by not
>> having an official EBNF.
>
> ACK. PHP also misses a lot of other fundamental specifications
> (at least I'm not aware of them). That's probably one of the reasons
> for the many problems experienced on the user and enterprise operator
> side: sudden semantic changes.
>
>> Building your own PHP parser is _very_ hard and is PhD (Paul Biggar:)
>> level stuff if you want to get all the edge cases right. Having _the_
>> official EBNF would make this easier.
>
> Hmm, perhaps it really would make a good PhD project to actually
> create a clear specification, a full language report (at least for
> the language itself and the core library) and write a tiny reference
> implementation. Once that specification is finished, it should become
> the official one against which official PHP is tested.
If anyone's curious why this hasn't been done... There has never been a language grammar, so there's been nothing to refer to at all. As for why no one's made one more recently: for fun, I snagged the .l and .y files from trunk and W3C's version of EBNF from XML. In two hours of hacking away, I managed to come up with this sort-of beginning to a grammar, which I'm certain contains several errors, and only hints at a syntax:

/* http://www.w3.org/TR/REC-xml/#sec-notation */

ws                   ::= [ \n\r\t]+
string               ::= [a-zA-Z_#x7f-#xff] [a-zA-Z0-9_#x7f-#xff]*
namespace-name       ::= '\\'? string ( '\\' string )*
use-declaration      ::= 'use' ws+ namespace-name ( ws+ 'as' ws+ string )?
                         ( ws* ',' ws* namespace-name ( ws+ 'as' ws+ string )? )+ ws* ';'
constant-declaration ::= 'const' ws+ string ws* '=' ws* static-scalar
                         ( ws* ',' ws* string ws* '=' ws* static-scalar )* ws* ';'
inner-statement      ::= statement
                       | function-declaration-statement
                       | class-declaration-statement
statement            ::= unticked-statement
                       | string ':'
unticked-statement   ::= '{' ws* inner-statement* ws* '}'
                       | 'if' ws* '(' ws* expr ws* ')' ws* statement ws* elseif* ws* else-single?
                       | 'if' ws* '(' ws* expr ws* ')' ws* ':' inner-statement* elseif-2* ws* else-single-2?
halt-compiler        ::= '__halt_compiler' ws* '(' ws* ')' ws* ';'
top-statement        ::= inner-statement
                       | halt-compiler
                       | 'namespace' ws+ namespace-name ws* ';'
                       | 'namespace' ( ws+ namespace-name )? ws* '{' ws* top-statement-list ws* '}'
                       | use-declaration
                       | constant-declaration
script               ::= top-statement*

Considering what it takes JUST to define namespaces, halt_compiler, basic blocks, and the idea of a conditional statement... well, suffice it to say the "expr" production alone would be triple the size of this. It doesn't help that there's no way I'm immediately aware of to check whether a grammar like this is accurate. Obviously there's room for optimization.
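One crude way to spot-check the non-recursive productions of the EBNF sketch above is to transcribe them into regular expressions and run sample PHP lines through them. The Python below is a hypothetical helper, not part of any PHP tooling; it assumes the grammar's '\\' literal means a single backslash (the PHP namespace separator), and `use_declaration()` is an invented name. It makes it easy to see, for instance, that with the trailing repetition written as `+`, `use-declaration` appears to demand at least two comma-separated names and rejects a plain `use Foo\Bar;`, which PHP accepts.

```python
import re

# Hypothetical transcription of a few productions from the EBNF sketch into
# Python regexes. These productions are not recursive, so regexes suffice.

# ws ::= [ \n\r\t]+   (so "ws+" is the same as "ws", and "ws*" is "ws?")
WS = r"[ \n\r\t]+"

# string ::= [a-zA-Z_#x7f-#xff] [a-zA-Z0-9_#x7f-#xff]*
STRING = r"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*"

# namespace-name ::= '\\'? string ( '\\' string )*
# (assumption: '\\' denotes ONE literal backslash)
NAMESPACE_NAME = rf"\\?{STRING}(?:\\{STRING})*"

# One "namespace-name ( ws+ 'as' ws+ string )?" clause of use-declaration.
USE_CLAUSE = rf"{NAMESPACE_NAME}(?:{WS}as{WS}{STRING})?"


def use_declaration(repeat: str) -> re.Pattern:
    """Compile use-declaration, using `repeat` ('+' as written in the
    sketch, or '*' for comparison) on the comma-separated tail."""
    return re.compile(
        rf"use{WS}{USE_CLAUSE}"
        rf"(?:(?:{WS})?,(?:{WS})?{USE_CLAUSE}){repeat}"
        rf"(?:{WS})?;"
    )


if __name__ == "__main__":
    samples = [r"use Foo\Bar;", r"use Foo\Bar, Baz as Qux;"]
    for repeat in ("+", "*"):
        pattern = use_declaration(repeat)
        for src in samples:
            # Report whether the whole line matches the production.
            print(repeat, src, bool(pattern.fullmatch(src)))
```

This only exercises the small, flat productions; anything that recurses (statement, expr) would need a real parser generator or a hand-written recursive-descent checker.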
An EBNF doesn't have to jump through some of the hoops that a bison parser backed by an re2c lexer does; it could be simplified once all the parser rules were considered. Or it could be written without referring to the parser at all. Whether that would result in a better or worse grammar, I don't know. Nonetheless, dealing with the complexity of the language is a significant undertaking. There are dozens of tiny little edge cases in PHP's parsing that require bunches of extra parser rules. An example from above is the difference between using "statement" and "inner-statement" for the two different forms of "if". Because "statement" includes basic blocks and labels, the rule disallows writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on the grammar. In its present form, it will never reduce down to something nearly as small as Python's.

-- 
Gwynne

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php