On Dec 31, 2010, at 6:54 AM, Enrico Weigelt wrote:
>> After enviously looking at Python's grammar
>> (http://docs.python.org/dev/reference/grammar.html) I keep feeling
>> that PHP is missing out on a lot of interesting meta projects by not
>> having an official EBNF.
> ACK. PHP also misses a lot of other fundamental specifications
> (at least I'm not aware of them). That's probably one of the reasons
> for the many problems experienced from user and enterprise operator
> side: sudden semantic changes.
>> Building your own PHP parser is _very_ hard and is PhD (Paul Biggar:)
>> level stuff if you want to get all the edge cases right. Having _the_
>> official EBNF would make this easier.
> Hmm, perhaps it really would make a good PhD project to actually
> create a clear specification, a full language report (at least for
> the language itself and the core library) and write a tiny reference
> implementation. Once that specification is finished, it should become
> the official one where official PHP is tested against.


If anyone's curious why this hasn't been done...

There has never been a formal language grammar, so there's been nothing to 
refer to at all. As for why no one's made one more recently: for fun, I snagged 
the .l and .y files from trunk and the W3C's EBNF notation from the XML spec. 
In two hours of hacking away, I managed to come up with this sort-of beginning 
to a grammar, which I'm certain contains several errors, and only hints at a 
syntax:

/* http://www.w3.org/TR/REC-xml/#sec-notation */

ws ::= [ \n\r\t]+
string ::= [a-zA-Z_#x7f-#xff] [a-zA-Z0-9_#x7f-#xff]*

namespace-name ::= '\'? string ( '\' string )*

use-declaration ::= 'use' ws+ namespace-name ( ws+ 'as' ws+ string )?
                    ( ws* ',' ws* namespace-name ( ws+ 'as' ws+ string )? )*
                    ws* ';'

constant-declaration ::= 'const' ws+ string ws* '=' ws* static-scalar
                         ( ws* ',' ws* string ws* '=' ws* static-scalar )*
                         ws* ';'

inner-statement ::= statement |
                    function-declaration-statement |
                    class-declaration-statement

statement ::= unticked-statement | string ':'

unticked-statement ::= '{' ws* inner-statement* ws* '}' |
                       'if' ws* '(' ws* expr ws* ')' ws* statement ws* elseif*
                           ws* else-single? |
                       'if' ws* '(' ws* expr ws* ')' ws* ':' inner-statement*
                           elseif-2* ws* else-single-2? ws* 'endif' ws* ';'

halt-compiler ::= '__halt_compiler' ws* '(' ws* ')' ws* ';'

top-statement ::= inner-statement |
                  halt-compiler |
                  'namespace' ws+ namespace-name ws* ';' |
                  'namespace' ( ws+ namespace-name )? ws* '{' ws*
                      top-statement* ws* '}' |
                  use-declaration |
                  constant-declaration

script ::= top-statement*

Considering what it takes JUST to define namespaces, halt_compiler, basic 
blocks, and the idea of a conditional statement... well, suffice to say the 
"expr" production alone would be triple the size of this. It doesn't help that 
there's no way I'm immediately aware of to check whether a grammar like this is 
accurate.
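
For what it's worth, one rough way to spot-check the purely lexical productions 
is to hand-translate them into regular expressions and run them against sample 
inputs. The sketch below does that in Python for use-declaration only. The 
names (use_declaration, check, SAMPLES) and the sample inputs are made up for 
illustration, and it assumes the backslash in namespace-name means a single 
literal backslash. The quantifier on the comma-separated group is left as a 
parameter, because that is exactly the kind of detail such a check can settle. 
It says nothing about the recursive productions, which a regex can't express.

```python
import re

# ws is already one-or-more, so "ws+" and "ws*" collapse to these two forms.
WS_REQ = r"[ \n\r\t]+"
WS_OPT = r"[ \n\r\t]*"

# string ::= [a-zA-Z_#x7f-#xff] [a-zA-Z0-9_#x7f-#xff]*
# (assumption: matched against str code points, not raw bytes)
STRING = r"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*"

# namespace-name: optional leading backslash, then backslash-separated names.
NAMESPACE_NAME = r"\\?" + STRING + r"(?:\\" + STRING + r")*"

# namespace-name ( ws+ 'as' ws+ string )?
ALIASED_NAME = NAMESPACE_NAME + "(?:" + WS_REQ + "as" + WS_REQ + STRING + ")?"


def use_declaration(repeat):
    """Build a regex for use-declaration.

    `repeat` is the quantifier ("+" or "*") applied to the
    ( ws* ',' ws* namespace-name ... ) group.
    """
    tail = "(?:" + WS_OPT + "," + WS_OPT + ALIASED_NAME + ")" + repeat
    return re.compile("use" + WS_REQ + ALIASED_NAME + tail + WS_OPT + ";")


# Hypothetical sample inputs; the last one should never match.
SAMPLES = [
    "use Foo\\Bar;",
    "use Foo\\Bar as Baz;",
    "use Foo\\Bar, \\Qux\\Quux as Q;",
    "use ;",
]


def check(repeat):
    """Map each sample to whether the whole string matches the production."""
    pattern = use_declaration(repeat)
    return {s: pattern.fullmatch(s) is not None for s in SAMPLES}


if __name__ == "__main__":
    for repeat in ("+", "*"):
        print(repeat, check(repeat))
```

With "+" on the comma group, a plain single-name "use Foo\Bar;" is rejected; 
with "*" it is accepted. That kind of mismatch is easy to miss by eye and easy 
to catch with even a handful of samples.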

Obviously there's room for optimization. An EBNF doesn't have to jump through 
some of the hoops that a bison parser fed by a re2c lexer does; it could be 
simplified once all the parser rules were considered. Or it could be written 
without referring to the parser at all. Whether that would result in a better 
or worse grammar, I don't know.

Nonetheless, it's a significant undertaking to deal with the complexity of the 
language. There are dozens of tiny little edge cases in PHP's parsing that 
require bunches of extra parser rules. An example from above is the difference 
between using "statement" and "inner-statement" for the two different forms of 
"if". Because "statement" includes basic blocks and labels, the rule disallows 
writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary 
basic blocks. All those cases wreak havoc on the grammar. In its present form, 
it will never reduce down to something nearly as small as Python's.

-- Gwynne


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
