> There has never been a language grammar, so there's been nothing to refer to 
> at all. As for why no one's made one more recently, for fun I snagged the .l 
> and .y files from trunk and W3C's version of EBNF from XML. In two hours of 
> hacking away, I managed to come up with this sort-of beginning to a grammar, 
> which I'm certain contains several errors, and only hints at a syntax:

I wanted to take your EBNF for a spin, so I converted it to a format
that the Python module "simpleparse" can read. Along the way I ironed
out a couple of kinks and fixed a bug. You can see it here:

http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/php.ebnf

Then I wrote a pretty-printer that outputs the parse tree of some very
simple PHP code. See it here:

http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.py

and the output is here:

http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.output
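For anyone who does not want to click through, the pretty-printing
idea can be sketched roughly like this. This is not the linked
parse_example.py, just a minimal illustration. It assumes the parse
tree is a nested list of (tag, start, stop, children) tuples, which
is the shape I understand simpleparse to return; the function name
format_tree and the sample tree are made up for the example.

```python
def format_tree(source, nodes, depth=0):
    """Render a nested parse tree as indented "tag: 'matched text'" lines.

    Assumption: each node is a (tag, start, stop, children) tuple, where
    start/stop are offsets into `source` and children is a list or None.
    """
    lines = []
    for tag, start, stop, children in nodes:
        # Show the production name and the slice of source it matched.
        lines.append("%s%s: %r" % ("  " * depth, tag, source[start:stop]))
        # Recurse into child productions, one indent level deeper.
        lines.extend(format_tree(source, children or [], depth + 1))
    return lines


if __name__ == "__main__":
    source = "<?php echo 1; ?>"
    # Hand-written tree standing in for real parser output.
    tree = [("statement", 6, 13, [("expr", 11, 12, None)])]
    print("\n".join(format_tree(source, tree)))
```

The tag names "statement" and "expr" are only placeholders; a real
tree would use whatever production names the grammar defines.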

> Considering what it takes JUST to define namespaces, halt_compiler, basic 
> blocks, and the idea of a conditional statement... well, suffice to say the 
> "expr" production alone would be triple the size of this. It doesn't help 
> that there's no way I'm immediately aware of to check whether a grammar like 
> this is accurate.

Thanks a lot for the example, that does not look so bad :) PHP syntax
is not simple, so of course the EBNF will not be either. Still, any
EBNF would be a lot better than none!

Testability is a real issue and makes for a nice catch-22. A
hypothetical roadmap could _maybe_ look like this:

1) Create the EBNF and a reference implementation, comparing both
against a stable release.
2) Rewrite the Zend implementation to read from the EBNF.
3) Repeat for all current releases.

It's tough to guess about things I don't really understand, but it
looks like major work!
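The "comparing it to a stable release" part of step 1 could maybe be
sketched as a differential test: feed the same PHP snippets to the
real parser and to an EBNF-based one, and list every snippet where
they disagree about validity. The names compare_parsers and
php_lint_accepts are hypothetical, and the sketch assumes a "php"
binary on the PATH whose "-l" (lint) mode exits with status 0 when
the file has no syntax errors.

```python
import os
import subprocess
import tempfile


def php_lint_accepts(source, php_binary="php"):
    """Return True if `php -l` reports no syntax errors for `source`.

    Assumption: `php_binary` is an installed stable PHP release.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".php", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        result = subprocess.run(
            [php_binary, "-l", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    finally:
        os.remove(path)


def compare_parsers(samples, reference_accepts, candidate_accepts):
    """Return (source, reference_verdict, candidate_verdict) for each mismatch."""
    disagreements = []
    for source in samples:
        expected = reference_accepts(source)
        actual = candidate_accepts(source)
        if expected != actual:
            disagreements.append((source, expected, actual))
    return disagreements
```

In use, reference_accepts would be php_lint_accepts and
candidate_accepts would wrap whatever parser is generated from the
EBNF. An empty result only shows agreement on the samples tried, not
that the grammar is correct.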

> Nonetheless, it's a significant undertaking to deal with the complexity of 
> the language. There are dozens of tiny little edge cases in PHP's parsing 
> that require bunches of extra parser rules. An example from above is the 
> difference between using "statement" and "inner-statement" for the two 
> different forms of "if". Because "statement" includes basic blocks and 
> labels, the rule disallows writing "if: { xyz; } endif;", since apparently 
> Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on 
> the grammar. In its present form, it will never reduce down to something 
> nearly as small as Python's.

Just having a solid, complete, maintained EBNF would be a _major_ leap forward!

Thanks for your cool reply!

Cheers
Rune

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
