Hi all,
PHP's grammar is far from complex. Most of the syntax can be described
with a handful of simple rules.
Example:
* A program can be split into a sequence of statements.
* A couple of items (namespace, use) can only be declared in specific
places, so consider them top-level statements.
* A namespace declaration may also contain multiple statements if you
enclose them in braces.
* A UseStatement can only appear inside a namespace or in the global scope.
* Finally, we support classes.
Now we can describe a good portion of PHP grammar:
/* Terminals */
identifier
char
string
integer
float
boolean
/* Grammar Rules */
Literal ::= string | char | integer | float | boolean
Qualifier ::= ("private" | "public" | "protected") ["static"]
/* Identifiers */
NamespaceIdentifier ::= identifier {"\" identifier}
ClassIdentifier ::= identifier
MethodIdentifier ::= identifier
FullyQualifiedClassIdentifier ::= [NamespaceIdentifier "\"] ClassIdentifier
/* Root grammar */
Program ::= {TopStatement} {Statement}
TopStatement ::= NamespaceDeclaration | UseStatement | CommentStatement
Statement ::= ClassDeclaration | FunctionDeclaration | ...
/* Namespace Declaration */
NamespaceDeclaration ::= InlineNamespaceDeclaration | ScopeNamespaceDeclaration
InlineNamespaceDeclaration ::= SimpleNamespaceDeclaration ";"
{UseStatement} {Statement}
ScopeNamespaceDeclaration ::= SimpleNamespaceDeclaration "{"
{UseStatement} {Statement} "}"
SimpleNamespaceDeclaration ::= "namespace" NamespaceIdentifier
/* Use Statement */
UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
SimpleUseStatement ::= SimpleNamespaceUseStatement | SimpleClassUseStatement
SimpleNamespaceUseStatement ::= NamespaceIdentifier ["as" NamespaceIdentifier]
SimpleClassUseStatement ::= FullyQualifiedClassIdentifier ["as" ClassIdentifier]
/* Comment Declaration */
CommentStatement ::= InlineCommentStatement | MultilineCommentStatement
InlineCommentStatement ::= ("//" | "#") string
MultilineCommentStatement ::= SimpleMultilineCommentStatement |
DocBlockStatement
SimpleMultilineCommentStatement ::= "/*" {"*" string} "*/"
DocBlockStatement ::= "/**" {"*" string} "*/"
/* Class Declaration */
ClassDeclaration ::= SimpleClassDeclaration "{" {ClassMemberDeclaration} "}"
SimpleClassDeclaration ::= ["abstract"] "class" ClassIdentifier
["extends" FullyQualifiedClassIdentifier] ["implements"
FullyQualifiedClassIdentifier {"," FullyQualifiedClassIdentifier}]
ClassMemberDeclaration ::= ConstDeclaration | PropertyDeclaration |
MethodDeclaration
ConstDeclaration ::= [DocBlockStatement] "const" identifier "=" Literal ";"
PropertyDeclaration ::= [DocBlockStatement] Qualifier Variable ["=" Literal] ";"
MethodDeclaration ::= [DocBlockStatement] (PrototypeMethodDeclaration
| ComplexMethodDeclaration)
PrototypeMethodDeclaration ::= "abstract" Qualifier "function"
MethodIdentifier "(" [ArgumentDeclaration] ")" ";"
ComplexMethodDeclaration ::= ["final"] Qualifier "function"
MethodIdentifier "(" [ArgumentDeclaration] ")" "{" {Statement} "}"
ArgumentDeclaration ::= SimpleArgumentDeclaration {","
SimpleArgumentDeclaration}
SimpleArgumentDeclaration ::= [TypeHint] Variable ["=" Literal]
TypeHint ::= ArrayTypeHint | FullyQualifiedClassIdentifier
ArrayTypeHint ::= "array"
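To show that rules like these map directly onto code, here is a rough Python sketch of a recursive-descent recognizer for the UseStatement and NamespaceIdentifier rules. It is only an illustration, not how Zend parses anything. The names (tokenize, UseStatementParser, parse_use_statement) are made up for this example, identifiers are assumed to be ASCII-only, and the sketch does not try to tell namespace imports from class imports: both are read as backslash-separated names.

```python
import re

# Tokens: identifiers/keywords, the namespace separator "\", "," and ";".
# Assumption: ASCII-only identifiers, which is narrower than what PHP accepts.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\\|,|;)")
KEYWORDS = {"use", "as", "namespace"}
PUNCTUATION = {"\\", ",", ";"}


def tokenize(source):
    """Split source into tokens, raising SyntaxError on anything unexpected."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class UseStatementParser:
    """One method per grammar rule (hypothetical helper for this sketch)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError(f"expected {value!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        token = self.peek()
        if token is None or token in KEYWORDS or token in PUNCTUATION:
            raise SyntaxError(f"expected identifier, got {token!r}")
        self.pos += 1
        return token

    def namespace_identifier(self):
        # NamespaceIdentifier ::= identifier {"\" identifier}
        parts = [self.identifier()]
        while self.peek() == "\\":
            self.pos += 1
            parts.append(self.identifier())
        return "\\".join(parts)

    def simple_use_statement(self):
        # name ["as" alias]
        name = self.namespace_identifier()
        alias = None
        if self.peek() == "as":
            self.pos += 1
            alias = self.namespace_identifier()
        return (name, alias)

    def use_statement(self):
        # UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
        self.expect("use")
        imports = [self.simple_use_statement()]
        while self.peek() == ",":
            self.pos += 1
            imports.append(self.simple_use_statement())
        self.expect(";")
        return imports


def parse_use_statement(source):
    """Return a list of (name, alias) pairs for a single use statement."""
    parser = UseStatementParser(tokenize(source))
    imports = parser.use_statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return imports
```

For example, parse_use_statement(r"use Foo\Bar as Baz, Qux;") returns [('Foo\\Bar', 'Baz'), ('Qux', None)], while a dangling separator such as r"use Foo\;" raises SyntaxError.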
Now it is easy to continue the work and add missing rules. =)
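One cheap way to see which rules are still missing is to list every name that is referenced but never defined. Below is a minimal Python sketch of that idea. The function name undefined_nonterminals and the small GRAMMAR excerpt are made up for illustration, and the sketch assumes the notation used here: "Name ::= body" rules, double-quoted literals, and no comments in the input.

```python
import re


def undefined_nonterminals(grammar, terminals):
    """Return, sorted, the names that are referenced but never defined."""
    # Drop quoted literals so keywords such as "use" are not counted as names.
    stripped = re.sub(r'"[^"]*"', " ", grammar)
    # Every name that appears on the left-hand side of "::=" is defined.
    defined = set(re.findall(r"(\w+)\s*::=", stripped))
    # Every remaining word is a reference to a terminal or a rule.
    referenced = set(re.findall(r"[A-Za-z_]\w*", stripped))
    return sorted(referenced - defined - set(terminals))


# Illustrative excerpt only, not the full grammar.
GRAMMAR = r'''
Program ::= {TopStatement} {Statement}
TopStatement ::= NamespaceDeclaration | UseStatement
Statement ::= ClassDeclaration | FunctionDeclaration
NamespaceIdentifier ::= identifier {"\" identifier}
UseStatement ::= "use" NamespaceIdentifier ";"
'''

missing = undefined_nonterminals(GRAMMAR, terminals={"identifier"})
print(missing)
```

On this excerpt the script reports ClassDeclaration, FunctionDeclaration and NamespaceDeclaration as still needing rules. It only catches dangling names; it says nothing about whether the rules that do exist match what PHP really accepts.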
Cheers,
On Sat, Jan 1, 2011 at 12:46 PM, Rune Kaagaard wrote:
>> There has never been a language grammar, so there's been nothing to refer to
>> at all. As for why no one's made one more recently, for fun I snagged the .l
>> and .y files from trunk and W3C's version of EBNF from XML. In two hours of
>> hacking away, I managed to come up with this sort-of beginning to a grammar,
>> which I'm certain contains several errors, and only hints at a syntax:
>
> I wanted to take your EBNF for a spin so I converted it to a format
> that the python module "simpleparse" could read. I ironed out a couple
> of kinks and fixed a bug. You can see it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/php.ebnf
>
> Then I created a prettyprinter to output the parsetree of some very
> simple PHP code. See it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.py
>
> and the output is here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.output
>
>> Considering what it takes JUST to define namespaces, halt_compiler, basic
>> blocks, and the idea of a conditional statement... well, suffice to say the
>> "expr" production alone would be triple the size of this. It doesn't help
>> that there's no way I'm immediately aware of to check whether a grammar like
>> this is accurate.
>
> Thanks a lot for the example, that does not look so bad :) PHP syntax
> is not simple so of course the EBNF will not be either. But still any
> EBNF would be a lot better than none!
>
> Testability is a real issue and makes for a nice catch-22. A
> hypothetical roadmap could _maybe_ look like this:
>
> 1) Create the EBNF and reference implementation while comparing it to
> a stable release.
> 2) Rewrite the Zend implementation to read from the EBNF.
> 3) Repeat for all current releases.
>
> It's tough to try to guess about things you don't really understand.
> Looks like major work though!
>
>> Nonetheless, it's a significant undertaking to deal with the complexity of
>> the language. There are dozens of tiny little edge cases in PHP's parsing
>> that require bunches of extra parser rules. An example from above is the
>> difference between using "statement" and "inner-statement" for the two
>> different forms of "if". Because "statement" includes basic blocks and
>> labels, the rule disallows writing "if: { xyz; } endif;", since apparently
>> Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on
>> the grammar. In its present form, it will never reduce down to something
>> nearly as small as Python's.
>
> Just to have a solid, complete maintained EBNF would be a _major_ leap
> forward!
>
> Thanks for your cool reply!
>
> Cheers
> Rune
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
--
Guilherme Blanco
Mobile: +55 (16) 9215-8480
São Paulo - SP/Brazil