Hi all,

PHP's grammar is far from complex. Most of the syntax can be described
in a few plain-language statements.
Example:

* We can separate a program into several statements.
* A couple of items (namespace, use) can only be declared in specific
places, so treat them as top-statements.
* A namespace declaration may also contain multiple statements if you
enclose them in braces.
* A UseStatement can only appear inside a namespace or in the global scope.
* Finally, we support classes.

Now we can describe a good portion of PHP grammar:

/* Terminals */
identifier
char
string
integer
float
boolean

/* Grammar Rules */
Literal ::= string | char | integer | float | boolean

Qualifier ::= ("private" | "public" | "protected") ["static"]

/* Identifiers */
NamespaceIdentifier ::= identifier {"\" identifier}
ClassIdentifier ::= identifier
MethodIdentifier ::= identifier
FullyQualifiedClassIdentifier ::= [NamespaceIdentifier "\"] ClassIdentifier

/* Root grammar */
Program ::= {TopStatement} {Statement}

TopStatement ::= NamespaceDeclaration | UseStatement | CommentStatement
Statement ::= ClassDeclaration | FunctionDeclaration | ...

/* Namespace Declaration */
NamespaceDeclaration ::= InlineNamespaceDeclaration | ScopeNamespaceDeclaration
InlineNamespaceDeclaration ::= SimpleNamespaceDeclaration ";"
    {UseStatement} {Statement}
ScopeNamespaceDeclaration ::= SimpleNamespaceDeclaration "{"
    {UseStatement} {Statement} "}"
SimpleNamespaceDeclaration ::= "namespace" NamespaceIdentifier

/* Use Statement */
UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
SimpleUseStatement ::= SimpleNamespaceUseStatement | SimpleClassUseStatement
SimpleNamespaceUseStatement ::= NamespaceIdentifier ["as" NamespaceIdentifier]
SimpleClassUseStatement ::= FullyQualifiedClassIdentifier ["as" ClassIdentifier]

/* Comment Declaration */
CommentStatement ::= InlineCommentStatement | MultilineCommentStatement
InlineCommentStatement ::= ("//" | "#") string
MultilineCommentStatement ::= SimpleMultilineCommentStatement |
    DocBlockStatement
SimpleMultilineCommentStatement ::= "/*" {"*" string} "*/"
DocBlockStatement ::= "/**" {"*" string} "*/"

/* Class Declaration */
ClassDeclaration ::= SimpleClassDeclaration "{" {ClassMemberDeclaration} "}"
SimpleClassDeclaration ::= ["abstract"] "class" ClassIdentifier
    ["extends" FullyQualifiedClassIdentifier]
    ["implements" FullyQualifiedClassIdentifier {"," FullyQualifiedClassIdentifier}]

ClassMemberDeclaration ::= ConstDeclaration | PropertyDeclaration |
    MethodDeclaration
ConstDeclaration ::= [DocBlockStatement] "const" identifier "=" Literal ";"
PropertyDeclaration ::= [DocBlockStatement] Qualifier Variable ["=" Literal] ";"
MethodDeclaration ::= [DocBlockStatement] (PrototypeMethodDeclaration
    | ComplexMethodDeclaration)

PrototypeMethodDeclaration ::= "abstract" Qualifier "function"
    MethodIdentifier "(" [ArgumentDeclaration] ")" ";"
ComplexMethodDeclaration ::= ["final"] Qualifier "function"
    MethodIdentifier "(" [ArgumentDeclaration] ")" "{" {Statement} "}"
ArgumentDeclaration ::= SimpleArgumentDeclaration {"," SimpleArgumentDeclaration}
SimpleArgumentDeclaration ::= [TypeHint] Variable ["=" Literal]
TypeHint ::= ArrayTypeHint | FullyQualifiedClassIdentifier
ArrayTypeHint ::= "array"


Now it is easy to continue the work and add missing rules. =)
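
One way to sanity-check rules like these is to hand-write a tiny
recursive-descent recognizer for a single production. The sketch below
does that in Python for the Use Statement rules only. It is an
illustration, not how Zend parses anything: the names tokenize,
UseParser and parse_use are made up for this example, identifiers are
assumed to be ASCII, and the two SimpleUseStatement alternatives are
merged because they cannot be told apart without knowing which names
are classes.

```python
import re

# Tokens: identifiers, the namespace separator "\", "," and ";".
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\\|,|;)")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
KEYWORDS = {"use", "as"}


def tokenize(source):
    """Split source into tokens; raise on any unexpected character."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class UseParser:
    """Recognizer for the UseStatement production only (hypothetical helper)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal):
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or tok in KEYWORDS or not IDENT_RE.match(tok):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def namespace_identifier(self):
        # NamespaceIdentifier ::= identifier {"\" identifier}
        parts = [self.identifier()]
        while self.peek() == "\\":
            self.pos += 1
            parts.append(self.identifier())
        return "\\".join(parts)

    def simple_use(self):
        # SimpleUseStatement, both alternatives merged:
        # NamespaceIdentifier ["as" NamespaceIdentifier]
        name = self.namespace_identifier()
        alias = None
        if self.peek() == "as":
            self.pos += 1
            alias = self.namespace_identifier()
        return (name, alias)

    def use_statement(self):
        # UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
        self.expect("use")
        items = [self.simple_use()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.simple_use())
        self.expect(";")
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return items


def parse_use(source):
    """Return a list of (name, alias) pairs for one use statement."""
    return UseParser(tokenize(source)).use_statement()
```

Calling parse_use(r"use Foo\Bar as Baz, Qux;") yields one pair per
imported name, with None as the alias when no "as" clause is given. A
dangling separator such as "use Foo\;" raises SyntaxError. Each method
maps onto one rule, so further productions can be added the same way.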



Cheers,

On Sat, Jan 1, 2011 at 12:46 PM, Rune Kaagaard <rumi...@gmail.com> wrote:
>> There has never been a language grammar, so there's been nothing to refer to 
>> at all. As for why no one's made one more recently, for fun I snagged the .l 
>> and .y files from trunk and W3C's version of EBNF from XML. In two hours of 
>> hacking away, I managed to come up with this sort-of beginning to a grammar, 
>> which I'm certain contains several errors, and only hints at a syntax:
>
> I wanted to take your EBNF for a spin so I converted it to a format
> that the python module "simpleparse" could read. I ironed out a couple
> of kinks and fixed a bug. You can see it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/php.ebnf
>
> Then I created a prettyprinter to output the parsetree of some very
> simple PHP code. See it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.py
>
> and the output is here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.output
>
>> Considering what it takes JUST to define namespaces, halt_compiler, basic 
>> blocks, and the idea of a conditional statement... well, suffice to say the 
>> "expr" production alone would be triple the size of this. It doesn't help 
>> that there's no way I'm immediately aware of to check whether a grammar like 
>> this is accurate.
>
> Thanks a lot for the example, that does not look so bad :) PHP syntax
> is not simple so of course the EBNF will not be either. But still any
> EBNF would be a lot better than none!
>
> Testability is a real issue and makes for a nice catch-22. A
> hypothetical roadmap could _maybe_ look like this:
>
> 1) Create the EBNF and reference implementation while comparing it to
> a stable release.
> 2) Rewrite the Zend implementation to read from the EBNF.
> 3) Repeat for all current releases.
>
> It's tough to try to guess about things you don't really understand.
> Looks like major work though!
>
>> Nonetheless, it's a significant undertaking to deal with the complexity of 
>> the language. There are dozens of tiny little edge cases in PHP's parsing 
>> that require bunches of extra parser rules. An example from above is the 
>> difference between using "statement" and "inner-statement" for the two 
>> different forms of "if". Because "statement" includes basic blocks and 
>> labels, the rule disallows writing "if: { xyz; } endif;", since apparently 
>> Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on 
>> the grammar. In its present form, it will never reduce down to something 
>> nearly as small as Python's.
>
> Just to have a solid, complete maintained EBNF would be a _major_ leap 
> forward!
>
> Thanks for your cool reply!
>
> Cheers
> Rune
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>



-- 
Guilherme Blanco
Mobile: +55 (16) 9215-8480
MSN: guilhermebla...@hotmail.com
São Paulo - SP/Brazil

