On Fri, Jun 02, 2006 at 02:17:25PM +0800, Shu-chun Weng wrote:
>  1. Spaces at beginning and end of rule blocks should be ignored
>     since space before and after current rule are most likely be
>     defined in rules using current one.
>  1a. I'm not sure if it's "clear" to define as this, but the spaces
>      around the rule-level alternative could also be ignored.  

At one point I had been exploring along similar lines, but at the
moment I'd say we don't want to do this.  See below for an example...

>      For instance, look at the rule FunctionAppExpr defined in
>      MiniPerl6 grammar file.
> 
>        rule FunctionAppExpr
> {<Variable>|<Constants>|<ArrayRef>|<FunctionName>[<?ws>?<'('><?ws>?<Parameters><')'>]?}

FWIW, I'd go ahead and write this as a token statement instead of
a rule:

    token FunctionAppExpr {
        | <Variable>
        | <Constants>
        | <ArrayRef>
        | <FunctionName> [ <?ws> \( <?ws> <Parameters> \) ]?
    }

In fact, now that I've written the above I'm more inclined to say
it's not a good idea to ignore whitespace in some parts of a rule
definition but not in others.  Consider:

    rule FunctionAppExpr {
        | <Variable>
        | <Constants>
        | <ArrayRef>
        | <FunctionName>[ \( <Parameters> \) ]?
    }

Can we quickly determine where the <?ws> are being generated? 
What if the [...] portion had an alternation in it?

(And, if we ignore leading/trailing whitespace in rule blocks, do 
we also ignore leading/trailing whitespace in subpatterns?)
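
One rough way to see where the implicit <?ws> land in the last branch of the rule above is to expand them mechanically. This is a Python sketch of a deliberately simplified model, assuming that every run of whitespace in a rule body turns into one <?ws>; it is not how PGE compiles rules, and `expand_ws` is a hypothetical helper name.

```python
import re

def expand_ws(rule_body: str) -> str:
    """Toy model: rewrite each whitespace run in a rule body as <?ws>.

    Assumption: all pattern whitespace is significant, with no special
    treatment of leading, trailing, or alternation whitespace.
    """
    return re.sub(r'\s+', ' <?ws> ', rule_body)

# Last branch of the rule-style FunctionAppExpr, without the leading "|".
branch = r'<FunctionName>[ \( <Parameters> \) ]?'

print(expand_ws(branch))
# prints: <FunctionName>[ <?ws> \( <?ws> <Parameters> <?ws> \) <?ws> ]?
```

Under this model the short branch picks up four <?ws> calls, including one after the closing paren, which is hard to spot when reading the rule form directly.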

In a couple of grammars I've developed already (especially the
one used for pgc.pir), having whitespace at the beginning of rules
and around alternations become <?ws> is useful and important.
In these cases, ignoring such whitespace would mean adding explicit
<?ws> in the rule to get things to work.  At that point it feels like
waterbed theory -- by "improving" things for the FunctionAppExpr
rule above we're pushing the complexity somewhere else.

In general I'd say that in a production such as FunctionAppExpr
where there are just a few places that need <?ws>, then it's
better to use 'token' and explicitly indicate the allowed
whitespace.

(Side observation: in ...|<FunctionName>[<?ws>?<'('><?ws>?<Parameters><')'>]?}
above, nothing allows whitespace between <Parameters> and the closing
paren.  Why not?)

>  2. I am not sure the default rule of <ws>, I couldn't found it in
>     S05.  Currently the engine use :P5/\s+/ but I would like it to
>     be :P/\s*/ when it's before or after non-words and remains
>     the same (\s+) otherwise.

PGE does the "\s* when before or after non-words and \s+ otherwise"
explicitly in its <ws> rule, which is written in PIR.  (Being able
to write subrules procedurally is I<really> nice.)  

In P5 it'd probably be something like 

    (?:(?<!\w)|(?!\w))\s*|\s+

or maybe better is

    (?:(?<!\w)|(?!\w)|\s)\s*
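
A quick way to check that both patterns give "\s* next to a non-word character, \s+ between two word characters" is to try them at fixed positions. This is a Python sketch, on the assumption that Python's `re` module treats these lookaround constructs the same way P5 does for such simple cases; `ws_at` is a hypothetical helper and has nothing to do with PGE's PIR implementation.

```python
import re

# The two Perl 5 patterns from above, copied unchanged.
WS_A = re.compile(r'(?:(?<!\w)|(?!\w))\s*|\s+')
WS_B = re.compile(r'(?:(?<!\w)|(?!\w)|\s)\s*')

def ws_at(pattern, text, pos):
    """Return the whitespace matched at pos, or None if the match fails.

    Passing pos (rather than slicing) lets the lookbehind see the
    character just before pos.
    """
    m = pattern.match(text, pos)
    return None if m is None else m.group(0)

for name, pat in (("A", WS_A), ("B", WS_B)):
    # Between a word character and "(": zero whitespace is acceptable.
    print(name, repr(ws_at(pat, "foo(", 3)))      # ''
    # Between two word characters with nothing in between: no match.
    print(name, repr(ws_at(pat, "foobar", 3)))    # None
    # Between two words separated by spaces: the spaces are consumed.
    print(name, repr(ws_at(pat, "foo  bar", 3)))  # '  '
```

Both patterns agree on these cases; the second avoids repeating the whitespace match in two alternatives.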

Pm
