Re: Parsing a language with optional spaces

2020-07-07 Thread John P. Hartmann

On 7/7/20 05:35, Akim Demaille wrote:

> I believe you need to read again the documentation of /
>
> 'r/s'


It is not as simple as that.  As I don't speak BASIC, let me rephrase 
this problem in FORTRAN IV which is also "blank agnostic":


DO <label> <var> = <expr>, <expr> [, <expr>]

It is not until you reach the comma after the first expression that you 
know whether the statement is the beginning of a loop or it is an 
assignment.  And the expression can contain commas in function calls, 
which defeats any trivial lookahead scanning.  E.g.,


D O 17 6PQ R=FUN X(1 4, V 8)

is an assignment to variable DO176PQR.  The function arguments can also 
be expressions that contain function calls.


As you can see, this more or less defeats any attempt to write a lex 
scanner.  And you cannot just squeeze out all blanks in a front end 
because "Hollerith fields" can contain blanks that are significant (must 
remain).
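
To make the ambiguity concrete, here is a minimal Python sketch (not Flex, and not taken from this thread) of the check a scanner would need before it can commit to a DO token: squeeze out the blanks, then look for a comma outside any parentheses after the "=". The name is_do_loop and the DO_HEAD pattern are hypothetical, and the sketch assumes the statement contains no Hollerith fields, which is precisely the case where squeezing blanks is not safe.

```python
import re

# Hypothetical pattern: "DO", a numeric label, a variable name, then "=".
# Applied only after all blanks have been squeezed out.
DO_HEAD = re.compile(r"DO\d+[A-Z][A-Z0-9]*=")


def is_do_loop(stmt: str) -> bool:
    """Return True if stmt looks like a DO loop, False otherwise.

    Assumption: the statement contains no Hollerith fields, so every
    blank is insignificant and may be removed.
    """
    text = "".join(stmt.split()).upper()  # drop all blanks and tabs
    head = DO_HEAD.match(text)
    if head is None:
        return False
    depth = 0
    for ch in text[head.end():]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A comma outside parentheses separates the loop bounds.
            return True
    # No top-level comma: the statement assigns to a variable whose
    # name merely starts with "DO".
    return False


print(is_do_loop("D O 17 6PQ R=FUN X(1 4, V 8)"))  # comma is inside parentheses
print(is_do_loop("DO 10 I = 1, 5"))                # comma at the top level
```

The whole right-hand side has to be scanned with parentheses tracked, which is more than a fixed-length lookahead can express.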




Re: Parsing a language with optional spaces

2020-07-07 Thread uxio prego
Hi,

> On 7 Jul 2020, at 10:55, John P. Hartmann wrote:
> 
> On 7/7/20 05:35, Akim Demaille wrote:
>> I believe you need to read again the documentation of /
>> 'r/s'
> 
> It is not as simple as that.  As I don't speak BASIC, let me rephrase this 
> problem in FORTRAN IV which is also "blank agnostic":
> 
> DO <label> <var> = <expr>, <expr> [, <expr>]
> 
> It is not until you reach the comma after the first expression that you know 
> whether the statement is the beginning of a loop or it is an assignment.  And 
> the expression can contain commas in function calls, which defeats any 
> trivial lookahead scanning.  E.g.,
> 
> D O 17 6PQ R=FUN X(1 4, V 8)
> 
> is an assignment to variable DO176PQR.  The function arguments can also be 
> expressions that contain function calls.
> 
> As you can see, this more or less defeats any attempt to write a lex scanner. 
>  And you cannot just squeeze out all blanks in a front end because "Hollerith 
> fields" can contain blanks that are significant (must remain).

Then couldn't you combine the squeeze-out-all-blanks approach with %x regions
(exclusive start conditions, switched with BEGIN)?

https://lists.gnu.org/archive/html/help-bison/2020-07/msg00012.html

FBCC uses regions. Sorry, I can't find proper documentation, but see
https://bellard.org//fbcc/

I'd rather shoot myself in the foot than use regions. I don't know whether that
settles the elegance question, but the tool has been there forever.




Re: Parsing a language with optional spaces

2020-07-07 Thread Akim Demaille
Hi John,

> On 7 Jul 2020, at 10:55, John P. Hartmann wrote:
> 
> On 7/7/20 05:35, Akim Demaille wrote:
>> I believe you need to read again the documentation of /
>> 'r/s'
> 
> It is not as simple as that.

Actually the message you are quoting was really just an answer to Maury,
for BASIC.

> As I don't speak BASIC, let me rephrase this problem in FORTRAN IV which is 
> also "blank agnostic":
> 
> DO <label> <var> = <expr>, <expr> [, <expr>]
> 
> It is not until you reach the comma after the first expression that you know 
> whether the statement is the beginning of a loop or it is an assignment.  And 
> the expression can contain commas in function calls, which defeats any 
> trivial lookahead scanning.  E.g.,
> 
> D O 17 6PQ R=FUN X(1 4, V 8)
> 
> is an assignment to variable DO176PQR.  The function arguments can also be 
> expressions that contain function calls.
> 
> As you can see, this more or less defeats any attempt to write a lex scanner. 
>  And you cannot just squeeze out all blanks in a front end because "Hollerith 
> fields" can contain blanks that are significant (must remain).

I still think you can address this case with Flex, but I agree it's
going to be painful.  I would go for something like

sp   [ \t]*
do   D{sp}O

id   [a-zA-Z]({sp}[a-zA-Z_0-9]+)*

etc.
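
For readers who want to try these patterns without building a scanner, here is a rough Python transliteration of the definitions above using the re module. It is only an illustration: Python's re does not follow Flex's longest-match rule across competing rules, and the helper name scan_identifier is hypothetical.

```python
import re

# The same building blocks as the Flex definitions above.
SP = r"[ \t]*"
DO = rf"D{SP}O"
ID = rf"[a-zA-Z](?:{SP}[a-zA-Z_0-9]+)*"


def scan_identifier(text: str):
    """Match an identifier at the start of text and return it with the
    embedded blanks removed, or None if text does not start with one."""
    m = re.match(ID, text)
    if m is None:
        return None
    return re.sub(r"[ \t]+", "", m.group())


print(scan_identifier("D O 17 6PQ R=FUN X(1 4, V 8)"))
print(bool(re.fullmatch(DO, "D   O")))
```

Note that the identifier pattern also swallows the "D O" prefix, so the patterns alone do not decide between a DO keyword and a variable whose name starts with DO.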

This is tedious.  In Vcsn I had implemented the "shuffle" operator
which would have been helpful
(https://www.lrde.epita.fr/dload/vcsn/latest/notebooks/expression.shuffle.html).
"Shuffle" is definitely a valid operator: the shuffling of rational
languages is a rational language, so it is mathematically sound.
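
As a small illustration of what shuffling means on individual words (this is plain Python, not the Vcsn API, and the function name shuffle is only a label for this sketch), the shuffle of two words is the set of all interleavings that keep the letters of each word in their original order:

```python
def shuffle(u: str, v: str) -> set:
    """Return every interleaving of u and v that preserves the order
    of the letters within u and within v."""
    if not u:
        return {v}
    if not v:
        return {u}
    # The first letter of the result comes either from u or from v.
    from_u = {u[0] + w for w in shuffle(u[1:], v)}
    from_v = {v[0] + w for w in shuffle(u, v[1:])}
    return from_u | from_v


print(sorted(shuffle("DO", " ")))
```

Shuffling a keyword with runs of blanks yields the keyword with blanks in every possible position, including before and after it, which is the effect the {sp} insertions above spell out by hand.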

Cheers!