Forcing multiple parse stacks to 'reduce'

2005-02-28 Thread Derek M Jones
All,

I have written a parser for C that processes
a single statement or declaration at a time.
So after each statement/declaration yyparse
returns.

Originally I used various ad-hoc rules in yylex
to figure out which was the last token of a
statement/declaration and then returned a zero
value as the subsequent token.

These ad-hoc rules are complicated (for instance,
a ; can occur in a for loop header and a { can
occur in a cast) and I had the 'great' idea of allowing
the token following the last token of a statement/declaration
to 'naturally' cause yyparse to generate the outstanding
reductions.  So in:

w=3;
y=5;

when yylex returns id (because it has encountered y)
the sequence w=3; reduces to statement and yyparse
returns (yes, I need to hang onto y in order to correctly
parse the next statement).

In some cases multiple parse stacks are outstanding,
and in most cases the following token causes the expected
return to deterministic parsing and subsequent reduction to
the accept state.  But in some cases it does not.  For instance,
in

typedef x y;

by the time the ; is encountered there are three parse stacks.
Depending on the token that follows the ; these three parse stacks
may or may not be reduced to a single stack (which then reduces
to a declaration).

In the case:

typedef x, y;
typedef i, j;

the second typedef token is shifted onto all three stacks and
subsequent tokens are processed like a declaration (which they
do form part of)!  So I don't get a parse of a single declaration
(in fact yyparse eventually reports an ambiguity).

I know that for some grammars the optimizations performed by
Bison result in ambiguities being reported (i.e., the BOGUS
example in 5.7 of bison.info).

In my case no ambiguity is being reported (I am in glr mode
after all ;-).  But the parse stacks are not being reduced as
expected (all three are in the state I would expect them to be
in when the second typedef token is encountered).

Does anybody have any ideas on how I can force yyparse
to consider reductions in the number of parse stacks and
thence reductions to the accept state?

derek

--
Derek M Jones   tel: +44 (0) 1252 520 
667
Knowledge Software Ltdmailto:[EMAIL PROTECTED]
Applications Standards Conformance Testing   http://www.knosof.co.uk




___
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison


Re: Forcing multiple parse stacks to 'reduce'

2005-02-28 Thread Hans Aberg
At 17:03 + 2005/02/28, Derek M Jones wrote:
>I have written a parser for C that processes
>a single statement or declaration at a time.
>So after each statement/declaration yyparse
>returns.

Bison is clearly not built to handle such applications. The normal thing is
to handle the whole language in one go. (How do you handle environments,
like "{...}" statement-by-statement?)

>Originally I used various ad-hoc rules in yylex
>to figure out which was the last token of a
>statement/declaration and then returned a zero
>value as the subsequent token.
>
>These ad-hoc rules are complicated (for instance,
>a ; can occur in a for loop header and a { can
>occur in a cast) and I had the 'great' idea of allowing
>the token following the last token of a statement/declaration
>to 'naturally' cause yyparse to generate the outstanding
>reductions.  So in:
>
>w=3;
>y=5;
>
>when yylex returns id (because it has encountered y)
>the sequence w=3; reduces to statement and yyparse
>returns (yes, I need to hang onto y in order to correctly
>parse the next statement).
>
>In some cases multiple parse stacks are outstanding,
>and in most cases the following token causes the expected
>return to deterministic parsing and subsequent reduction to
>the accept state.  But in some cases it does not.  For instance,
>in
>
>typedef x y;
>
>by the time the ; is encountered there are three parse stacks.
>Depending on the token that follows the ; these three parse stacks
>may or may not be reduced to a single stack (which then reduces
>to a declaration).
>
>In the case:
>
>typedef x, y;
>typedef i, j;
>
>the second typedef token is shifted onto all three stacks and
>subsequent tokens are processed like a declaration (which they
>do form part of)!  So I don't get a parse of a single declaration
>(in fact yyparse eventually reports an ambiguity).
>
>I know that for some grammars the optimizations performed by
>Bison result in ambiguities being reported (i.e., the BOGUS
>example in 5.7 of bison.info).
>
>In my case no ambiguity is being reported (I am in glr mode
>after all ;-).  But the parse stacks are not being reduced as
>expected (all three are in the state I would expect them to be
>in when the second typedef token is encountered).
>
>Does anybody have any ideas on how I can force yyparse
>to consider reductions in the number of parse stacks and
>thence reductions to the accept state?

In general, there is none, it seems. The idea of the GLR is that
reduce/reduce conflicts may split the deterministic parser, until the
ambiguity is resolved later in the parsing. The manual mentions the use of
%dprec to choose between different actions, but that seems to be it. (I am
not using %glr myself, so I am guessing here.)

You might try to rewrite your input grammar so that when the ";" of a
typedef is encountered, it reduces.

Alternatively, you might try an entirely new approach to your whole program
setup. You have not told us why you need this feature, to halt after each
statement. Is it imperative that the parser halts right after the statement
is finished? If not, assuming that the you have written a correct parser,
you might attempt to halt it in the actions. The feature is similar to that
of so called coroutines (part of Simula, for example). Coroutines can be
implemented using threads. So you might use threads to achieve the same
effect: Put the parser in a thread. Activate it. When an action where you
want the parser to halt is executed, as a part of the action, halt the
parser thread. Then do the other stuff you want to do. When parsing needs to
be resumed, reactivate the parser thread.

  Hans Aberg




___
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison


Re: Forcing multiple parse stacks to 'reduce'

2005-02-28 Thread Derek M Jones
Hans,

>>I have written a parser for C that processes
>>a single statement or declaration at a time.
>>So after each statement/declaration yyparse
>>returns.
>
>Bison is clearly not built to handle such applications. The normal thing is
>to handle the whole language in one go. (How do you handle environments,
>like "{...}" statement-by-statement?)

I don't see why not.  Bison is built to handle grammars.
My grammar does not support sequences of  statements
or declarations.  The whole language, in my case, consists of
one statement or one declaration (which includes the
header of a function definition).  Characters such as { }
are not part of the grammar (they are handled by a higher
level routine).

My aim is to parse the visible source (i.e., preprocessing
directives are ignored).  Error recovery is better if
statements/declarations are parsed individually.  'Errors'
might be caused by conditional preprocessor directives
that select how a statement, for instance, should be processed.
Single statement/declaration at a time localises syntax errors.

>>the accept state.  But in some cases it does not.  For instance,
>>in
>>
>>typedef x y;
>>
>>by the time the ; is encountered there are three parse stacks.
>>Depending on the token that follows the ; these three parse stacks
>>may or may not be reduced to a single stack (which then reduces
>>to a declaration).
>>
>>In the case:
>>
>>typedef x, y;
>>typedef i, j;
>>
>>the second typedef token is shifted onto all three stacks and
>>subsequent tokens are processed like a declaration (which they
>>do form part of)!  So I don't get a parse of a single declaration
>>(in fact yyparse eventually reports an ambiguity).
>>
>>I know that for some grammars the optimizations performed by
>>Bison result in ambiguities being reported (i.e., the BOGUS
>>example in 5.7 of bison.info).
>>
>>In my case no ambiguity is being reported (I am in glr mode
>>after all ;-).  But the parse stacks are not being reduced as
>>expected (all three are in the state I would expect them to be
>>in when the second typedef token is encountered).
>>
>>Does anybody have any ideas on how I can force yyparse
>>to consider reductions in the number of parse stacks and
>>thence reductions to the accept state?
>
>In general, there is none, it seems. The idea of the GLR is that
>reduce/reduce conflicts may split the deterministic parser, until the
>ambiguity is resolved later in the parsing. The manual mentions the use of
>%dprec to choose between different actions, but that seems to be it. (I am
>not using %glr myself, so I am guessing here.)

%dprec is designed to resolve ambiguities between multiple parse
stacks.  My multiple parse stacks are in agreement with each other.  

>You might try to rewrite your input grammar so that when the ";" of a
>typedef is encountered, it reduces.

My question is how do I do that?  My grammar is copied
straight from annex A of the C Standard (plus six or so %dprecs
and two changes from right recursion to left recursion).

I am guessing that some form of lazy evaluation is being used
internally by Bison.  After all, in most cases an end-of-file
indicator is used to force reductions.

>Alternatively, you might try an entirely new approach to your whole program
>setup. You have not told us why you need this feature,

See above.


derek

--
Derek M Jones   tel: +44 (0) 1252 520 
667
Knowledge Software Ltdmailto:[EMAIL PROTECTED]
Applications Standards Conformance Testing   http://www.knosof.co.uk




___
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison


Identifying rule responsible for lookahead

2005-02-28 Thread Soumitra Kumar
Following is a sample grammar. There is one r/r
conflict.

% cat test.y
%token YYID YYDOT
%%
identifier : hier_id
;
hier_id : simple_id
| hier_id opt_select YYDOT simple_id
;
opt_select :
| opt_select '[' expr ']'
;

simple_id : YYID ;
expr : hier_id
| function_call
;
function_call : expr YYDOT YYID
;
% bison -r all -v test.y -g
test.y: conflicts: 1 reduce/reduce

>From test.output:
state 10

3 hier_id: hier_id . opt_select YYDOT simple_id
4 opt_select: .  [YYDOT, '[']
5   | . opt_select '[' expr ']'
7 expr: hier_id .  [YYDOT, ']']

YYDOT reduce using rule 4 (opt_select)
YYDOT [reduce using rule 7 (expr)]

YYDOT is in the lookahead set of rule 7 (expr:
hier_id) because of rule 9 (function_call: expr YYDOT
YYID).

If nonterminal expr is used is many rules, it gets
difficult to find out which usage is causing problem.

Basically I want to annotate the rules responsible for
a lookahead token as follows.

state 10

3 hier_id: hier_id . opt_select YYDOT simple_id
4 opt_select: .  [YYDOT (3), '[' (5)]
5   | . opt_select '[' expr ']'
7 expr: hier_id .  [YYDOT (9), ']' (5)]

YYDOT reduce using rule 4 (opt_select)
YYDOT [reduce using rule 7 (expr)]

I was wondering if there is a way to get this
information out of bison.

Please help me.
-Soumitra.




__ 
Do you Yahoo!? 
Yahoo! Mail - Helps protect you from nasty viruses. 
http://promotions.yahoo.com/new_mail


___
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison


Re: Identifying rule responsible for lookahead

2005-02-28 Thread Henrik Sorensen
Your grammar is ambigious.
It can be seen if you make the following transformation, factoring out the 
null transition of opt_select, and you will see the shift/reduce conflict:
%token YYID YYDOT
%%
identifier : hier_id;
hier_id : simple_id
| hier_id  YYDOT simple_id
| hier_id opt_select YYDOT simple_id
;
opt_select : '[' expr ']'
| opt_select '[' expr ']';
simple_id : YYID ;
expr : hier_id
| function_call;
function_call : expr YYDOT YYID
;


now you see the function_call is the same as the added hier_id YYDOT simple_id

hope this helps

Henrik


On Monday 28 February 2005 21.46, Soumitra Kumar wrote:
> %token YYID YYDOT
> %%
> identifier : hier_id
>         ;
> hier_id : simple_id
>         | hier_id opt_select YYDOT simple_id
>         ;
> opt_select :
>         | opt_select '[' expr ']'
>         ;
>
> simple_id : YYID ;
> expr : hier_id
>         | function_call
>         ;
> function_call : expr YYDOT YYID
>         ;


___
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison


Re: Forcing multiple parse stacks to 'reduce'

2005-02-28 Thread Frank Heckenbach
Derek M Jones wrote:

> >>I have written a parser for C that processes
> >>a single statement or declaration at a time.
> >>So after each statement/declaration yyparse
> >>returns.
> >
> >Bison is clearly not built to handle such applications. The normal thing is
> >to handle the whole language in one go. (How do you handle environments,
> >like "{...}" statement-by-statement?)
> 
> I don't see why not.  Bison is built to handle grammars.

Because bison expects the lexer to return a EOF (or perhaps more
precisely, end-of-grammar) token to signal the end of input (of one
"grammar", i.e. one statement in your case). So you'd have to supply
this token, which is exactly the problem you're having ...

Frank

-- 
Frank Heckenbach, [EMAIL PROTECTED]
http://fjf.gnu.de/
GnuPG and PGP keys: http://fjf.gnu.de/plan (7977168E)


___
Help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison