Re: Token value in custom error reporting

2020-06-18 Thread Daniele Nicolodi
On 18/06/2020 00:39, Akim Demaille wrote:
> 
> 
>> Le 18 juin 2020 à 07:49, Daniele Nicolodi  a écrit :
>>
>> Hi Akim,
>>
>> On 17/06/2020 23:43, Akim Demaille wrote:
>>> I think it's a mistake to try to use the semantic value in error messages.
>>
>> The goal would not be to use the semantic value in the error message,
>> but to use additional context attached to the token by the lexer to
>> decide how to report the error.
> 
> Would you have an example of what you mean?

Sure, but it is rather contrived :-)

I am working on a project that is based on Flex and Bison 3.4. The code
goes through some contortions so that the lexer can report errors to the
parser. I would like to make use of some of the features introduced in
Bison 3.6 to try to avoid the most ugly ones.

In the existing code, on error the lexer emits a LEX_ERROR token. This
results in a grammar error that triggers error recovery (good) but also
in an extra error emitted by Bison (bad). Right now the code checks the
error messages in yyerror() and suppresses the unwanted error reporting
if it contains the string "LEX_ERROR".

I thought I could make use of the newly introduced
yyreport_syntax_error() to simply check whether the token that caused the
error is LEX_ERROR and, if so, not emit an error. I haven't coded this,
but I think this can be done.
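
A minimal sketch of that idea, written as plain C so that it stands on its own. The enum and both function names are hypothetical stand-ins; in a parser generated by Bison 3.6 with `%define parse.error custom`, the same test would presumably sit in yyreport_syntax_error() and compare the result of yypcontext_token() against the symbol kind of LEX_ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the symbol kinds a generated parser defines. */
enum symbol_kind { SYM_EOF, SYM_NUMBER, SYM_LEX_ERROR };

/* True when the parser should print its own "syntax error" message. */
static bool should_report_syntax_error(enum symbol_kind unexpected)
{
    /* The lexer already reported this error; stay silent to avoid a duplicate. */
    return unexpected != SYM_LEX_ERROR;
}

/* Report a syntax error unless it was caused by the lexer's error token.
   Returns the number of messages printed (0 or 1). */
static int report_syntax_error(enum symbol_kind unexpected, FILE *out)
{
    assert(out != NULL);
    if (!should_report_syntax_error(unexpected))
        return 0;
    fprintf(out, "syntax error\n");
    return 1;
}
```

Keeping the decision in a separate predicate means it can be tested without a generated parser.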

However, it would be nice if I could avoid having error reporting code
in two places. I thought that one way to do it would be not to report
the error in the lexer but to attach the error information as semantic
value to the LEX_ERROR token, and do the error reporting from
yyreport_syntax_error().

Simply returning YYerror from the lexer on error would solve half of the
problem in a straightforward way. However, I would like to avoid having
to change the token stream emitted by the lexer, as it can be seen as API
in this case. And it does not solve the (minor) issue of having error
reporting code in two places.

Did I say that the example is contrived?

Thank you.

Cheers,
Dan



Re: Redefining the literal string associated to the YYerror symbol

2020-06-18 Thread Daniele Nicolodi
On 18/06/2020 00:44, Akim Demaille wrote:
> There is no way to rename it, and it wouldn't make sense as the error
> token is never presented as an "expected token".  The error token never
> shows to the (end) user.  It appears in the debug traces, but that's
> for the developer.

What about YYEOF and YYUNDEF? Those appear in error messages.

I use a function like this to get the token names used to print a dump of
the token stream of a lexer:

/* Map a token code returned by the lexer to its symbol name. */
const char* token_name(int token)
{
    return yysymbol_name(YYTRANSLATE(token));
}

However, I have a ton of tests that expect the lexer to emit a
"LEX_ERROR" token on error, and I am considering using the special
YYerror token to report errors instead. Hence the question of whether
I can rename YYerror from "error" to "LEX_ERROR".

The fix is rather easy:

const char* token_name(int token)
{
    /* Keep the name the existing tests expect for the error token. */
    if (token == YYerror)
        return "LEX_ERROR";
    return yysymbol_name(YYTRANSLATE(token));
}

I just wanted to check whether there is a built-in mechanism to achieve the
same. However, supporting the renaming of YYEOF and YYUNDEF (which appear in
error messages, if I am not mistaken) would probably be a good thing
to have.

Cheers,
Dan



Re: Token value in custom error reporting

2020-06-18 Thread Adrian Vogelsgesang
Hi Akim, hi Daniele,

actually, I would have a use case for accessing the semantic value for error 
message formatting, too...
Probably less contrived ;)

I am using bison for a SQL parser. One of the most common mistakes I see in 
queries written by beginners is that they get confused between string-literals 
(single-quotes) and identifiers (double-quotes). I.e. they write the query
> SELECT * FROM 'MyTable'
instead of
> SELECT * FROM "MyTable"

This will be tokenized as "keyword star keyword string-literal".
The parser expects "keyword star keyword identifier", though.

It would probably help if the error message included something like
> Hint: Did you mean to refer to table "MyTable" instead of the string-literal 
> 'MyTable'?
> If so, use double-quotes.

Now, I don't want to display this hint every time the user gives us a
string-literal where the grammar expects an identifier.
I would only want to display this hint if there actually exists a table by this 
name. And for that check, I would like to access the semantic value.

That being said, this is clearly a nice-to-have and by no means necessary.
Also, I guess I would run into different issues, first (e.g., when encountering 
an error I would like to check "given the current state, would an identifier be 
valid". I don't think bison currently offers the API to do so)

Cheers,
Adrian





RE: Regarding building bison-3.4.2 on Windows

2020-06-18 Thread Singh, Binay
Thanks for suggesting the newer version.

If I build it on a POSIX / Linux system, will it be compatible with Windows?
My goal is to build the provided source code for Windows.
Do you provide an installer for Windows?

Regards,
Binay



-----Original Message-----
From: Akim Demaille 
Sent: Thursday, June 18, 2020 11:00 AM
To: Singh, Binay (SecDB) [Engineering] 
Cc: Bison Help 
Subject: Re: Regarding building bison-3.4.2 on Windows

Hi Binay,

> Le 17 juin 2020 à 18:26, Singh, Binay  a écrit :
>
>
> Hi Support team ,
>
>
> I am trying to build  GNU bison-3.4.2

The current version is 3.6.4, there's not much point in trying to build 3.4.2.

> on Windows / Visual studio 15 . I am not able to find any info in Readme or 
> Install files.
> Please let me know if there is any documentation .

The INSTALL file contains all the information to build Bison on POSIX systems.  
You'll have to build Bison from the POSIX subsystem.

Cheers!





Re: Token value in custom error reporting

2020-06-18 Thread Hans Åberg


> On 18 Jun 2020, at 10:24, Daniele Nicolodi  wrote:
> 
> On 18/06/2020 00:39, Akim Demaille wrote:
>> 
>> Would you have an example of what you mean?
> …
> In the existing code, on error the lexer emits a LEX_ERROR token. This
> results in a grammar error that triggers error recovery (good) but also
> in an extra error emitted by Bison (bad). Right now the code checks the
> error messages in yyerror() and suppresses the unwanted error reporting
> if it contains the string "LEX_ERROR".

In my C++ parser, the lexer has the rule
.  { return my_parser::token::token_error; }

When it is triggered, I get the error:
  :21.1: error: syntax error, unexpected token error

It might be nicer to actually write out this token, though.





Re: Token value in custom error reporting

2020-06-18 Thread Akim Demaille
Daniele,

> Le 18 juin 2020 à 10:24, Daniele Nicolodi  a écrit :
> 
>> Would you have an example of what you mean?
> 
> Sure, but it is rather contrived :-)
> 
> I am working on a project that is based on Flex and Bison 3.4. The code
> goes through some contortions so that the lexer can report errors to the
> parser. I would like to make use of some of the features introduced in
> Bison 3.6 to try to avoid the most ugly ones.
> 
> In the existing code, on error the lexer emits a LEX_ERROR token. This
> results in a grammar error that triggers error recovery (good) but also
> in an extra error emitted by Bison (bad). Right now the code checks the
> error messages in yyerror() and suppresses the unwanted error reporting
> if it contains the string "LEX_ERROR".

That's the exact use case for returning YYerror from the scanner.

> Simply returning YYerror from the lexer on error would solve half of the
> problem in a straight forward way.

I don't see what the other half is.

> However, would like to avoid to have
> to change the token stream emitted by the lexer as it can be seen as API
> in this case. And it does not solve the (minor) issue of having error
> reporting code in two places.

In this case, it cannot be in a single place.  There are details in
the scanner that are irrelevant to the parser.  It is really up to the
scanner to forge the error message.  It can still use yyerror to
emit it though.

> Did I say that the example is contrived?

:)

In your present case, the right answer is returning YYerror.  I don't see
a need for something else so far :(


Re: Token value in custom error reporting

2020-06-18 Thread Akim Demaille
Hi Adrian,

> Le 18 juin 2020 à 11:26, Adrian Vogelsgesang  a 
> écrit :
> 
> Hi Akim, hi Daniele,
> 
> It would probably help, if the error message would include something like
>> Hint: Did you mean to refer to table "MyTable" instead of the string-literal 
>> 'MyTable'?
>> If so, use double-quotes.

That does sound like a nice idea.

However, you might want to add a rule for this in your grammar: accept
strings where identifiers are expected, and generate the error message
there.  You could even register the error, and yet continue with the
string turned into an identifier.  Not just better diagnostics, but also
better error recovery.

It might be impractical if that generates tons of new conflicts, of course.

> That being said, this is clearly a nice-to-have and by no means necessary.
> Also, I guess I would run into different issues, first (e.g., when 
> encountering an error I would like to check "given the current state, would 
> an identifier be valid". I don't think bison currently offers the API to do 
> so)

Well, in a way it does: check if it's in the expected tokens.  Agreed,
it's more costly than it could be, but it's available.
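
The check could be sketched as follows, assuming the expected tokens have already been collected into an array (in a Bison 3.6 custom error reporter, yypcontext_expected_tokens() fills such an array). Everything else here is hypothetical: the enum, the table_exists() catalog lookup, and the assumption that the text of the offending literal is available to the reporter at all, which is the part Bison does not provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol kinds for a tiny SQL grammar. */
enum symbol_kind { SYM_KEYWORD, SYM_STAR, SYM_IDENTIFIER, SYM_STRING };

/* Hypothetical catalog lookup: does a table with this name exist? */
static bool table_exists(const char *name)
{
    static const char *const tables[] = { "MyTable", "Orders" };
    for (size_t i = 0; i < sizeof tables / sizeof tables[0]; ++i)
        if (strcmp(tables[i], name) == 0)
            return true;
    return false;
}

/* Linear search for `kind` in the list of expected symbol kinds. */
static bool is_expected(const enum symbol_kind *expected, size_t n,
                        enum symbol_kind kind)
{
    for (size_t i = 0; i < n; ++i)
        if (expected[i] == kind)
            return true;
    return false;
}

/* Show the "use double-quotes" hint only when a string literal was found,
   an identifier would have been accepted, and the table really exists. */
static bool want_quote_hint(enum symbol_kind unexpected, const char *text,
                            const enum symbol_kind *expected, size_t n)
{
    assert(text != NULL);
    return unexpected == SYM_STRING
        && is_expected(expected, n, SYM_IDENTIFIER)
        && table_exists(text);
}
```

The linear search mirrors the remark that this is more costly than it could be: the expected-token list is rebuilt and scanned for each error.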

I'm not opposed to extending the API, but it does frighten me:
APIs really bind us, and make evolution much more difficult.

Cheers!


Re: Token value in custom error reporting

2020-06-18 Thread Akim Demaille



> Le 18 juin 2020 à 14:54, Hans Åberg  a écrit :
> 
> In my C++ parser, the lexer has the rule
> .  { return my_parser::token::token_error; }
> 
> When it is triggered, I get the error:
>  :21.1: error: syntax error, unexpected token error
> 
> It might be nicer to actually write out this token, though.

I have already explained why I don't think this is a good idea.

https://lists.gnu.org/r/help-bison/2020-06/msg00017.html

I also have explained that scanner errors should be handled
by the scanner.  For instance, in the bistro, you can read:

int
yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
{
  int c;

[...]

  switch (c)
    {
[...]
      // Stray characters.
    default:
      yyerror (yylloc, "syntax error: invalid character: %c", c);
      return TOK_YYerror;
    }
}

Cheers!


Re: Regarding building bison-3.4.2 on Windows

2020-06-18 Thread Akim Demaille



> Le 18 juin 2020 à 13:35, Singh, Binay  a écrit :
> 
> Thanks for your suggestion for newer version.
> 
> If I build it on POSIX / Linux system , will that be compatible to windows ?

I think that's the whole point of the POSIX subsystem.  But I'm not a Windows 
user, I just don't know.

> My target is to build the source code provided  for Windows.

Bison is agnostic about the target platform.

> Do you provide installable for Windows ?

Sorry, nope.  Have a look at Chocolatey.

https://chocolatey.org/search?q=bison

It seems they could use some help to upgrade bison.




Re: Token value in custom error reporting

2020-06-18 Thread Adrian Vogelsgesang
Hi Akim,

>> That being said, this is clearly a nice-to-have and by no means necessary.
>> Also, I guess I would run into different issues, first (e.g., when
>> encountering an error I would like to check "given the current state, would
>> an identifier be valid". I don't think bison currently offers the API to do
>> so)
>
> Well, in a way it does: check if it's in the expected tokens.  Agreed,
> it's more costly than it could be, but it's available.

You are right. Bison already supports getting the expected tokens. It seems I
completely spaced out while typing the original message.

> I'm not opposed to extending the API, but it does frighten me:
> APIs really bind us, and make evolution much more difficult.

I completely agree, and I am not convinced myself that there are enough 
compelling use cases to add this new functionality to the API.
I just wanted to share my personal use case. Maybe others have additional use
cases - if not, I guess my use case alone isn't worth it. It's a nice-to-have
anyway...

Cheers,
Adrian



Re: Token value in custom error reporting

2020-06-18 Thread Hans Åberg


> On 18 Jun 2020, at 18:56, Akim Demaille  wrote:
> 
>> Le 18 juin 2020 à 14:54, Hans Åberg  a écrit :
>> 
>> In my C++ parser, the lexer has the rule
>> .  { return my_parser::token::token_error; }
>> 
>> When it is triggered, I get the error:
>> :21.1: error: syntax error, unexpected token error
>> 
>> It might be nicer to actually write out this token, though.
> 
> I have already explained why I don't think this is a good idea.
> 
> https://lists.gnu.org/r/help-bison/2020-06/msg00017.html
> 
> I also have explained that scanner errors should be handled
> by the scanner.  For instance, in the bistro, you can read:
> 
> int
> yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
> {
>   int c;
> 
> [...]
> 
>   switch (c)
>     {
> [...]
>       // Stray characters.
>     default:
>       yyerror (yylloc, "syntax error: invalid character: %c", c);
>       return TOK_YYerror;
>     }
> }
> 
> Cheers!

Is that not the case, which I responded to, where you get double error 
messages, both from the lexer and parser?







Re: Redefining the literal string associated to the YYerror symbol

2020-06-18 Thread Akim Demaille
Hi Daniele,

> Le 18 juin 2020 à 10:35, Daniele Nicolodi  a écrit :
> 
> On 18/06/2020 00:44, Akim Demaille wrote:
>> There is no way to rename it, and it wouldn't make sense as the error
>> token is never presented as an "expected token".  The error token never
>> shows to the (end) user.  It appears in the debug traces, but that's
>> for the developer.
> 
> What about YYEOF and YYUNDEF? Those appear in error messages.

Yes, they do.  However YYUNDEF should never be returned from the
scanner.  The scanner should catch the error, report the error
and tell the parser to enter error recovery.  That's YYerror.

YYUNDEF exists because it is "ok" to return chars, say '+', etc.
So when the parser receives a char, say 'x', it has to check if it
has support for it, and if not it maps it to YYUNDEF internally,
so that "it knows how to deal with it".

But to my eyes, this is truly a programmer error.  The scanner
should never return "non existing tokens".  So I won't pay much
attention to what YYUNDEF looks like in the error messages: it should
never be there.

YYEOF is a different beast, and you can change it.
https://www.gnu.org/software/bison/manual/html_node/Token-I18n.html



> However, I have a ton of tests that expect the lexer to emit a
> "LEX_ERROR" token on error, and I am considering using the special
> YYerror token to report errors instead. Hence the question of whether
> I can rename YYerror from "error" to "LEX_ERROR".
> 
> The fix is rather easy:
> 
> const char* token_name(int token)
> {
>     if (token == YYerror)
>         return "LEX_ERROR";
>     return yysymbol_name(YYTRANSLATE(token));
> }

It looks reasonable.


Cheers!


Re: Token value in custom error reporting

2020-06-18 Thread Akim Demaille



> Le 18 juin 2020 à 19:11, Hans Åberg  a écrit :
> 
> 
>> On 18 Jun 2020, at 18:56, Akim Demaille  wrote:
>> 
>>> Le 18 juin 2020 à 14:54, Hans Åberg  a écrit :
>>> 
>>> In my C++ parser, the lexer has the rule
>>> .  { return my_parser::token::token_error; }
>>> 
>>> When it is triggered, I get the error:
>>> :21.1: error: syntax error, unexpected token error
>>> 
>>> It might be nicer to actually write out this token, though.
>> 
>> I have already explained why I don't think this is a good idea.
>> 
>> https://lists.gnu.org/r/help-bison/2020-06/msg00017.html
>> 
>> I also have explained that scanner errors should be handled
>> by the scanner.  For instance, in the bistro, you can read:
>> 
>> int
>> yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
>> {
>>   int c;
>> 
>> [...]
>> 
>>   switch (c)
>>     {
>> [...]
>>       // Stray characters.
>>     default:
>>       yyerror (yylloc, "syntax error: invalid character: %c", c);
>>       return TOK_YYerror;
>>     }
>> }
>> 
>> Cheers!
> 
> Is that not the case, which I responded to, where you get double error 
> messages, both from the lexer and parser?

No, that's the whole point of YYerror.

In the news of 3.6:

*** Returning the error token

  When the scanner returns an invalid token or the undefined token
  (YYUNDEF), the parser generates an error message and enters error
  recovery.  Because of that error message, most scanners that find lexical
  errors generate an error message, and then ignore the invalid input
  without entering the error-recovery.

  The scanners may now return YYerror, the error token, to enter the
  error-recovery mode without triggering an additional error message.  See
  the bistromathic for an example.




Re: Token value in custom error reporting

2020-06-18 Thread Hans Åberg


> On 18 Jun 2020, at 19:21, Akim Demaille  wrote:
> 
>> Le 18 juin 2020 à 19:11, Hans Åberg  a écrit :
>> 
>>> On 18 Jun 2020, at 18:56, Akim Demaille  wrote:
>>> 
>>> I have already explained why I don't think this is a good idea.
>>> 
>>> https://lists.gnu.org/r/help-bison/2020-06/msg00017.html
>>> 
>>> I also have explained that scanner errors should be handled
>>> by the scanner.  For instance, in the bistro, you can read:
>>> 
>>> int
>>> yylex (const char **line, YYSTYPE *yylval, YYLTYPE *yylloc)
>>> {
>>>   int c;
>>> 
>>> [...]
>>> 
>>>   switch (c)
>>>     {
>>> [...]
>>>       // Stray characters.
>>>     default:
>>>       yyerror (yylloc, "syntax error: invalid character: %c", c);
>>>       return TOK_YYerror;
>>>     }
>>> }
>>> 
>>> Cheers!
>> 
>> Is that not the case, which I responded to, where you get double error 
>> messages, both from the lexer and parser?
> 
> No, that's the whole point of YYerror.
> 
> In the news of 3.6:
> 
> *** Returning the error token
> 
>  When the scanner returns an invalid token or the undefined token
>  (YYUNDEF), the parser generates an error message and enters error
>  recovery.  Because of that error message, most scanners that find lexical
>  errors generate an error message, and then ignore the invalid input
>  without entering the error-recovery.
> 
>  The scanners may now return YYerror, the error token, to enter the
>  error-recovery mode without triggering an additional error message.  See
>  the bistromathic for an example.

Ah, I thought one should have something like that.

Otherwise, in your link above you suggest not using the semantic value in error
messages, but when using locations, it contains the token delimitations. So
there seems to be no advantage in letting the lexer generate the error.





Re: Token value in custom error reporting

2020-06-18 Thread Akim Demaille



> Le 18 juin 2020 à 20:46, Hans Åberg  a écrit :
> 
> Otherwise, in your link above you suggest not using the semantic value in
> error messages, but when using locations, it contains the token
> delimitations. So there seems to be no advantage in letting the lexer
> generate the error.

It is still useful for the scanner to emit the error message, because the 
parser has no idea what is wrong.  Granted, it knows _where_ it's wrong, but 
not _why_:
- invalid character?
- not-closed string?
- invalid escape sequence?
- out-of-range literal number?
- etc.

The scanner faces the error, *it* should say what's wrong.
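
As an illustration, a scanner routine for string literals can tell an unterminated string from an invalid escape sequence, which the parser cannot do from the token alone. This is a hedged sketch, not code from Bison or from the bistromathic example: the token names, the scan_string() signature, and the set of accepted escapes are all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; TOK_ERROR plays the role of YYerror. */
enum token { TOK_STRING, TOK_ERROR };

/* Scan a double-quoted literal starting at **cursor (which must be '"').
   On success, advance *cursor past the closing quote and return TOK_STRING.
   On failure, write the reason into msg and return TOK_ERROR. */
static enum token scan_string(const char **cursor, char *msg, size_t msg_size)
{
    const char *p = *cursor;
    assert(*p == '"');

    for (++p; *p != '\0'; ++p) {
        if (*p == '"') {                 /* closing quote: literal complete */
            *cursor = p + 1;
            msg[0] = '\0';
            return TOK_STRING;
        }
        if (*p == '\\') {
            ++p;                         /* look at the escaped character */
            if (*p == '\0')
                break;                   /* input ended inside the escape */
            if (strchr("nt\\\"", *p) == NULL) {
                snprintf(msg, msg_size, "invalid escape sequence: \\%c", *p);
                *cursor = p + 1;
                return TOK_ERROR;
            }
        }
    }

    /* Reached the end of input without seeing a closing quote. */
    snprintf(msg, msg_size, "unterminated string");
    *cursor = p;
    return TOK_ERROR;
}
```

A caller would pass the message to its error-reporting function and return the error token, so the parser enters error recovery without having to guess the reason.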


Re: Redefining the literal string associated to the YYerror symbol

2020-06-18 Thread Akim Demaille
Hi Daniele,

> Le 18 juin 2020 à 19:20, Akim Demaille  a écrit :
> 
>> However, I have a ton of tests that expect the lexer to emit a
>> "LEX_ERROR" token on error, and I am considering using the special
>> YYerror token to report errors instead. Hence the question of whether
>> I can rename YYerror from "error" to "LEX_ERROR".
>> 
>> The fix is rather easy:
>> 
>> const char* token_name(int token)
>> {
>>     if (token == YYerror)
>>         return "LEX_ERROR";
>>     return yysymbol_name(YYTRANSLATE(token));
>> }
> 
> It looks reasonable.

Actually, why do you check the token name rather than the token kind itself?


Re: Redefining the literal string associated to the YYerror symbol

2020-06-18 Thread Daniele Nicolodi
On 19/06/2020 00:13, Akim Demaille wrote:
> Hi Daniele,
> 
>> Le 18 juin 2020 à 19:20, Akim Demaille  a écrit :
>>
>>> However, I have a ton of tests that expect the lexer to emit a
>>> "LEX_ERROR" token on error, and I am considering using the special
>>> YYerror token to report errors instead. Hence the question of whether
>>> I can rename YYerror from "error" to "LEX_ERROR".
>>>
>>> The fix is rather easy:
>>>
>>> const char* token_name(int token)
>>> {
>>>     if (token == YYerror)
>>>         return "LEX_ERROR";
>>>     return yysymbol_name(YYTRANSLATE(token));
>>> }
>>
>> It looks reasonable.
> 
> Actually, why do you check the token name rather than the token kind itself?

Don't ask me, I didn't write the tests :)

The tests are written in Python (as most of the application). It may be
that when they were written it was deemed easier to emit the token names
from the lexer rather than export an enum mapping token names to token
kinds.

Cheers,
Dan