Re: bison for nlp

2018-11-08 Thread Hans Åberg

> On 7 Nov 2018, at 10:09, r0ller  wrote:
> 
> Numbering tokens was introduced in the very beginning and I have questioned 
> quite a few times whether it's still needed. I didn't try hard 
> to get rid of it mainly due to one reason: I want to have an error handling 
> that tells in case of an error which symbols could be accepted instead of the 
> erroneous one just as bison itself does it but in a structured way (as bison 
> returns that info in an error message string). Though, I could not come up 
> with any better idea when it comes to remapping a token to a symbol.

If the token numbers are replaced by strings "…", the Bison parser will print 
those, and they can also be used in the grammar. Would that suffice?



___
help-bison@gnu.org https://lists.gnu.org/mailman/listinfo/help-bison

Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg


> On 2 Nov 2018, at 17:53, Uxio Prego  wrote:
> 
> More specifically, I'm curious to know if Bison can modify precedences
> at parsing time according to user sentences, where by user I mean not the
> programmer who wrote the *.y file but the programmer writing a program
> parsed by the parser generated from the *.y file.

You can't, but:

You can write general rules, say for prefix, infix, and postfix operators, and 
then have the actions put them onto a stack with precedences, and the values 
onto another. Then, when a new operator comes by, let the operators on the stack 
with higher precedences act on the value stack until something with a lower 
precedence appears, and put the new operator onto the stack. Continue until the 
end symbol comes by, which has the lowest precedence. Operator associativity is 
handled by viewing left and right hand side precedences as different.





Re: are there user defined infix operators?

2018-11-08 Thread Uxio Prego
Hey, thank you very much! I had lost all faith.

> You can write general rules say for prefix, infix, and postfix operators, 
> [...]

For simplicity I would be happy to consider only infix operators.

> [...] the actions put them onto to a stack with precedences and another for
> values. Then, when a new operator comes by, let the operators on the
> stack with higher precedences act on the value stack until something with
> a lower precedence appears, [...]

I read this twice and didn't understand anything. I read it once again and now
I understand you are proposing that when operators are used, I don’t really
use the syntax tree I'm generating with Bison _directly_, but a more complex
syntax tree I'd be generating by combining the natural tree that arises from the
grammar with other information in those data structures you propose. Did I
understand that right?


Re: bison for nlp

2018-11-08 Thread r0ller
Hi Hans,

Sorry, I don't really get it:( What do you mean by replacing tokens by strings? 
How can that be done?

Best regards,
r0ller


Re: are there user defined infix operators?

2018-11-08 Thread Akim Demaille
Hi Uxio, hi Hans,

You cannot use Bison to resolve your precedences dynamically if
you have a free set of levels.  But if you have a fixed number
of levels, say 10, then you could define ten tokens, one for each
level, and give them the precedence you want.  Then, in the scanner,
map each operator to the corresponding level, storing the actual
operator as a semantic value.  The scanner could use a map, for
instance, to decide to which token you map each operator.

That wouldn’t be of much help if you also want to play with
associativity.  Maybe you could use even more tokens to denote the
different possibilities.
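
A rough sketch of the scanner-side map described above, in plain C++ and
independent of any generated parser. The names OP_LEVEL_0..OP_LEVEL_2,
UNKNOWN_OP, OperatorToken and OperatorTable are made up for illustration; in a
real project the token kinds would come from the Bison-generated header, and
three levels stand in for the ten mentioned above.

```cpp
#include <map>
#include <string>

// Hypothetical token kinds. The grammar would declare them in increasing
// order of precedence, for example:
//   %left OP_LEVEL_0
//   %left OP_LEVEL_1
//   %left OP_LEVEL_2
enum TokenKind { OP_LEVEL_0, OP_LEVEL_1, OP_LEVEL_2, LEVEL_COUNT, UNKNOWN_OP };

// What the scanner hands to the parser: the level token, plus the actual
// operator spelling as its semantic value.
struct OperatorToken {
    TokenKind kind;
    std::string spelling;
};

class OperatorTable {
public:
    // Called when the parsed program declares an operator at some level.
    // Returns false if the level is outside the fixed range.
    bool declare(const std::string& spelling, int level) {
        if (level < 0 || level >= LEVEL_COUNT) return false;
        levels_[spelling] = static_cast<TokenKind>(level);
        return true;
    }

    // Called by the scanner for each operator lexeme it recognizes.
    OperatorToken classify(const std::string& spelling) const {
        std::map<std::string, TokenKind>::const_iterator it = levels_.find(spelling);
        if (it == levels_.end()) return OperatorToken{UNKNOWN_OP, spelling};
        return OperatorToken{it->second, spelling};
    }

private:
    std::map<std::string, TokenKind> levels_;
};
```

The grammar then only needs one rule per level token, and the action looks at
the spelling to decide what to do. Supporting associativity this way would
presumably mean doubling the tokens (one %left and one %right per level), which
this sketch does not attempt.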



Re: bison for nlp

2018-11-08 Thread Hans Åberg


> On 8 Nov 2018, at 20:48, r0ller  wrote:
> 
> Sorry, I don't really get it:( What do you mean by replacing tokens by 
> strings? How can that be done?

Write
  %token t_ENG_Adv "English adverb"

Then, in error messages, the Bison parser will write "English adverb", and you 
can also use it in the grammar instead of t_ENG_Adv.


Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg

> On 8 Nov 2018, at 20:21, Uxio Prego  wrote:
> 
>> You can write general rules say for prefix, infix, and postfix operators, 
>> [...]
> 
> For simplicity I would be happy to consider only infix operators.

For fixity overloads (same name but different fixity), one can have the 
overloads used in C/C++: prefix and postfix (as in ++), or prefix and infix (as 
in -), but not infix and postfix. This requires extra processing, keeping track 
of the token that was before; Bison cannot do that, so the lexer must do it. A 
grammar might look like this, with the lexer figuring out what to return:

%left binary_operator
%left prefix_operator
%left binary_or_postfix_operator
%left postfix_operator

%%

expression:
    value
  | prefix_operator expression
  | expression postfix_operator
  | expression binary_or_postfix_operator             // Postfix operator
  | expression binary_or_postfix_operator expression  // Binary operator
  | expression binary_operator expression
;

>> [...] the actions put them onto to a stack with precedences and another for
>> values. Then, when a new operator comes by, let the operators on the
>> stack with higher precedences act on the value stack until something with
>> a lower precedence appears, [...]
> 
> I read this twice and didn't understand anything. I read it once again and now
> I understand you are proposing that when operators are used, I don’t really
> use the syntax tree I'm generating with Bison _straightly_, but a more complex
> syntax tree I'd be generating combining the natural tree that arises from the
> grammar and other information in those data structures you propose. Did I
> understand that right?

Take a simple example, a + b*c #, where # is the end marker. First put the a on 
the value stack, and the + on the operator stack, and then the b on the value 
stack. When the * comes by, it has higher precedence than the + on top of the 
operator stack, so it must be stacked. Then the c comes by, so put it on the 
value stack. Finally the end marker #, which has lower precedence than *, so 
let * operate on the value stack, and put back its value, b*c. Next is the +, 
and # has lower precedence, so + operates on the value stack, computing a + 
(b*c), which is put back onto the value stack. Then the operator stack is empty, 
so the process is finished, and the value stack has the value.
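
The walk-through above can be sketched in plain C++ roughly as follows. This is
a standalone illustration rather than Bison action code, under several
simplifying assumptions: operands are non-negative integers, operators are
single characters, and the names OpInfo, OpTable, apply, reduce and evaluate
are invented here. The precedence table is ordinary run-time data, so a
user-level declaration could change it while parsing. Associativity is handled
with a flag instead of separate left and right precedences.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One entry of the run-time operator table.
struct OpInfo {
    int precedence;    // larger binds tighter
    bool right_assoc;  // true: a ^ b ^ c groups as a ^ (b ^ c)
};

using OpTable = std::map<char, OpInfo>;

// Built-in meanings for the demo; a real implementation would call the
// user's definition of the operator here.
long apply(char op, long lhs, long rhs) {
    switch (op) {
        case '+': return lhs + rhs;
        case '-': return lhs - rhs;
        case '*': return lhs * rhs;
        case '^': { long r = 1; for (long i = 0; i < rhs; ++i) r *= lhs; return r; }
    }
    throw std::runtime_error("operator has no meaning");
}

// Let the operator on top of the operator stack act on the value stack.
void reduce(std::vector<long>& values, std::vector<char>& ops) {
    assert(!ops.empty());  // callers only reduce when an operator is stacked
    if (values.size() < 2) throw std::runtime_error("missing operand");
    long rhs = values.back(); values.pop_back();
    long lhs = values.back(); values.pop_back();
    char op = ops.back(); ops.pop_back();
    values.push_back(apply(op, lhs, rhs));
}

long evaluate(const std::string& text, const OpTable& table) {
    std::vector<long> values;
    std::vector<char> ops;
    for (std::size_t i = 0; i < text.size();) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            long v = 0;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
                v = v * 10 + (text[i++] - '0');
            values.push_back(v);
            continue;
        }
        OpTable::const_iterator incoming = table.find(text[i]);
        if (incoming == table.end()) throw std::runtime_error("undeclared operator");
        // Stacked operators act first while they bind tighter than the new
        // one, or equally tight when the new one is left-associative.
        while (!ops.empty()) {
            const OpInfo& top = table.at(ops.back());
            bool top_wins = top.precedence > incoming->second.precedence ||
                            (top.precedence == incoming->second.precedence &&
                             !incoming->second.right_assoc);
            if (!top_wins) break;
            reduce(values, ops);
        }
        ops.push_back(text[i]);
        ++i;
    }
    // End of input plays the role of '#': lowest precedence, so everything
    // still on the operator stack is applied.
    while (!ops.empty()) reduce(values, ops);
    if (values.size() != 1) throw std::runtime_error("malformed expression");
    return values.back();
}
```

With '+' at precedence 1 and '*' at 2, evaluate("2 + 3*4", table) groups as
2 + (3*4) and returns 14; swapping the two precedences in the table makes the
same input group as (2 + 3)*4 and return 20, without touching the code.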

One can also use a single stack, and an operator precedence grammar [1]. It 
might give better error reporting, but then you need to figure out how to 
integrate it into the Bison grammar.

1. https://en.wikipedia.org/wiki/Operator-precedence_grammar




Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg


> On 8 Nov 2018, at 21:19, Akim Demaille  wrote:
> 
> Hi Uxio, hi Hans,

Hi Akim,

> You cannot use Bison to resolve dynamically your precedence if
> you have a free set of levels.  But if you have a fixed number
> of level, say 10, then you could define ten tokens for each level,
> and give them the precedence you want.  Then, in the scanner,
> map each operator to the corresponding level, storing the actual
> operator as a semantic value.  The scanner could use a map for
> instance to decide to which token you map each operator.

That is also a possibility, but make it at least 20 to cover C/C++ [1], as the 
10 or so that Haskell admits is too limited. But it becomes problematic if the 
number of levels is large, like 1200 as in SWI-Prolog.

> That wouldn’t be of much help if you also want to play with
> associativity.  Maybe using even more tokens to denote the different
> possibilities.

I recall that the Haskell interpreter Hugs [2] used something like that.


1. https://en.cppreference.com/w/cpp/language/operator_precedence
2. https://wiki.haskell.org/Hugs




Re: are there user defined infix operators?

2018-11-08 Thread Uxio Prego
> You cannot use Bison to resolve dynamically your precedence if
> you have a free set of levels. But if you have a fixed number
> of level, say 10, [...]

Fixed number seems perfectly enough to me.

> you could define ten tokens for each level,
> map each operator to the corresponding level, storing the actual
> operator as a semantic value. The scanner could use a map for
> instance to decide to which token you map each operator.
> 
> That wouldn't be of much help if you also want to play with
> associativity. Maybe using even more tokens to denote the different
> possibilities.

I didn't understand all of this at first, but after reading the Hans
example too, I think I understand every part of this now. Thank you!

I'm not going to exploit this right now, but I rest assured knowing there is
a way to explore if I ever need to get there.



Re: are there user defined infix operators?

2018-11-08 Thread Uxio Prego
> For fixity overloads (same name but different fixity), one can have the 
> overloads used in C/C++: prefix and postfix (as in ++), or prefix and infix 
> (as in -), but not infix and postfix. This requires extra processing, keeping 
> track of the token that was before; Bison cannot do that, so the lexer must 
> do it. A grammar might look like, with the lexer figuring out what to return:
> [...]

This seems very insightful, thank you.

> Take a simple example, a + b*c #, where # is the end marker. First put the a 
> on the value stack, and the + on the operator stack, and then the b on the 
> value stack. When the * comes by, it has higher precedence than the + on top 
> of the operator stack, so it must be stacked. Then the c comes by, so put it 
> on the value stack. Finally the end marker #, which has lower precedence than 
> *, so let * operate on the value stack, and put back its value, b*c. Next is 
> the +, and # has lower precedence, so + operates on the value stack, 
> computing a + (b*c), which is put back onto the value stack. Then the 
> operator stack empty, so the process is finished, and the value stack has the 
> value.
> [...]

The example and explanation are worth a thousand words,
thank you very much. So I use a simple grammar like that, and
the stack data structures, and if necessary feed the lexer back
with data from the parser once the user requests some infix
operators.

I'm not going to exploit this right now, but I rest assured knowing there is
a way to explore if I ever need to get there.


Re: bison for nlp

2018-11-08 Thread r0ller
Hi Hans,

Wow, I did not know that feature. Is such a string only used when creating an 
error message? Will the token still have an integer assigned that can be used 
in the actions, or do I get the string there as well instead of an integer? By 
the way, I'll still get the error message as a string I guess, right?

Best regards,
r0ller


Re: are there user defined infix operators?

2018-11-08 Thread Hans Åberg


> On 8 Nov 2018, at 22:34, Uxio Prego  wrote:
> 
>> Take a simple example, a + b*c #, where # is the end marker. First put the a 
>> on the value stack, and the + on the operator stack, and then the b on the 
>> value stack. When the * comes by, it has higher precedence than the + on top 
>> of the operator stack, so it must be stacked. Then the c comes by, so put it 
>> on the value stack. Finally the end marker #, which has lower precedence 
>> than *, so let * operate on the value stack, and put back its value, b*c. 
>> Next is the +, and # has lower precedence, so + operates on the value stack, 
>> computing a + (b*c), which is put back onto the value stack. Then the 
>> operator stack empty, so the process is finished, and the value stack has 
>> the value.
>> [...]
> 
> The example and explanation are worth a thousand words,
> thank you very much. So I use a simple grammar like that, and
> the stack data structures, and if necessary feed the lexer back
> with data from the parser once the user requests some infix
> operators.

It is only if you want to have a prefix and an infix or postfix operator with 
the same name, like operator- or operator++ in C++, that there is a need for a 
handshake between the lexer and the parser, and a boolean value that tells 
whether the token last seen is a prefix operator suffices. Initially 
set to false, the prefix operators set it to true in the parser, and all other 
expression tokens set it to false. Then, when the lexer sees an operator that 
can be both a prefix and an infix or postfix, it uses this value to 
disambiguate. I leave it to you to figure out the cases, it is not that hard, 
just a bit fiddly. :-)
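
One possible way to work out the cases, sketched in plain C++. Note that this
is a variation on the flag described above, not the same thing: it keeps the
boolean inside the lexer and records whether an operand has just been
completed, which is an assumption of this sketch. The token kinds, the
function name classify_lexemes and the small operator set ("-" prefix or
binary, "++" prefix or postfix, "+" and "*" binary only) are all invented for
illustration, and the input is assumed to be split into lexemes already.

```cpp
#include <string>
#include <vector>

enum Kind { VALUE, PREFIX_OP, BINARY_OP, POSTFIX_OP };

struct Token {
    Kind kind;
    std::string text;
};

std::vector<Token> classify_lexemes(const std::vector<std::string>& lexemes) {
    std::vector<Token> out;
    // False at the start: nothing precedes the first token, so an
    // ambiguous operator there can only be a prefix operator.
    bool operand_complete = false;
    for (const std::string& lx : lexemes) {
        Kind k;
        if (lx == "-")
            k = operand_complete ? BINARY_OP : PREFIX_OP;
        else if (lx == "++")
            k = operand_complete ? POSTFIX_OP : PREFIX_OP;
        else if (lx == "+" || lx == "*")
            k = BINARY_OP;
        else
            k = VALUE;
        // A value or a postfix operator ends an operand; a prefix or
        // binary operator means another operand must follow.
        operand_complete = (k == VALUE || k == POSTFIX_OP);
        out.push_back(Token{k, lx});
    }
    return out;
}
```

For the lexemes - a - - b this yields prefix, value, binary, prefix, value.
Parentheses are left out; an opening one would clear the flag and a closing one
would set it.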





Re: bison for nlp

2018-11-08 Thread Akim Demaille


> On 8 Nov 2018, at 22:35, r0ller  wrote:
> 
> Hi Hans,
> 
> Wow, I did not know that feature. Is such a  string only used when creating 
> an error message?

It also becomes an alias in the grammar file itself.  For instance with

%token PLUS "+"

in addition to

exp: exp PLUS exp

you can also write

exp: exp "+" exp

which I find much clearer.  It's probably not helpful in your
case though.  I would also recommend that you have a look at the
documentation of api.token.prefix.

> Will the token still have an integer assigned that can be used in the actions 
> or do I get there the string as well instead of an integer?

No, you don't get the string: it’s an alias, both names have a single number.

> By the way, I’ll still get the error message as a string I guess, right?

Yes.  Some day we will work on improving error message generation,
there is much demand.

Re: bison for nlp

2018-11-08 Thread Akim Demaille
Hi!

> On 7 Nov 2018, at 10:09, r0ller  wrote:
> 
> Hi Akim,
> 
> The file hi_nongen.y is just left there as the last version that I wrote 
> manually:) If you check out any other hi.y files in the platform specific 
> directories (e.g. the one for the online demo is 
> https://github.com/r0ller/alice/blob/master/hi_js/hi.y but you can have a 
> look in hi_android or hi_desktop as well) you’ll see how they look like 
> nowadays.

You have tons of

   logger::singleton()==NULL?(void)0:logger::singleton()->log(2,"vm is NULL!");

You could introduce logger::log, or whatever free function,
that does that for you instead of having to deal with that
at every call site.
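
Such a helper might look roughly like this. The logger class below is only a
stand-in reduced to the two members visible in the call quoted above; its
internals (the entries vector, and singleton() returning a reference to the
pointer so that a logger can be installed) are assumptions made so that the
sketch is self-contained, and log_message is an invented name.

```cpp
#include <string>
#include <vector>

// Stand-in for the project's logger, not its real definition.
class logger {
public:
    static logger*& singleton() {
        static logger* instance = nullptr;
        return instance;
    }
    void log(int level, const std::string& message) {
        entries.push_back(std::to_string(level) + ": " + message);
    }
    std::vector<std::string> entries;
};

// Free function that hides the null check from every call site.
inline void log_message(int level, const std::string& message) {
    if (logger* l = logger::singleton()) l->log(level, message);
}
```

A call site then shrinks to log_message(2, "vm is NULL!"); and does nothing
when no logger is installed.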

> Numbering tokens was introduced in the very beginning and I have questioned 
> quite a few times whether it's still needed. I didn’t try hard 
> to get rid of it mainly due to one reason: I want to have an error handling 
> that tells in case of an error which symbols could be accepted instead of the 
> erroneous one just as bison itself does it but in a structured way (as bison 
> returns that info in an error message string).

Where are these numbers used?

> Though, I could not come up with any better idea when it comes to remapping a 
> token to a symbol. As far as I know bison uses internally the tokens and not 
> the symbols for the terminals and it's not possible to get back a symbol 
> belonging to a certain token. That's it roughly but I'd be glad to get rid of 
> it. However, if it's not possible and poses no problems then I can live with 
> it. By the way, are there any number ranges or specific numbers that are 
> reserved?

Some numbers are reserved, yes: 0 for EOF and 256 for error (per POSIX).  For 
error, Bison can accommodate you if you use 256 for one of your own tokens.  
EOF must be 0.


> Not using the C++ features of bison has historical reasons: I started writing 
> the project in C and even back then I used yacc which I later replaced with 
> bison. When I started to shift the project to C++ I was glad that it still 
> worked with the generated C parser and since then I never had time to make 
> such an excursion but it'd be great. I also must admit that I wasn't really 
> aware of it. The only thing I read somewhere was that bison has a C++ wrapper 
> but have never taken any steps into that direction.

I don’t know what you mean here: this is bison itself, there’s
no need for a wrapper, and the deterministic parser itself is
genuine C++, not C++ wrapping C.  The GLR parser in C++ though _is_
a wrapper for the C GLR parser.

> Now I think I'll find some time for it -at least to check it out:) Could you 
> give me any links pointing to any tutorial or something like that? It’d be 
> very kind if you could help me in taking the first steps, thanks!

I would very much like to have your opinion on the open section of the
documentation about C++.  It’s recent, and it probably needs polishing.

https://www.gnu.org/software/bison/manual/bison.html#A-Simple-C_002b_002b-Example



3.2.1.0... 🚀 bison is released [stable]

2018-11-08 Thread Akim Demaille
We would have been happy not to have to announce the release of Bison 3.2.1,
which fixes portability issues of Bison 3.2.

Bison 3.2 brought massive improvements to the deterministic C++ skeleton,
lalr1.cc.  When variants are enabled and the compiler supports C++11 or
better, move-only types can now be used for semantic values.  C++98 support
is not deprecated.  Please see the NEWS below for more details.

Many thanks to Frank Heckenbach for paving the way for this release with his
implementation of a skeleton in C++17, and to Nelson H. F. Beebe for
exhaustively testing portability issues.

==

Bison is a general-purpose parser generator that converts an annotated
context-free grammar into a deterministic LR or generalized LR (GLR) parser
employing LALR(1) parser tables.  Bison can also generate IELR(1) or
canonical LR(1) parser tables. Once you are proficient with Bison, you can
use it to develop a wide range of language parsers, from those used in
simple desk calculators to complex programming languages.

Bison is upward compatible with Yacc: all properly-written Yacc grammars
ought to work with Bison with no change. Anyone familiar with Yacc should be
able to use Bison with little trouble. You need to be fluent in C or C++
programming in order to use Bison. Java is also supported.

Here is the GNU Bison home page:
   https://gnu.org/software/bison/

==

Here are the compressed sources:
  https://ftp.gnu.org/gnu/bison/bison-3.2.1.tar.gz   (4.1MB)
  https://ftp.gnu.org/gnu/bison/bison-3.2.1.tar.xz   (2.1MB)

Here are the GPG detached signatures[*]:
  https://ftp.gnu.org/gnu/bison/bison-3.2.1.tar.gz.sig
  https://ftp.gnu.org/gnu/bison/bison-3.2.1.tar.xz.sig

Use a mirror for higher download bandwidth:
  https://www.gnu.org/order/ftp.html

[*] Use a .sig file to verify that the corresponding file (without the
.sig suffix) is intact.  First, be sure to download both the .sig file
and the corresponding tarball.  Then, run a command like this:

  gpg --verify bison-3.2.1.tar.gz.sig

If that command fails because you don't have the required public key,
then run this command to import it:

  gpg --keyserver keys.gnupg.net --recv-keys 0DDCAA3278D5264E

and rerun the 'gpg --verify' command.

This release was bootstrapped with the following tools:
  Autoconf 2.69
  Automake 1.16.1
  Flex 2.6.4
  Gettext 0.19.8.1
  Gnulib v0.1-2176-ga79f2a287

NEWS

* Noteworthy changes in release 3.2.1 (2018-11-09) [stable]

** Bug fixes

  Several portability issues have been fixed in the build system, in the
  test suite, and in the generated parsers in C++.

* Noteworthy changes in release 3.2 (2018-10-29) [stable]

** Backward incompatible changes

  Support for DJGPP, which has been unmaintained and untested for years, is
  obsolete.  Unless there is activity to revive it, it will be removed.

** Changes

  %printers should use yyo rather than yyoutput to denote the output stream.

  Variant-based symbols in C++ should use emplace() rather than build().

  In C++ parsers, parser::operator() is now a synonym for parser::parse.

** Documentation

  A new section, "A Simple C++ Example", is a tutorial for parsers in C++.

  A comment in the generated code now emphasizes that users should not
  depend upon non-documented implementation details, such as macros starting
  with YY_.

** New features

*** C++: Support for move semantics (lalr1.cc)

  The lalr1.cc skeleton now fully supports C++ move semantics, while
  maintaining compatibility with C++98.  You may now store move-only types
  when using Bison's variants.  For instance:

%code {
  #include <memory>
  #include <vector>
}

%skeleton "lalr1.cc"
%define api.value.type variant

%%

%token <int> INT "int";
%type <std::unique_ptr<int>> int;
%type <std::vector<std::unique_ptr<int>>> list;

list:
  %empty    {}
| list int  { $$ = std::move($1); $$.emplace_back(std::move($2)); }

int: "int"  { $$ = std::make_unique<int>($1); }

*** C++: Implicit move of right-hand side values (lalr1.cc)

  In modern C++ (C++11 and later), you should always use 'std::move' with
  the values of the right-hand side symbols ($1, $2, etc.), as they will be
  popped from the stack anyway.  Using 'std::move' is mandatory for
  move-only types such as unique_ptr, and it provides a significant speedup
  for large types such as std::string, or std::vector, etc.
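
  As an illustration outside of any grammar, the following plain C++ sketch
  mirrors what the action of 'list: list int' does with its two right-hand
  side values.  The names IntList and append are invented here; the
  parameters play the roles of $1 and $2, and the result that of $$.

```cpp
#include <memory>
#include <utility>
#include <vector>

using IntList = std::vector<std::unique_ptr<int>>;

// Both parameters are taken by value, so the caller must move into them,
// much as the parser gives up the values it pops from its stack.
IntList append(IntList list, std::unique_ptr<int> item) {
    IntList result = std::move(list);      // $$ = std::move($1);
    result.emplace_back(std::move(item));  // $$.emplace_back(std::move($2));
    return result;
}
```

  Writing 'IntList result = list;' instead would not compile, because
  std::unique_ptr cannot be copied; with a copyable element type it would
  compile but copy the whole vector on every reduction.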

  If '%define api.value.automove' is set, every occurrence of '$n' is replaced
  by 'std::move ($n)'.  The second rule in the previous grammar can be
  simplified to:

list: list int  { $$ = $1; $$.emplace_back($2); }

  With automove enabled, the semantic values are no longer lvalues, so do
  not use the swap idiom:

list: list int  { std::swap($$, $1); $$.emplace_back($2); }

  This idiom is anyway obsolete: it is preferable to move than to swap.

  A warning is issued when automove is enabled, and a value is used several
  times.