Re: how to get left hand side symbol in action
Hi Akim,

I agree, the logging could look better :) You're also right that I could introduce some placeholder symbols in my own framework and then generate the parent symbol in user-defined action implementations as well. I also considered that option. I just wanted to check first whether bison provides such a feature out of the box, and when Hans gave me the answer, I was happy that I wouldn't have to develop anything on top :)

-"I don't understand why you don't put the actions in the db itself, but that's your choice."

Users can put the action implementation directly in the db as well. Using external files is just a convenience: you don't have to paste an action implementation manually into a field of a db record, which I find a bit cumbersome compared to entering only a file name there. DB editors usually don't offer syntax highlighting, for example. I think this makes life easier, especially if you want to use the same action implementation for more than one rule. That is currently a bit hindered by not being able to get the lhs symbol: because of that, you still need as many separate files as there are distinct lhs symbols among the rules that share the implementation. But as you also mentioned, I could just as well introduce my own symbols and replace them while generating the bison source. So no worries :)

-"Don't do that. I clearly stated that it is not guaranteed to work."

Ok, I'll indicate in the guide that it's a misuse of bison internals, and that anyone who wants to avoid it should put the action implementations in separate files and use the lhs symbol of the rule literally.

-"Also, am I understanding that you don't actually felt the need for the feature in practice, but only for teaching something?"
Kind of, as it's not crucial :) I bumped into the issue (of creating different action implementations just because of the lhs symbol) while writing the guide (so yes, while teaching others how to use the framework) and thought about how I could make it more convenient. So I googled, and as I hadn't found anything, I finally asked the question here. The rest of the story you know :)

Best regards,
r0ller

Original message
From: Akim Demaille < a...@lrde.epita.fr >
Date: 10 May 2019 08:47:53
Subject: Re: how to get left hand side symbol in action
To: r0ller < r0l...@freemail.hu >

Hi r0ller,

> On 6 May 2019 at 21:13, r0ller wrote:
>
> Hi Akim,
>
> [...]
> A : B C
> {
>   const node_info& main_node=sparser->get_node_info($1);
>   const node_info& dependent_node=sparser->get_node_info($2);
>   sparser->add_feature_to_leaf(main_node,"main_verb");
>   std::string parent_symbol="A";//<--Here the symbol is hardcoded
>   logger::singleton()==NULL?(void)0:logger::singleton()->log(0,parent_symbol+"->"+main_node.symbol+" "+dependent_node.symbol);
>   $$=sparser->combine_nodes(parent_symbol,main_node,dependent_node);
> }
>
> D : E F
> {
>   const node_info& main_node=sparser->get_node_info($1);
>   const node_info& dependent_node=sparser->get_node_info($2);
>   sparser->add_feature_to_leaf(main_node,"main_verb");
>   std::string parent_symbol="D";//<--Here the symbol is hardcoded
>   logger::singleton()==NULL?(void)0:logger::singleton()->log(0,parent_symbol+"->"+main_node.symbol+" "+dependent_node.symbol);
>   $$=sparser->combine_nodes(parent_symbol,main_node,dependent_node);
> }

I believe we already discussed this, but I don't understand why your API for logging looks this way. You should use a free-standing function that encapsulates all this 'singleton()' sequence you repeat.

But back to the point: you are *generating* the grammar, so I do not understand why *you* don't generate the name directly.
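Akim's suggestion of a free-standing logging function could look roughly like the sketch below. The `logger` class is a minimal hypothetical stand-in: only `singleton()` and `log(int, std::string)` are taken from the quoted actions, and `log_rule` is an invented helper name, not part of r0ller's framework.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for the framework's logger; the real class may differ.
// singleton() returns nullptr when logging is disabled.
class logger {
public:
    static logger* singleton() { return instance_; }
    static void install(logger* l) { instance_ = l; }
    void log(int level, const std::string& message) {
        messages.push_back(message);            // kept so callers can inspect it
        std::clog << level << ": " << message << '\n';
    }
    std::vector<std::string> messages;
private:
    static inline logger* instance_ = nullptr;  // C++17 inline static member
};

// Free-standing helper hiding the repeated null check, so that each grammar
// action needs a single call. Returns whether anything was logged.
inline bool log_rule(const std::string& lhs,
                     const std::string& main_symbol,
                     const std::string& dependent_symbol) {
    assert(!lhs.empty());
    logger* l = logger::singleton();
    if (l == nullptr)
        return false;
    l->log(0, lhs + "->" + main_symbol + " " + dependent_symbol);
    return true;
}
```

Each action would then call `log_rule(parent_symbol, main_node.symbol, dependent_node.symbol);` instead of repeating the conditional expression.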
Actually, I guess most of your actions could be synthesized from data stored in your data base, in addition to the symbol names.

> This means, that one would need to save these two "snippets" in two different
> files, say snippet1 and snippet2 and put the two file names in the action
> fields of the corresponding db records.

I don't understand why you don't put the actions in the db itself, but that's your choice. Still, you can perfectly well introduce your own syntax for your external files, and process them before pasting them. Say

    std::string parent_symbol="%LHS_NAME%";

> These examples you can find in the tech guide towards the end of section
> 'Generating syntactic rules and machine learning':
> https://github.com/r0ller/alice/wiki/Technical-Guide-and-Documentation#Option-2-3
>
> and the other one towards the end of section 'Tagging':
> https://github.com/r0ller/alice/wiki/Technical-Guide-and-Documentation#Tagging
>
> Though, as I changed it today after Hans's suggestion, you'll now find the
> same implementation at those two places.

Don't do that. I clearly stated that it is not guaranteed to work.

Also, am I understanding that you don't actually felt the need f
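A minimal sketch of the preprocessing step Akim describes, assuming the generator reads each action snippet from its file and knows the left-hand side of the rule it is emitting. `%LHS_NAME%` is only the example marker from the message, not a Bison feature, and `expand_placeholder` is a hypothetical helper.

```cpp
#include <string>

// Replaces every occurrence of `placeholder` in an action snippet read from an
// external file, before the snippet is pasted into the generated grammar.
std::string expand_placeholder(std::string snippet,
                               const std::string& placeholder,
                               const std::string& value) {
    if (placeholder.empty())
        return snippet;  // nothing to search for
    std::string::size_type pos = 0;
    while ((pos = snippet.find(placeholder, pos)) != std::string::npos) {
        snippet.replace(pos, placeholder.size(), value);
        pos += value.size();  // continue after the inserted text
    }
    return snippet;
}
```

Advancing `pos` past the inserted text means the replacement is never rescanned, so a value that itself contains the marker cannot cause an endless loop. The same snippet file can then serve rules `A : B C` and `D : E F` by calling it once with `"A"` and once with `"D"`.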
Re: how to get left hand side symbol in action
> On 10 May 2019, at 07:24, Akim Demaille wrote:
>
>> In practice you just need the symbol name as is. Nobody needs the
>> translation,
>
> I beg to disagree. Nobody should translate the keyword "break",
> but
>
>> # bison /tmp/foo.y
>> /tmp/foo.y:1.7: erreur: erreur de syntaxe, : inattendu, attendait char ou
>> identifier ou
>>     1 | %token: FOO
>>       |       ^
>
> looks stupid; "char", "identifier" and "" should be translated.

I think it should only output whatever is in the yytname_ table. Doesn't the translation of that take place dynamically, in the error message?

___
help-bison@gnu.org
https://lists.gnu.org/mailman/listinfo/help-bison
Re: how to get left hand side symbol in action
On Friday, 10 May 2019 07:24:51 CEST Akim Demaille wrote:
> Hi Christian,

Hi Akim!

> Aren't you referring to LA correction, as implemented in Bison?
>
> https://www.gnu.org/software/bison/manual/html_node/LAC.html

Well no, it has something in common, but the use cases are different, see below.

> > That's actually a very common required feature in practice. The main use
> > cases are:
> >
> > 1. Auto completion features
> >
> > 2. Human friendly error messages
>
> I think you are referring to the name of the tokens, not all the symbols.
> For the error messages, it makes sense. Although I am now more convinced
> that most of the time, error messages are more readable when you quote
> exactly the source than when you print token names.

No, I really mean the non-terminal symbols. Because sometimes [not always ;-)] I don't use the classic approach of separate lexer and parser, but rather let the parser do the lexer job as well, like:

    FOO : 'F''O''O' ;

which can avoid complex and error-prone push and pop constructs that would be required with a separate lexer approach and certain complex grammars. So effectively some of the non-terminals (from Bison's point of view) are, in such specific use cases, "terminals" from the use case's point of view. But one of the main problems of this approach is that e.g. the default syntax error messages would no longer be useful at all, because you would get a message like:

    Unexpected 'b', expecting 'F'.

instead of e.g.

    Unexpected 'bla', expecting 'FOO'.

> > I do need these features for almost all parsers, hence for years (since
> > not
> > available directly with Bison) I have a huge bunch of code on top of the
> > internal skeleton code to achieve them.
>
> Is there available for reading somewhere?

Well, a couple of years ago I asked on the list whether anybody was interested in what I am doing to achieve these kinds of features, but got no positive reply, hence I have not invested time yet in writing about it in detail.
However, I am not sure my approach would be of use for you anyway. Basically, what I am doing (roughly described) is C++ code on top of the internally generated parser tables which replicates what the skeleton LALR(1) parser would do. The algorithm takes a parser state as argument:

    std::vector& stack

and then walks the possible routes starting from there (accessing the auto-generated tables directly), that is, pushing onto the stack for each route/branch, reducing whenever required, etc. To detect endless recursions while doing this, I always track the complete parser history with

    typedef std::set< std::vector > YYStackHistory;

So each entry in this history set is a previous parser symbol stack. When I reach a new parser state, I check whether the current parser stack already exists in that history set; if it does, an endless recursion has been reached, so I abort the algorithm for that individual branch and continue with the next branch, and so on. That full history tracking might take a lot of memory, of course, but eventually the algorithm returns a result:

    std::map& expectedSymbols

where the result map (not actually a multimap) contains the possible next grammar rules for the previously supplied parser state; the key is the symbol name, and the value is a struct (BisonSymbolInfo) holding a) the sequence of characters expected next for satisfying that grammar symbol and b) a flag telling whether the symbol is (considered as) a terminal or a "real" non-terminal (see below). So that C++ code is my basis for retrieving the next possible grammar rules for any given parser state, which I use both for error messages and for auto completion features.
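The recursion guard described above can be sketched as follows. This is not Christian's code: instead of reading Bison's generated tables (which are internal), the sketch takes a hypothetical `moves` callback that, for a given stack, reports either a symbol that may come next or a successor stack. `collect_expected`, `Move` and `MoveFn` are invented names.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

using Stack = std::vector<int>;        // parser state stack (hypothetical)
using StackHistory = std::set<Stack>;  // same idea as YYStackHistory

// One outcome of inspecting a stack: either a symbol that may come next
// (expected non-empty) or the stack obtained after a simulated step.
struct Move {
    std::string expected;
    Stack next;
};

// Stand-in for reading the generated parser tables.
using MoveFn = std::function<std::vector<Move>(const Stack&)>;

// Depth-first walk over reachable stacks. A stack that is already in the
// history is not expanded again, which cuts endless recursions.
void collect_expected(const Stack& stack, const MoveFn& moves,
                      StackHistory& history, std::set<std::string>& out) {
    assert(!stack.empty());
    if (!history.insert(stack).second)
        return;  // seen before: abandon this branch, continue with siblings
    for (const Move& m : moves(stack)) {
        if (!m.expected.empty())
            out.insert(m.expected);
        else
            collect_expected(m.next, moves, history, out);
    }
}
```

This sketch shares one history set across the whole walk, so each distinct stack is expanded at most once; because the moves depend only on the stack, revisiting it could not add new symbols. Memory still grows with the number of distinct stacks, which is the cost Christian mentions.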
Another problem with my pseudo-terminals (example FOO above): from Bison's point of view, everything in my grammar is now a non-terminal, and I don't want to manually mark individual grammar rules as either "real" non-terminals or "terminals". So I automated that: if the same symbol can be resolved in more than one way, it is a "real" non-terminal, and if the right-hand side of its rule only contains characters, it is considered a "terminal". That fundamental information is then used automatically to provide appropriate error messages and auto completion.

> Was the feature always fitting perfectly? Never ever did it result in something somewhat incorrect?

I did not make a proof of correctness of the algorithm. As you know, you have to spend a huge amount of time to really prove this, which I did not. Especially when you are working on commercial projects, you also have to be pragmatic sometimes, accept the chance of imperfection for the sake of making progress, and weigh costs against usefulness. The main issue was catching endless recursions, which I solved as described above. I have not
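The classification rule can be illustrated with a small sketch. Christian applies it at parse time on the generated tables; here, purely for illustration, the grammar is a hypothetical in-memory map from each symbol to its alternatives, and `classify` and `SymbolInfo` are invented names loosely modeled on the BisonSymbolInfo idea.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grammar representation: each symbol maps to its alternatives;
// an RHS item is either a character literal such as "'F'" or a symbol name.
using Rhs = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Rhs>>;

bool is_char_literal(const std::string& item) {
    return item.size() == 3 && item.front() == '\'' && item.back() == '\'';
}

// The flag plus the characters that spell the symbol when it is treated
// as a pseudo-terminal.
struct SymbolInfo {
    bool pseudo_terminal = false;
    std::string spelling;
};

SymbolInfo classify(const Grammar& grammar, const std::string& symbol) {
    auto it = grammar.find(symbol);
    assert(it != grammar.end() && "symbol must have at least one rule");
    const std::vector<Rhs>& alternatives = it->second;

    // More than one way to resolve the symbol: a "real" nonterminal.
    if (alternatives.size() != 1 || alternatives.front().empty())
        return SymbolInfo{};

    std::string spelling;
    for (const std::string& item : alternatives.front()) {
        if (!is_char_literal(item))
            return SymbolInfo{};  // refers to other symbols
        spelling += item[1];      // the character between the quotes
    }
    return SymbolInfo{true, std::move(spelling)};
}
```

With `FOO : 'F''O''O' ;` the symbol is reported as a pseudo-terminal spelled `FOO`, which is what an error message like "expecting 'FOO'" needs.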
Re: how to get left hand side symbol in action
> On 10 May 2019, at 07:24, Akim Demaille wrote:
>
> 1. there is a real and valid need for the feature, which I still need
> to be convinced of, especially because symbol names are technical
> details!

One can also write better error messages by using these internal yytname_ table names: if one checks a lookup table for whether the name has already been defined, and it has, then one can give information about that already present name. For example:

    “name” {
      std::optional x0 =
        my::symbol_table.find($x.text);

      if (x0) {
        throw syntax_error(@x, "Name " + $x.text + " already defined in this scope as "
          + yytnamerr_(yytname_[x0->first - 255]));
      }
      …
    }

Right now, all parts are internal and may change: the token translation x0->first - 255, the yytname_ lookup, and the error message cleanup yytnamerr_.
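For comparison, here is a sketch of the same check that avoids `yytname_`, `yytnamerr_` and the `- 255` offset, assuming the application records its own kind for every declared name. `DeclKind`, `describe` and `SymbolTable` are hypothetical. In a Bison C++ action one would throw `syntax_error(@x, ...)` instead; `std::runtime_error` is used here only to keep the example standalone.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical kinds of declarations; their display names belong to the
// application, so nothing depends on Bison's internal tables.
enum class DeclKind { Variable, Function, Type };

const char* describe(DeclKind k) {
    switch (k) {
    case DeclKind::Variable: return "a variable";
    case DeclKind::Function: return "a function";
    case DeclKind::Type:     return "a type";
    }
    return "an unknown entity";
}

class SymbolTable {
public:
    std::optional<DeclKind> find(const std::string& name) const {
        auto it = entries_.find(name);
        if (it == entries_.end())
            return std::nullopt;
        return it->second;
    }

    // Throws with a descriptive message if the name is already defined.
    void define(const std::string& name, DeclKind kind) {
        if (std::optional<DeclKind> previous = find(name))
            throw std::runtime_error("Name " + name +
                                     " already defined in this scope as " +
                                     describe(*previous));
        entries_.emplace(name, kind);
    }

private:
    std::map<std::string, DeclKind> entries_;
};
```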
Re: how to get left hand side symbol in action
On May 10, 2019, at 5:21 AM, Hans Åberg wrote:
>
>
>> On 10 May 2019, at 07:24, Akim Demaille wrote:
>>
>>> In practice you just need the symbol name as is. Nobody needs the
>>> translation,
>>
>> I beg to disagree. Nobody should translate the keyword "break",
>> but
>>
>>> # bison /tmp/foo.y
>>> /tmp/foo.y:1.7: erreur: erreur de syntaxe, : inattendu, attendait char ou
>>> identifier ou
>>>     1 | %token: FOO
>>>       |       ^
>>
>> looks stupid; "char", "identifier" and "" should be translated.
>
> I think it should only output whatever is in the yytname_ table. Does not the
> translation taking place dynamically of that in the error message?

I don’t think it’s worthwhile for bison to support internationalization in the generated parser. In fact, I do not think error messages in the generated parser should be provided by bison, except as a trivial default. Instead, I’d like to see bison call a function (which I provide) when a syntax error occurs, with a list of what was expected along with what was encountered. Obviously, they should be provided without translation. That would be the most flexible.

BTW, I suggested this in a message dated 6 Feb 2019, with more details about what such an API might look like.

Derek
Re: how to get left hand side symbol in action
On May 10, 2019, at 6:47 AM, Hans Åberg wrote:
>
>
>> On 10 May 2019, at 07:24, Akim Demaille wrote:
>>
>> 1. there is a real and valid need for the feature, which I still need
>> to be convinced of, especially because symbol names are technical
>> details!
>
> One can also write better error messages by using these internal yytname_
> table names:
>
> If one checks on a lookup table whether the name has been already defined and
> it is, then one can give information about that already present name. For
> example:
> “name” {
>   std::optional x0 =
>     my::symbol_table.find($x.text);
>
>   if (x0) {
>     throw syntax_error(@x, "Name " + $x.text + " already defined in this scope as "
>       + yytnamerr_(yytname_[x0->first - 255]));
>   }
>   …
> }
>
> Right now, all parts are internal and may change: the token translation
> x0->first - 255, yytname_ lookup, and error message cleanup yytnamerr_.

I agree with this: it has been an annoyance for me for years.

Derek
Re: how to get left hand side symbol in action
On Friday, 10 May 2019 06:57:00 CEST Derek Clegg wrote:
> In fact, I do not think error messages in the generated parser should be
> provided by bison, except as a trivial default. Instead, I’d like to see
> bison call a function (which I provide) when a syntax error occurs, with a
> list of what was expected along with what was encountered. Obviously, they
> should be provided without translation. That would be the most flexible.

Yep, I can only agree. I actually almost never use the built-in error messages, except for early prototyping or something like that. To be honest, probably everybody has a completely different opinion about what an appropriate error message in a final software release should look like exactly.

Best regards,
Christian Schoenebeck
Re: how to get left hand side symbol in action
Hey Christian,

Thanks a lot for taking the time to give details about your use case.

> On 10 May 2019 at 15:11, Christian Schoenebeck wrote:
>
> On Friday, 10 May 2019 07:24:51 CEST Akim Demaille wrote:
>
>> Aren't you referring to LA correction, as implemented in Bison?
>>
>> https://www.gnu.org/software/bison/manual/html_node/LAC.html
>
> Well no, it is has somewhat in common, but the use cases are differently, see
> below.

And there was nothing that could be shared? A lot of what you described below looks like what LAC does. But I am definitely not a LAC expert, nor an expert on your application, obviously :)

The more I read what you do, the more I think it's the same thing. But then, of course, one issue is that LAC is currently supported by yacc.c only; one would need to port it to lalr1.cc.

>> I think you are referring to the name of the tokens, not all the symbols.
>> For the error messages, it makes sense. Although I am now more convinced
>> that most of the time, error messages are more readable when you quote
>> exactly the source than when you print token names.
>
> No, I really mean the non-terminal symbols. Because sometimes [not always
> ;-)]
> I don't use the classic approach of separate lexer and parser, but rather let
> the parser do the lexer job as well, like:
>
> FOO : 'F''O''O' ;

Ok. But then we face exactly what I'm saying: you are constrained by the syntax of symbols. If you had, say, regular expressions in your grammar, you would be forced to write

    regular_expression: ...

and display "regular_expression" in your messages. That's ugly. Using symbol identifiers is not correct. It does not fully fit your need, it just "mostly works". I can't bake that into Bison.

That's exactly why, as I already said, tokens have user-friendly *names* in addition to the pure identifiers. Names are ok, identifiers are not. So *if* we want some feature like this, we would have to support naming nonterminal symbols.
> which can avoid complex and error prone push and pop constructs that would be
> required with a separate lexer approach and certain complex grammars.

I would like to include scannerless parsing in Bison in the future. I have no idea when I will actually work on this, but that's definitely something I have in mind. That would be Bison 4 :)

>>> I do need these features for almost all parsers, hence for years (since
>>> not
>>> available directly with Bison) I have a huge bunch of code on top of the
>>> internal skeleton code to achieve them.
>>
>> Is there available for reading somewhere?
>
> [...]
> However I am not sure if my approach would be of use for you anyway.

I was really curious to understand your use case, not really looking for more features to maintain :)

Do you disable the default reductions? Reading what you do, it seems that it would make your computations more accurate.

> eventually the algorithm ends up
> returning a result:
>
> std::map& expectedSymbols
>
> where the result map (not a multi map actually) contains the possible next
> grammar rules for the previously supplied parser state; key being the symbol
> name, value is a struct (BisonSymbolInfo) holding a) the sequence of
> characters expected next for satisfying that grammar symbol and b) a flag
> whether the symbol is (considered as) terminal or a "real" non-terminal (see
> below).

I guess BisonSymbolInfo is the most significant difference from LAC, isn't it? But the traversal is probably the same.
> Another problem with my pseudo-terminals (example FOO above): From Bison
> point
> of view, everything in my grammar are now non-terminals, and I don't want to
> manually mark individual grammar rules to be either considered as "real" non-
> terminals and others to be considered as "terminals", So I automated that by
> checking whether the same symbol can be resolved in more than one way, then
> it
> is a "real" non-terminal, and by checking if the right hand side of the rule
> only contains characters, then it is considered as "terminal. And that
> fundamental information is automatically used to provide appropriate error
> messages and auto completion in a fully automated way.

It's unclear to me whether you do this at generation time, or at parse time.

>> Was the feature always fitting perfectly? Never ever did it result in
>> something somewhat incorrect?
>
> I did not make a proof of correctness of the algorithm.

I was very ambiguous here, sorry! I meant the feature of using symbol names.

I understand that you want to be able to manipulate the symbols themselves. What I am arguing is that you probably don't need them as strings. I tend to think you need them as an enum, just like the tokens, so that you can map them to some real string or whatever other treatment. But handling them as strings used as keys in some container would be a useless cost compared to an enum.

>> I beg to disagree. Nobody should translate the keyword
Re: how to get left hand side symbol in action
> On 10 May 2019 at 16:28, Christian Schoenebeck wrote:
>
> On Friday, 10 May 2019 06:57:00 CEST Derek Clegg wrote:
>> In fact, I do not think error messages in the generated parser should be
>> provided by bison, except as a trivial default. Instead, I’d like to see
>> bison call a function (which I provide) when a syntax error occurs, with a
>> list of what was expected along with what was encountered. Obviously, they
>> should be provided without translation. That would be the most flexible.
>
> Yep, I can just agree.

Except that you don't. The API that was proposed initially did not fit your need. At least, that was my understanding of

https://lists.gnu.org/archive/html/help-bison/2019-02/msg00018.html

Listen guys (Derek, Christian): how about you submit a patch of what you think the feature should be? Then it will be _much_ easier to discuss. First, agree together (you two), then see what the reaction is here, including mine, obviously (since in practice, AFAICT, nobody else is offering to spend time maintaining Bison :-).

I plan to release 3.4 soon, and to revive what I had done for the internationalisation of error messages. If we coordinate correctly, the focus of 3.5 could be error messages.

Cheers!
Re: how to get left hand side symbol in action
> On May 10, 2019, at 10:43 AM, Akim Demaille wrote:
>
>
>
>> On 10 May 2019 at 16:28, Christian Schoenebeck wrote:
>>
>> On Friday, 10 May 2019 06:57:00 CEST Derek Clegg wrote:
>>> In fact, I do not think error messages in the generated parser should be
>>> provided by bison, except as a trivial default. Instead, I’d like to see
>>> bison call a function (which I provide) when a syntax error occurs, with a
>>> list of what was expected along with what was encountered. Obviously, they
>>> should be provided without translation. That would be the most flexible.
>>
>> Yep, I can just agree.
>
> Except that you don't. The API that was proposed initially did not
> fit your need. At least, that was my understanding of
>
> https://lists.gnu.org/archive/html/help-bison/2019-02/msg00018.html
>
> Listen guys (Derek, Christian): how about you submit a patch of what
> you think the feature should be?

I believe that my suggestion in the above post would be a good starting point. While Christian’s concern is valid, I think my proposal would be an improvement over the current method of error reporting. Just to reiterate, I suggested this approach:

> ...replace the code starting with
>
>    char const* yyformat = YY_NULLPTR;
>    switch (yycount)
>      {
> #define YYCASE_(N, S)                     \
>        case N:                           \
>          yyformat = S;                   \
>        break
>    …
>    return yyres;
>
> in yysyntax_error_ with a call like
>
>    my_yysyntax_error(yyarg, yycount);
>
> where “my_yysyntax_error” is a user-defined function which does the work of
> handling syntax error reporting.
>
> In C++, we could be fancier: have two functions, one for a “pure” syntax
> error:
>
>    void my_yysyntax_error();
>
> and one for syntax errors involving unexpected tokens:
>
>    void my_yysyntax_error(std::string unexpected_token,
>                           std::vector expected_tokens);
>
> The call would be something like this:
>
>    if (yycount == 0) {
>      my_yysyntax_error();
>    } else {
>      my_yysyntax_error(yyarg[0], std::vector{yyarg + 1, yyarg + yycount});
>    }
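The two user-defined handlers Derek proposes could be written along these lines. This is a sketch, not an existing Bison API: the element type `std::vector<std::string>` is an assumption (the template argument is missing in the quoted message), `dispatch_syntax_error` stands in for the proposed call site in `yysyntax_error_`, and the `reported` vector is a hypothetical sink used only to make the output inspectable.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sink; a real handler might throw, print, or feed an IDE.
std::vector<std::string> reported;

// "Pure" syntax error: nothing is known about what was expected.
void my_yysyntax_error() {
    reported.push_back("syntax error");
    std::cerr << reported.back() << '\n';
}

// Syntax error with the unexpected token and the list of expected ones.
void my_yysyntax_error(std::string unexpected_token,
                       std::vector<std::string> expected_tokens) {
    std::string msg = "unexpected " + unexpected_token;
    for (std::size_t i = 0; i < expected_tokens.size(); ++i)
        msg += (i == 0 ? ", expecting " : " or ") + expected_tokens[i];
    reported.push_back(msg);
    std::cerr << msg << '\n';
}

// Dispatch as proposed: yyarg[0] is the unexpected token and
// yyarg[1] .. yyarg[yycount - 1] are the expected ones.
void dispatch_syntax_error(char const* const* yyarg, int yycount) {
    if (yycount == 0)
        my_yysyntax_error();
    else
        my_yysyntax_error(yyarg[0],
                          std::vector<std::string>(yyarg + 1, yyarg + yycount));
}
```

Constructing the vector with parentheses selects the iterator-range constructor explicitly, so the `char const*` range is converted element by element into `std::string`.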
Re: how to get left hand side symbol in action
On Friday, 10 May 2019 19:34:54 CEST Akim Demaille wrote:
> And there was nothing that could be shared? A lot of what you described
> below looks like what LAC does. But I am definitely not a LAC expert, nor
> one of your application, obviously :)

Well, you can look at this example:

http://svn.linuxsampler.org/cgi-bin/viewvc.cgi/linuxsampler/trunk/src/network/lscp.y?view=markup

And before you say it: I know that the grammar has conflicts, just ignore the grammar. The relevant C++ code regarding this topic is below the grammar definition in that .y file. The result of that source looks and behaves like this:

http://doc.linuxsampler.org/LSCP_Shell/

Now that I look at it, I remember this old example had bugs; for instance the on the screen shot pretty much looks like a wrong suggestion. But it is just an example anyway, and you can see the source.

> > No, I really mean the non-terminal symbols. Because sometimes [not always
> > ;-)] I don't use the classic approach of separate lexer and parser, but
> > rather let the parser do the lexer job as well, like:
> >
> > FOO : 'F''O''O' ;
>
> Ok. But then we face exactly what I'm saying: you are constrained
> by the syntax of symbols. If you had say regular expressions in
> your grammar, you would be forced to write
>
> regular_expresion: ...
>
> and display "regular_expression" in your messages. That's ugly.
> Using symbol identifiers is not correct. It does not fully fit your
> need, it just "mostly works". I can't bake that into Bison.

Of course I understand your point. The difference in how we view this is just that you want the final solution in Bison to be as clean as possible, whereas most other people are already fine with the fact that symbol names can act as unique identifiers, which can easily be converted into any other kind of representation, e.g. by simply doing a table/map lookup in their yyerror() implementation to solve your example.

> Do you disable the default reductions?
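The table lookup Christian mentions could look like this minimal sketch, assuming the error-reporting code receives symbol identifiers as strings. The table contents and the `display_name` helper are hypothetical; identifiers without an entry fall back to the raw identifier.

```cpp
#include <map>
#include <string>

// Hypothetical table from grammar symbol identifiers to user-facing names.
const std::map<std::string, std::string>& display_names() {
    static const std::map<std::string, std::string> names = {
        {"regular_expression", "regular expression"},
        {"FOO", "'FOO'"},
    };
    return names;
}

// Returns the friendly name if one is registered, else the identifier itself.
std::string display_name(const std::string& symbol) {
    const auto& names = display_names();
    auto it = names.find(symbol);
    return it != names.end() ? it->second : symbol;
}
```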
> Reading what you do, it seems
> that it would make your computations more accurate.

I was not even aware that I could disable them, so not yet. Might be worth a try :) Thanks!

> I guess BisonSymbolInfo is the most significant difference with LAC,
> isn't it? But the traversal is probably the same.

Well, in the end the algorithm I described is more or less just replicating what LALR(1) does, for the actual purpose of resolving the next possible symbols. So the root purpose is getting access to grammar details in certain states, which would otherwise not be available with the stock skeleton on its own. As far as I can see, LAC would not provide the required information, would it? I mean, how would you e.g. implement auto completion features with LAC? Plus the docs say LAC might end up in endless recursions.

> > Another problem with my pseudo-terminals (example FOO above): From Bison
> > point of view, everything in my grammar are now non-terminals, and I
> > don't want to manually mark individual grammar rules to be either
> > considered as "real" non- terminals and others to be considered as
> > "terminals", So I automated that by checking whether the same symbol can
> > be resolved in more than one way, then it is a "real" non-terminal, and
> > by checking if the right hand side of the rule only contains characters,
> > then it is considered as "terminal. And that fundamental information is
> > automatically used to provide appropriate error messages and auto
> > completion in a fully automated way.
>
> It's unclear to me whether you do this at generation time, or at parse time.

At parse time.

> I understand that you want to be able to manipulate the symbols themselves.
> What I am arguing about it that you probably don't need them as strings.
> I tend to think you need them as an enum, just like the tokens, so that
> you can map them to some real string or whatever other treatment.
> But handing them as strings used as keys in some container would be a useless
> cost compared to an enum.

Enums would be cleaner, of course (even though string comparisons are cheap nowadays). But IMO there should still be an easy way to convert such an enum into a string representation; otherwise developers would always have to write tables manually for these enum -> string conversions.

Best regards,
Christian Schoenebeck
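One common C++ way to avoid writing the enum and the string table separately is an X-macro: a single list generates both, so they cannot drift apart. This is a general idiom, not something Bison generates; `NONTERMINALS`, `Nonterminal` and `to_string` are hypothetical names, and the symbol list is invented for illustration.

```cpp
#include <cstddef>
#include <string_view>

// Single list of (identifier, display name) pairs.
#define NONTERMINALS(X)                          \
    X(command,            "command")             \
    X(regular_expression, "regular expression")  \
    X(foo,                "'FOO'")

// The enum, generated from the list.
enum class Nonterminal : std::size_t {
#define X_ENUM(id, name) id,
    NONTERMINALS(X_ENUM)
#undef X_ENUM
    count_
};

// The name table, generated from the same list in the same order.
constexpr std::string_view nonterminal_names[] = {
#define X_NAME(id, name) name,
    NONTERMINALS(X_NAME)
#undef X_NAME
};

static_assert(sizeof nonterminal_names / sizeof nonterminal_names[0]
                  == static_cast<std::size_t>(Nonterminal::count_),
              "one name per enumerator");

// Constant-time enum -> string conversion by array index.
constexpr std::string_view to_string(Nonterminal n) {
    return nonterminal_names[static_cast<std::size_t>(n)];
}
```

The lookup is an array index rather than a string-keyed search, which matches Akim's point about cost, while still giving the string representation Christian asks for.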