On Sun, Mar 10, 2019 at 6:48 AM <sylvain.bertr...@gmail.com> wrote:
>
> On Sun, Mar 10, 2019 at 06:17:16AM +0100, Markus Wichmann wrote:
> > Well, other people have made that point before: Why use a regex to
> > identify a token when a simple loop will do?
> >
> > So for lexing, usually a simple token parser in C will do the job
> > better. And for parsing, you get the problem that yacc will create an
> > LALR parser, which is a bottom-up parser. Which may be faster but
> > doesn't allow for good error messages on faulty input ("error: expected
> > this, that, or the other token before this one"). That's why top-down
> > recursive-descent parsers (or LL(1) parsers) are superior. Maybe
> > supplemented with a shunting-yard algorithm to get the binary
> > expressions right without having to call layer after layer of functions.
>
> This is exactly what I am experiencing while coding this little/simple custom
> language parser.
> Yep, I guess lex/yacc (then GNU flex/GNU bison) are inappropriate, I even
> would generalize to they do not belong in "suckless".
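
On the "simple loop" point quoted above: a hand-written lexer for a small language can be roughly this short. This is only a sketch under my own assumptions. The token kinds, the punctuation set, and the names `struct token` and `next_token` are all made up for illustration and do not come from any of the projects discussed in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for a tiny expression language. */
enum tok_kind { TOK_EOF, TOK_NUM, TOK_IDENT, TOK_PUNCT, TOK_ERR };

struct token {
    enum tok_kind kind;
    const char *start;  /* points into the input buffer, not copied */
    size_t len;
};

/* Scan one token starting at *p and advance *p past it. */
static struct token next_token(const char **p)
{
    const char *s;
    struct token t;

    assert(p != NULL && *p != NULL);
    s = *p;
    while (isspace((unsigned char)*s))
        s++;

    t.start = s;
    if (*s == '\0') {
        t.kind = TOK_EOF;
    } else if (isdigit((unsigned char)*s)) {
        t.kind = TOK_NUM;
        while (isdigit((unsigned char)*s))
            s++;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
    } else if (strchr("+-*/()=;", *s) != NULL) {
        /* Assumed punctuation set; one character per token. */
        t.kind = TOK_PUNCT;
        s++;
    } else {
        /* Unknown byte: report it as a one-character error token. */
        t.kind = TOK_ERR;
        s++;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}
```

Call `next_token(&p)` repeatedly with `p` pointing at a NUL-terminated buffer until it returns `TOK_EOF`. Because every token carries its start pointer, the caller can turn that into a line and column for an "expected X before Y" style message.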
There are options. Have you tried Lemon Parser [0] or miniyacc + qbe [1][2]?
ucpp [3] lexes/parses C-like languages with C pre-processing. re2c [4] is a
great lexer. Crockford prefers Pratt's Top-Down Operator Precedence [5][6],
and his webpage source repo even includes a nifty lexer that is easy to
translate from JS to C [7].

HTH,

[0] https://www.hwaci.com/sw/lemon/
[1] http://c9x.me/yacc/
[2] http://c9x.me/compile/
[3] https://github.com/lpsantil/ucpp
[4] http://re2c.org/
[5] http://crockford.com/javascript/tdop/tdop.html
[6] https://www.oilshell.org/blog/2016/11/02.html
[7] https://github.com/douglascrockford/TDOP/blob/master/tokens.js
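
To make the Pratt suggestion [5][6] a bit more concrete, here is a rough sketch of top-down operator precedence in C. It is not taken from Crockford's code: the names (`struct pratt`, `lbp`, `parse_prefix`, `parse_expr`, `eval`) and the binding-power numbers are my own assumptions, and it evaluates integer expressions directly instead of building a tree, to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parser state: a cursor into the input plus an error flag. */
struct pratt {
    const char *p;
    int err;
};

static void skip_ws(struct pratt *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

/* Left binding power of a binary operator; 0 means "not an operator". */
static int lbp(char op)
{
    switch (op) {
    case '+': case '-': return 10;
    case '*': case '/': return 20;
    default:            return 0;
    }
}

static long parse_expr(struct pratt *ps, int rbp);

/* Parse something that can start an expression (Pratt's "nud"). */
static long parse_prefix(struct pratt *ps)
{
    long v = 0;

    skip_ws(ps);
    if (isdigit((unsigned char)*ps->p)) {
        while (isdigit((unsigned char)*ps->p))
            v = v * 10 + (*ps->p++ - '0');
        return v;
    }
    if (*ps->p == '-') {
        ps->p++;
        return -parse_expr(ps, 30);  /* unary minus binds tighter than '*' */
    }
    if (*ps->p == '(') {
        ps->p++;
        v = parse_expr(ps, 0);
        skip_ws(ps);
        if (*ps->p == ')')
            ps->p++;
        else
            ps->err = 1;  /* expected ')' */
        return v;
    }
    ps->err = 1;  /* expected a number, '-' or '(' */
    return 0;
}

/* Parse operators that bind tighter than rbp (Pratt's "led" loop). */
static long parse_expr(struct pratt *ps, int rbp)
{
    long left = parse_prefix(ps);

    for (;;) {
        char op;
        long right;

        skip_ws(ps);
        op = *ps->p;
        if (ps->err || lbp(op) <= rbp)
            break;
        ps->p++;
        right = parse_expr(ps, lbp(op));  /* same power: left-associative */
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/':
            if (right == 0)
                ps->err = 1;  /* division by zero */
            else
                left /= right;
            break;
        }
    }
    return left;
}

/* Evaluate src; returns 0 and stores the value in *out, or -1 on error. */
static int eval(const char *src, long *out)
{
    struct pratt ps;

    assert(src != NULL && out != NULL);
    ps.p = src;
    ps.err = 0;
    *out = parse_expr(&ps, 0);
    skip_ws(&ps);
    if (*ps.p != '\0')
        ps.err = 1;  /* trailing input that is not part of the expression */
    return ps.err ? -1 : 0;
}
```

The whole precedence table lives in `lbp`, so adding an operator is one more `case` rather than one more grammar level. The places that set `ps->err` are also exactly where the parser knows what it expected, which is where a real implementation would emit its error message.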