Hi Daniele,

> Which other scanners do people use?

For what it's worth, we are using a hand-rolled scanner. It just seemed like the fastest way to get rolling and the easiest to maintain.
Also, it allowed us to embed a few hacks directly in the scanner. For example, in a few places our grammar is not actually LR(1). This only happens in a handful of edge cases, though, so we don't want to switch to GLR. Instead, our scanner does a lookahead: upon encountering the token "WITH", it looks at the following token, and if that token is "TIMESTAMP", it produces "WITH_LA" instead of just "WITH". This gives us one token of lookahead from the scanner. Combined with the one token of lookahead provided by Bison, we can handle the few places where our grammar needs two tokens of lookahead (i.e., is LR(2) rather than LR(1)).

I'm not sure whether this would also have been possible with flex, but given that we have a hand-rolled scanner, it was straightforward. You can find a similar hack in Postgres if you search for the WITH_LA keyword: https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y#L721. Postgres uses a flex scanner and stacks a custom layer between flex and bison, which introduces additional maintenance overhead.

Cheers,
Adrian

From: help-bison <help-bison-bounces+avogelsgesang=tableau....@gnu.org> on behalf of Daniele Nicolodi <dani...@grinta.net>
Date: Friday, 3 July 2020 at 23:15
To: Bison Help <help-bison@gnu.org>
Subject: Which lexer do people use?

Hello,

the historical pairing is Flex with Bison. However, while Bison is under active development and seems to be a very solid code base, there isn't much activity on the Flex side (https://github.com/westes/flex), and the Flex codebase and its capabilities show their age.

I recently became aware of RE/flex (https://www.genivia.com/reflex.html), which seems very promising. However, it only generates a C++ scanner, which may be hard (I haven't tried) to retro-fit into existing C projects to, for example, gain full Unicode (UTF-8 encoded) support.

Has anyone tried to hammer a C++ scanner peg generated by RE/flex into a C grammar hole generated by Bison?

Which other scanners do people use?

Thank you.

Cheers,
Dan
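For illustration, here is a minimal C sketch of the scanner-side lookahead trick Adrian describes (WITH followed by TIMESTAMP is reported as WITH_LA). Everything in it is hypothetical: the token codes (TOK_WITH, TOK_WITH_LA, ...), `struct scanner` and `next_token()` are made up for the example. The raw scanner only splits on whitespace and matches keywords case-sensitively. In a real Bison setup the token codes would come from the generated parser header, and `yylex()` would call something like `next_token()`. The essential part is the one-token buffer: when WITH is seen, the next token is read ahead, WITH is reported as WITH_LA if that token is TIMESTAMP, and the buffered token is handed back on the following call.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; a real grammar would use the ones Bison generates. */
enum token { TOK_EOF = 0, TOK_IDENT, TOK_WITH, TOK_WITH_LA, TOK_TIMESTAMP };

struct scanner {
    const char *pos;      /* current position in the input */
    int has_peek;         /* 1 if peek_tok holds a token read ahead */
    enum token peek_tok;  /* token read ahead while deciding about WITH */
};

/* Raw scanner: whitespace-separated words, case-sensitive keywords only. */
static enum token scan_raw(struct scanner *s) {
    while (isspace((unsigned char)*s->pos)) s->pos++;
    if (*s->pos == '\0') return TOK_EOF;
    const char *start = s->pos;
    while (*s->pos != '\0' && !isspace((unsigned char)*s->pos)) s->pos++;
    size_t len = (size_t)(s->pos - start);
    if (len == 4 && strncmp(start, "WITH", 4) == 0) return TOK_WITH;
    if (len == 9 && strncmp(start, "TIMESTAMP", 9) == 0) return TOK_TIMESTAMP;
    return TOK_IDENT;
}

/* Returns the buffered token if there is one, otherwise scans a new one. */
static enum token scan_next(struct scanner *s) {
    if (s->has_peek) {
        s->has_peek = 0;
        return s->peek_tok;
    }
    return scan_raw(s);
}

/* Token function for the parser: WITH followed by TIMESTAMP becomes WITH_LA. */
enum token next_token(struct scanner *s) {
    enum token t = scan_next(s);
    if (t == TOK_WITH) {
        assert(!s->has_peek);       /* scan_next has just drained the buffer */
        s->peek_tok = scan_raw(s);  /* one extra token of lookahead */
        s->has_peek = 1;            /* returned unchanged on the next call */
        if (s->peek_tok == TOK_TIMESTAMP) return TOK_WITH_LA;
    }
    return t;
}
```

Because the read-ahead token is always returned on the next call, the parser still sees every token; the grammar can then use `WITH_LA TIMESTAMP` in the rule that previously needed a second token of lookahead, while a plain `WITH` keeps its other meanings.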