For someone who really wants to learn this well, rather than just make a
lexer and move on...
I started learning the pertinent theory (such as NFA/DFA automata and
the classes of formal languages) from the original red dragon book:
https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
After that background reading: in Racket, a very simple way to start
learning, for a tokenizer that needs no lookahead beyond the immediate
character, is to read your input from a Racket input port and code a
DFA in which each state uses Racket `read-char` to get the next
character, then uses `cond` and `case` on that character to decide
which state procedure to tail-call next. Also, start by having your
tokenizer procedure return a single token per call, not a list of
multiple tokens.
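A minimal sketch of that approach, under some assumptions of my own: the token kinds (numbers, identifiers, parentheses, `+`) and the names `next-token` and `tokenize-string` are hypothetical, and a token is just a pair of a kind symbol and a value. Each DFA state is a procedure, and each transition is a tail call. One deviation from pure `read-char`: the number and identifier states use `peek-char` to look at the immediate next character, so the character that ends a token is not consumed.

```racket
#lang racket
;; Hypothetical token kinds: number, ident, lparen, rparen, plus.
;; A token is (cons kind value); eof is returned at end of input.

;; next-token : input-port -> (or/c eof-object? pair?)
;; Returns exactly one token per call.
(define (next-token in)
  ;; Start state: skip whitespace, then dispatch on the character read.
  (define (start-state)
    (define c (read-char in))
    (cond
      [(eof-object? c) c]
      [(char-whitespace? c) (start-state)]
      [(char-numeric? c) (number-state (list c))]
      [(char-alphabetic? c) (ident-state (list c))]
      [else
       (case c
         [(#\() (cons 'lparen #f)]
         [(#\)) (cons 'rparen #f)]
         [(#\+) (cons 'plus #f)]
         [else (error 'next-token "unexpected character: ~a" c)])]))
  ;; Number state: keep consuming while the next char is a digit.
  ;; Characters are accumulated in reverse order.
  (define (number-state rev-chars)
    (define c (peek-char in))
    (if (and (char? c) (char-numeric? c))
        (begin (read-char in) (number-state (cons c rev-chars)))
        (cons 'number (string->number (list->string (reverse rev-chars))))))
  ;; Identifier state: letters or digits after an initial letter.
  (define (ident-state rev-chars)
    (define c (peek-char in))
    (if (and (char? c) (or (char-alphabetic? c) (char-numeric? c)))
        (begin (read-char in) (ident-state (cons c rev-chars)))
        (cons 'ident (string->symbol (list->string (reverse rev-chars))))))
  (start-state))

;; Test driver only: call next-token repeatedly until eof.
(define (tokenize-string s)
  (define in (open-input-string s))
  (let loop ([acc '()])
    (define tok (next-token in))
    (if (eof-object? tok)
        (reverse acc)
        (loop (cons tok acc)))))
```

With this sketch, `(tokenize-string "(add 12 x3)+7")` returns `'((lparen . #f) (ident . add) (number . 12) (ident . x3) (rparen . #f) (plus . #f) (number . 7))`. The driver loop is only there for testing; a parser would call `next-token` directly, one token at a time.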
That's a very simple approach, and from there you'll start to see ways
to handle tokens requiring more lookahead, and performance optimizations
you could make (there are many, depending on the language).
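When a token needs more than one character of lookahead, one option in plain Racket is `peek-char` with its optional skip argument. This is a sketch for a hypothetical language in which `...` is an `ellipsis` token and `.` is a `dot` token. Note that the skip argument counts bytes, not characters, so the sketch assumes ASCII input.

```racket
#lang racket
;; Hypothetical token kinds: 'dot for ".", 'ellipsis for "...".

;; Called after a #\. has been read: peek at the next two characters
;; without consuming them. peek-char's second argument skips *bytes*,
;; so this assumes ASCII input.
(define (dot-state in)
  (if (and (eqv? (peek-char in 0) #\.)
           (eqv? (peek-char in 1) #\.))
      (begin
        ;; Commit: consume the two peeked dots.
        (read-char in)
        (read-char in)
        'ellipsis)
      'dot))

;; Start state: one token per call, or eof at end of input.
(define (next-token in)
  (define c (read-char in))
  (cond
    [(eof-object? c) c]
    [(char-whitespace? c) (next-token in)]
    [(char=? c #\.) (dot-state in)]
    [else (error 'next-token "unexpected character: ~a" c)]))

;; Test driver only: collect every token from a string.
(define (tokenize-string s)
  (define in (open-input-string s))
  (let loop ([acc '()])
    (define tok (next-token in))
    (if (eof-object? tok)
        (reverse acc)
        (loop (cons tok acc)))))
```

Here `(tokenize-string "... . ..")` returns `'(ellipsis dot dot dot)`: the trailing `..` is not a complete ellipsis, so it falls back to two separate dots without the lexer having to undo anything.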
Other approaches you'll see people suggest often conflate different
classes of languages, add complexity unnecessarily, and obscure the
mechanics. Starting with the simplest approach might give you a better
understanding, which does come in handy when you're later trying to
write performance-sensitive parsing code (such as for networking protocols).
Also, don't be in a hurry to implement syntax-extension DSLs for this
until you're comfortable with the different classes of languages and
with optimizations; just keep doing it with plain Racket code until you
know how to do it "manually". (An LALR parser is a different matter,
and you might want a DSL or abstracting procedure for that sooner,
but right now you're just doing simple lexing.)
BTW, I recommend keeping your Racket symbols all-lowercase, not
uppercase. Uppercase symbols are *way* too useful as pattern variables
in Racket syntax transformers to use them for any other purpose.
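To illustrate the convention, here is a hypothetical `make-token` macro (not from any library): the uppercase `KIND` and `VALUE` are pattern variables, while lowercase names like `number` are ordinary symbols used as data, so the two are easy to tell apart in the template.

```racket
#lang racket
;; KIND and VALUE are pattern variables (uppercase by convention);
;; token kinds such as number or ident stay lowercase symbols.
(define-syntax-rule (make-token KIND VALUE)
  (cons 'KIND VALUE))
```

For example, `(make-token number 12)` expands to `(cons 'number 12)` and produces `'(number . 12)`.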
Neil V.