For someone who really wants to learn this well, rather than just make a lexer and move on...

I started learning the pertinent theory (such as NFA/DFA automata and classes of formal languages) from the original red dragon book:
https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools

After that background reading, here is a very simple way to start learning in Racket, for a tokenizer that requires no lookahead beyond the immediate character: have your input come from a Racket input port, and code a DFA in which each state is a procedure that uses Racket `read-char` to get the next character, dispatches on that character with `cond` and `case`, and calls the next state's procedure with a tail call. Also, start by having your tokenizer procedure return a single token per call, not a list of multiple tokens.
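To make that concrete, here is a minimal sketch of what I mean, not a definitive implementation. The toy language is my own assumption for illustration: tokens are runs of ASCII digits or runs of letters, always separated by whitespace, so the one character read past the end of a token can simply be discarded. The names `next-token`, `start-state`, `number-state`, and `identifier-state` are made up for this example.

```racket
#lang racket/base

;; Toy language (assumed for this sketch): whitespace-separated runs of
;; ASCII digits (numbers) or letters (identifiers).

;; ascii-digit? : char -> boolean
(define (ascii-digit? c)
  (char<=? #\0 c #\9))

;; next-token : input-port -> (cons symbol value) or 'eof
;; Returns exactly one token per call.
(define (next-token in)
  (start-state in))

;; Start state: skip whitespace, then pick the next state from the
;; first significant character.
(define (start-state in)
  (let ((c (read-char in)))
    (cond ((eof-object? c)       'eof)
          ((char-whitespace? c)  (start-state in))               ; tail call
          ((ascii-digit? c)      (number-state in (list c)))     ; tail call
          ((char-alphabetic? c)  (identifier-state in (list c))) ; tail call
          (else (error 'next-token "unexpected character: ~s" c)))))

;; Number state: accumulate digits until whitespace or end of input.
(define (number-state in acc)
  (let ((c (read-char in)))
    (cond ((or (eof-object? c) (char-whitespace? c))
           (cons 'number (string->number (list->string (reverse acc)))))
          ((ascii-digit? c) (number-state in (cons c acc)))
          (else (error 'next-token "unexpected character in number: ~s" c)))))

;; Identifier state: accumulate letters until whitespace or end of input.
(define (identifier-state in acc)
  (let ((c (read-char in)))
    (cond ((or (eof-object? c) (char-whitespace? c))
           (cons 'identifier (list->string (reverse acc))))
          ((char-alphabetic? c) (identifier-state in (cons c acc)))
          (else (error 'next-token "unexpected character in identifier: ~s" c)))))
```

You would call it repeatedly on the same port, for example `(define in (open-input-string " foo 42"))` and then `(next-token in)` until it returns `'eof`. Each state is an ordinary procedure, and every transition is a tail call, so the DFA runs in constant stack space.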

Once that simple version works, you'll start to see ways to handle tokens that require lookahead, and performance optimizations you could make (there are many, depending on the language).
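As one hedged illustration of the lookahead case: if tokens can be adjacent without whitespace, as in `(12+3)`, the number state must not consume the character that ends the number. One way to handle that (a sketch, not the only approach) is Racket's `peek-char`, which looks at the next character without removing it from the port. The toy token set and the names `next-token/peek` and `number-state/peek` are assumptions for this example.

```racket
#lang racket/base

;; digit? : any -> boolean
;; Safe to call on the eof object, which peek-char may return.
(define (digit? c)
  (and (char? c) (char<=? #\0 c #\9)))

;; next-token/peek : input-port -> token
;; Toy token set (assumed): numbers, open-paren, close-paren, plus.
(define (next-token/peek in)
  (let ((c (read-char in)))
    (cond ((eof-object? c)      'eof)
          ((char-whitespace? c) (next-token/peek in))
          ((digit? c)           (number-state/peek in (list c)))
          (else
           (case c
             ((#\() 'open-paren)
             ((#\)) 'close-paren)
             ((#\+) 'plus)
             (else (error 'next-token/peek "unexpected character: ~s" c)))))))

;; Number state with one character of lookahead: peek first, and only
;; consume the character if it is part of the number.
(define (number-state/peek in acc)
  (let ((c (peek-char in)))
    (if (digit? c)
        (begin (read-char in)
               (number-state/peek in (cons c acc)))
        (cons 'number (string->number (list->string (reverse acc)))))))
```

Here the character after a number (say, `+` or `)`) stays in the port, so the next call sees it and returns it as its own token.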

Other approaches you'll see people suggest often conflate different classes of languages, add complexity unnecessarily, and obscure the mechanics. Starting with the simplest approach might give you a better understanding, which does come in handy when you're later trying to write performance-sensitive parsing code (such as for networking protocols).

Also, don't be in a hurry to implement syntax extension DSLs for this, until you're comfortable with different classes of languages and with optimizations -- just keep doing it with plain Racket code, until you know how to do it "manually". (An LALR parser is a different matter, and you might want to do a DSL or abstracting procedure for that sooner, but right now you're just doing simple lexing.)

BTW, I recommend keeping your Racket symbols all-lowercase, not uppercase. Uppercase symbols are *way* too useful as Racket syntax transformer pattern variables to use them for any other purpose.
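To show the convention I mean, here is a small sketch of a `syntax-rules` macro with uppercase pattern variables. The macro `swap!` is just a stock illustration, not something from your tokenizer; the point is that `A` and `B` stand out as pattern variables against the lowercase identifiers in the template.

```racket
#lang racket/base

;; swap! exchanges the values of two variables.
;; A and B are pattern variables; tmp is an ordinary (hygienic) binding.
(define-syntax swap!
  (syntax-rules ()
    ((_ A B)
     (let ((tmp A))
       (set! A B)
       (set! B tmp)))))

(define x 1)
(define y 2)
(swap! x y)
```

After the `swap!`, `x` is 2 and `y` is 1.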

Neil V.
