On 13/08/2019 16:32, Brad Gilbert wrote: > Perhaps the biggest one may be the one about passing around “fates”. > (I barely understand the basics of this.)
The optimization opportunity Brad is refering to here is relevant mostly to grammars with deeply nested multi-tokens: Longest-Token-Matching is implemented by creating an NFA (nondeterministic finite automaton) out of all the longest declarative prefixes that go into a given multi, be it with explicit | operators or with "proto token"/"multi token" etc. The NFA runs and collects a list of potential decision outcomes that are valid. Think of them simply as "which method to call" and ignore explicit | operators for the moment. An important thing to note about the NFA that get created is that in order to have the actual full "longest declarative prefix", they need to incorporate every subrule that can be reached. Take this simplified example: A <term> can be a <literal>, a <variable>, a <subroutinecall>, etc. a <literal> can be an <integer>, a <floatingpoint>, a <complex>, etc. an <integer> can be <simpleInteger> or <basedInteger> a <simpleinteger> is basically <[-+]><[1..9]><[0..9]>* a <basedInteger> is basically <[-+]>0<[obdx]><[1..9 a..f]><[0..9 a..f]>* Now what does the regex engine currently do when it sees "0xaffe"? 1. It runs the NFA for <term>, which is able to successfully match the entirety of "0xaffe" and points towards "literal" as the result. 2. Then it calls the <literal> subrule. 3. It runs the NFA for <literal>, which is able to successfully match the entirety of "0xaffe" and points towards "integer" as the result. 4. Then it calls the <integer> subrule. 5. It runs the NFA for <integer>, which is able to successfully match the entirety of "0xaffe" and points towards "basedInteger" as the result. 6. Then it calls the <basedInteger> subrule. 7. Then it parses the whole thing. As you can see, it runs three separate NFAs against the same text, starting before 0 and ending after e. The idea of passing on fates is that the first NFA already knows that after <literal> is called, it will directly call into <integer>, and then directly into <basedInteger>. However, it does not currently have a way to communicate deeper results to the regex engine. I hope that explanation makes sense even if you're not deep in the regex engine with your head already. Kind Regards - Timo