On 13/08/2019 16:32, Brad Gilbert wrote:

> Perhaps the biggest one may be the one about passing around “fates”.
> (I barely understand the basics of this.)


The optimization opportunity Brad is refering to here is relevant mostly
to grammars with deeply nested multi-tokens:

Longest-Token-Matching is implemented by creating an NFA
(nondeterministic finite automaton) out of all the longest declarative
prefixes that go into a given multi, be it with explicit | operators or
with "proto token"/"multi token" etc.

The NFA runs and collects a list of potential decision outcomes that are
valid. Think of them simply as "which method to call" and ignore
explicit | operators for the moment.

An important thing to note about the NFA that get created is that in
order to have the actual full "longest declarative prefix", they need to
incorporate every subrule that can be reached.

Take this simplified example:

A <term> can be a <literal>, a <variable>, a <subroutinecall>, etc.
a <literal> can be an <integer>, a <floatingpoint>, a <complex>, etc.
an <integer> can be <simpleInteger> or <basedInteger>
a <simpleinteger> is basically <[-+]><[1..9]><[0..9]>*
a <basedInteger> is basically <[-+]>0<[obdx]><[1..9 a..f]><[0..9 a..f]>*

Now what does the regex engine currently do when it sees "0xaffe"?

 1. It runs the NFA for <term>, which is able to successfully match the
    entirety of "0xaffe" and points towards "literal" as the result.
 2. Then it calls the <literal> subrule.
 3. It runs the NFA for <literal>, which is able to successfully match
    the entirety of "0xaffe" and points towards "integer" as the result.
 4. Then it calls the <integer> subrule.
 5. It runs the NFA for <integer>, which is able to successfully match
    the entirety of "0xaffe" and points towards "basedInteger" as the
    result.
 6. Then it calls the <basedInteger> subrule.
 7. Then it parses the whole thing.

As you can see, it runs three separate NFAs against the same text,
starting before 0 and ending after e.

The idea of passing on fates is that the first NFA already knows that
after <literal> is called, it will directly call into <integer>, and
then directly into <basedInteger>. However, it does not currently have a
way to communicate deeper results to the regex engine.

I hope that explanation makes sense even if you're not deep in the regex
engine with your head already.

Kind Regards
  - Timo

Reply via email to