ENB: An excellent new scanning pattern

Edward K. Ream Wed, 30 Nov 2016 06:00:19 -0800

This is an Engineering Notebook post.  Feel free to ignore.  However, it 
contains a new coding pattern, for you code aficionados.


*tl;dr:* Setting a skip count encourages multi-line pattern matching. This 
can drastically simplify some importers.

*A new scanning pattern*

The new rst importer overrides i.v2_gen_lines.  Here is the interesting 
part:

    skip = 0
    lines = g.splitLines(s)
    for i, line in enumerate(lines):
        if trace: g.trace('%2s %r' % (i+1, line))
        if skip > 0:
            skip -= 1
        elif self.is_lookahead_overline(i, lines):
            level = self.ch_level(line[0])
            self.make_node(level, lines[i+1])
            skip = 2
        elif self.is_lookahead_underline(i, lines):
            level = self.ch_level(lines[i+1][0])
            self.make_node(level, line)
            skip = 1
        elif i == 0:
            p = self.make_dummy_node('!Dummy chapter')
            self.add_line(p, line)
        else:
            p = self.stack[-1]
            self.add_line(p, line)

There are several things about this code worth special mention:

1. Unlike most importers, this truly is line-oriented code.  The enumerate 
loop handles each line without *any* character scanning.  Actually, that's 
not quite true. Lines are scanned to determine whether they are 
under/overlines, but that's a nit...

2. The truly clever thing about this code is that it uses a skip count.  
This allows the code to look ahead *naturally*. The lookahead methods scan, 
as their names imply, lines *after* the line returned by enumerate.

The lookahead methods are very simple pattern matchers.  When they match, 
it is easy for the main line code to deal with the match, using one or two 
*following* lines. The skip logic then ensures that the main line never 
looks at those lines a second time.

Folks, this is important. Without it, matching patterns becomes a complex 
mess. The markdown importer is an example.
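
For concreteness, here is a minimal sketch of what such lookahead matchers 
could look like. This is not the importer's actual code: the function names, 
the four-character minimum, and the definition of an underline (one repeated 
non-alphanumeric character) are all assumptions made for illustration.

```python
def is_underline(line, min_len=4):
    """True if line is a run of one repeated punctuation character.

    min_len=4 is an assumed threshold, not taken from the importer.
    """
    s = line.rstrip()
    return (
        len(s) >= min_len
        and not s[0].isalnum()
        and not s[0].isspace()
        and s == s[0] * len(s)
    )

def lookahead_underline(i, lines):
    """Match a title at lines[i] followed by an underline at lines[i+1]."""
    if i + 1 >= len(lines):
        return False
    title, under = lines[i], lines[i + 1]
    return bool(title.strip()) and not is_underline(title) and is_underline(under)

def lookahead_overline(i, lines):
    """Match overline, title, underline at lines[i], lines[i+1], lines[i+2]."""
    if i + 2 >= len(lines):
        return False
    over, title, under = lines[i], lines[i + 1], lines[i + 2]
    return (
        is_underline(over)
        and bool(title.strip())
        and not is_underline(title)
        and is_underline(under)
        and over.rstrip()[0] == under.rstrip()[0]
    )
```

Each matcher only answers yes or no for the lines starting at index i. 
Deciding how many lines to skip stays in the main loop.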

3. The skip pattern allows us to use enumerate more often.  Previously, I 
used this kind of code whenever the 'i' variable could change:

    i = 0
    lines = g.splitLines(s)
    while i < len(lines):
        progress = i
        line = lines[i]
        << code that may change i >>
        assert progress < i, (i, line)

It's not terrible, but this is *so* much better:

    skip = 0
    lines = g.splitLines(s)
    for i, line in enumerate(lines):
        if skip > 0:
            skip -= 1
        elif lookahead_n_lines(i, lines):
            # Handle this line and the n following lines.
            skip = n
        elif lookahead_m_lines(i, lines):
            # Handle this line and the m following lines.
            skip = m
        else:
            ...
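
To make the pattern easy to try outside Leo, here is a small self-contained 
sketch. It is only an illustration: split_sections and 
is_title_with_underline are hypothetical names, and the rule that an 
underline consists solely of '=' or '-' and is at least as long as its title 
is an assumption, not the rst importer's rule.

```python
def is_title_with_underline(i, lines):
    """Lookahead: lines[i] is a title and lines[i+1] underlines it."""
    if i + 1 >= len(lines):
        return False
    title, under = lines[i].strip(), lines[i + 1].strip()
    return (
        bool(title)
        and len(under) >= len(title)
        and set(under) in ({"="}, {"-"})
    )

def split_sections(text):
    """Split text into (title, body) pairs using a skip count."""
    lines = text.splitlines(True)
    # The first section collects any lines that precede the first title.
    sections = [["", []]]
    skip = 0
    for i, line in enumerate(lines):
        if skip > 0:
            # This line was consumed by an earlier match.
            skip -= 1
        elif is_title_with_underline(i, lines):
            sections.append([line.strip(), []])
            skip = 1  # Do not look at the underline again.
        else:
            sections[-1][1].append(line)
    return [(title, "".join(body)) for title, body in sections]
```

For example, split_sections("intro\nOne\n===\nbody 1\nTwo\n---\nbody 2\n") 
returns [("", "intro\n"), ("One", "body 1\n"), ("Two", "body 2\n")]. The 
underline lines never reach the else branch because the skip count consumes 
them.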

*Summary*

The new coding pattern encourages multi-line pattern matching.  This can 
*drastically* simplify code.

The new coding pattern makes it possible to use enumerate in more 
situations. Not huge, but not nothing.

Edward
