This post will be pre-writing for a theory of operation of Leo's new (V2) 
import code.  *It will be of interest only to those who might want to 
create their own importer*.  Feel free to skip. If you read on, I hope to 
convince you that creating new importers is straightforward.

This is a long post. tl;dr: as usual, skip to the summary.

While writing this post, I realized that *details get in the way of 
understanding*.  So I am going to keep them to a minimum here.  Really :-) 
On the other hand, this post really does tell you *everything* you need to know to write 
a new state scanner.


*Progress Report*
The V2 code is already a great success. The javascript importer is 
working.  It's the easiest case because it generates section references 
instead of @others. The perl adapter does generate @others.  I expect to 
complete it soon. Python is the hardest case. It may take a day or two more.

v2_gen_lines is the crucial method. It is even simpler than envisioned in the 
middle-of-the-night 
post <https://groups.google.com/d/msg/leo-editor/lm0x1pDZFy8/TkY-UfDTBAAJ>.

The latest version of v2_gen_lines is here on GitHub 
<https://github.com/leo-editor/leo-editor/blob/master/leo/plugins/importers/basescanner.py>. 
Once on the page, search for def v2_gen_lines. If you have access to Leo, 
you can study this code in leoPluginsRef.leo: Plugins-->Importer 
plugins-->@file importers/basescanner.py

*Executive Overview*

On Oct 28 I documented the V1 code here 
<https://groups.google.com/d/msg/leo-editor/RDi2jffWjzI/K-mh4H5QBQAJ>. The 
big picture remains unchanged:

1. Both the V1 and V2 importers copy *entire lines* from an input file to 
Leo nodes. This makes the new importers much less error prone than the 
legacy (character-by-character) importers.

2. These importers know *nothing* about the language being imported. They 
know *only* how to scan tokens *accurately.*  This makes the line-oriented 
importers simple and robust.

3. Importers are simple to write because hidden *infrastructure* in 
importers.basescanner.py handles most details.
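Here is a minimal Python sketch of point 1. It is not Leo's code. It splits text with str.splitlines(keepends=True) and hands whole lines to "nodes". The split_into_nodes function and its starts_node predicate are hypothetical. Because no line is ever split or altered, joining the nodes' lines always reproduces the input exactly.

```python
def split_into_nodes(text, starts_node):
    """Distribute whole lines of text into a list of nodes.

    Hypothetical helper: starts_node(line) returns True if the line
    should begin a new node. Lines are never split or modified.
    """
    nodes = [[]]
    for line in text.splitlines(keepends=True):
        # Start a new node only if the current one already has lines.
        if starts_node(line) and nodes[-1]:
            nodes.append([])
        nodes[-1].append(line)
    return nodes


source = "var a = 1;\nfunction f() {\n    return a;\n}\n"
nodes = split_into_nodes(source, lambda line: line.startswith("function"))
# Every character of the input survives the round trip.
assert "".join("".join(node) for node in nodes) == source
print(len(nodes))  # 2
```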

*The scanning machine*

Leo's scanning code is like a mechanical contraption containing three 
(different-sized) slots in various places. Each slot holds a different 
*cassette* that controls part of the machine's operation. The picture is this: to 
adapt the machine to a new task, you remove the old cassettes from their 
slots and replace them with new cassettes.

To change what the machine does, you *don't* have to understand the innards 
of the machine! You only have to know how to create new cassettes. 

Similarly, Leo importers consist of three *simple* adapter classes, a 
*controller*, a *scanner*, and a *state*, all defined in the same file. These 
classes are the cassettes that modify the infrastructure.

You can easily define an importer for a new language X *by using the 
classes from an existing importer as a template*. Just create controller, 
scanner and state classes in leo/plugins/importers/x.py.
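As a rough illustration, a new x.py might have the shape below. This is a sketch under stated assumptions:

- ImportControllerStub and LineScannerStub are hypothetical stand-ins for the real base classes in basescanner.py, which take more arguments and do far more work.
- The X_ class names just mimic the javascript importer's naming.
- The (s, prev_state) argument order of v2_scan_line is assumed.
- The state class omits the rich comparisons and block methods described below.

```python
# Hypothetical stand-ins for Leo's infrastructure in basescanner.py.
class ImportControllerStub:
    def __init__(self, c, state=None):
        self.c = c
        self.scanner = state  # the keyword is "state", but it holds a scanner


class LineScannerStub:
    def __init__(self, c):
        self.c = c


# The three "cassettes" for a hypothetical language X (file x.py).
class X_ScanState:
    def __init__(self, context='', curlies=0):
        self.context = context
        self.curlies = curlies


class X_Scanner(LineScannerStub):
    def __init__(self, c):
        LineScannerStub.__init__(self, c)

    def v2_scan_line(self, s, prev_state):
        # Toy rule: count curly brackets only; ignores strings and comments.
        curlies = prev_state.curlies + s.count('{') - s.count('}')
        return X_ScanState(prev_state.context, curlies)


class X_ImportController(ImportControllerStub):
    def __init__(self, c):
        ImportControllerStub.__init__(self, c, state=X_Scanner(c))


controller = X_ImportController(c=None)  # c would be a Leo commander
state = controller.scanner.v2_scan_line('if (x) {\n', X_ScanState())
print(state.curlies)  # 1
```

The controller only plugs a scanner into the infrastructure. The scanner only turns (line, previous state) into a new state.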

Leo's scanning machine is *not* a pipeline, but it *is* easily customizable.

The following sections discuss the three adapter classes in more detail. 
*Please study the javascript importer on GitHub 
<https://github.com/leo-editor/leo-editor/blob/master/leo/plugins/importers/javascript.py> 
as you read along.* If you have access to Leo, the javascript importer is in 
leoPluginsRef.leo: Plugins-->Importer plugins-->@file 
importers/javascript.py

*The controller*

The javascript controller is JS_ImportController.  It consists only of a 
ctor that inits the base class with various keyword arguments. Most will go 
away once the changeover to the V2 code is complete. The only important 
argument is:

    state = JS_Scanner(c)

The argument tells the infrastructure what the scanner class is.  Yes, the 
keyword arg should be "scanner", but that can't happen until after the 
changeover.  Sigh.

*The state*

The javascript state is JS_ScanState.  This state should *not* be a 
subclass of any other state class.

States contain:

1. *State data*.  A context, and one or more counts.

*Important*: the context is non-empty if and only if the line being scanned 
is contained in a multiline string or comment or some other special case.

The javascript importer needs to keep track of both curly brackets and 
parens, so this class contains .context, .curlies and .parens ivars.

2. *Rich comparisons*, __eq__, __gt__, etc. 

The cut_stack helper of v2_gen_lines uses these to compare the new state to the 
states on a stack.  As shown in 3. below, the starts_block and 
continues_block methods often use these methods.

These comparisons are a bit tricky for javascript because there are two 
counts involved.  The count of curly brackets overrides the paren count.

*Important*: The __eq__ method must return True if self.context is 
non-empty.  This ensures that we never change blocks in the middle of a 
multi-line construct.

3. *starts_block and continues_block methods*. As their names imply, these 
methods tell v2_gen_lines whether the just-scanned line should remain in the 
present block (node), start a new node, or terminate one or more nodes.

For most (all?) languages except Python, these methods can be defined in 
terms of the rich comparisons:

    def v2_continues_block(self, prev_state):
        '''Return True if the just-scanned lines should be placed in the inner block.'''
        return self == prev_state

    def v2_starts_block(self, prev_state):
        '''Return True if the just-scanned line starts an inner block.'''
        return self > prev_state
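The following self-contained sketch pulls points 1-3 together for a javascript-like state with two counts. JSLikeState and classify are hypothetical names, not Leo's JS_ScanState or v2_gen_lines. The states are built by hand rather than by a scanner. classify only labels each line; it does not create nodes or cut a stack.

```python
class JSLikeState:
    """Hypothetical javascript-style scan state (not Leo's JS_ScanState)."""

    def __init__(self, context='', curlies=0, parens=0):
        self.context = context  # non-empty inside a multi-line comment or string
        self.curlies = curlies
        self.parens = parens

    def _key(self):
        # The curly-bracket count dominates; the paren count only breaks ties.
        return (self.curlies, self.parens)

    # Rich comparisons. __eq__ is True whenever self.context is non-empty,
    # so a block never changes in the middle of a multi-line construct.
    def __eq__(self, other):
        return bool(self.context) or self._key() == other._key()

    def __ne__(self, other):
        return not self == other

    def __lt__(self, other):
        return self != other and self._key() < other._key()

    def __gt__(self, other):
        return self != other and self._key() > other._key()

    def __le__(self, other):
        return not self > other

    def __ge__(self, other):
        return not self < other

    def v2_continues_block(self, prev_state):
        return self == prev_state

    def v2_starts_block(self, prev_state):
        return self > prev_state


def classify(states):
    """Toy driver: label each state relative to its predecessor."""
    labels = []
    prev = JSLikeState()
    for new in states:
        if new.v2_continues_block(prev):
            labels.append('continue')
        elif new.v2_starts_block(prev):
            labels.append('start')
        else:
            labels.append('end')
        prev = new
    return labels


states = [
    JSLikeState(),           # var a = 1;
    JSLikeState(curlies=1),  # function f() {
    JSLikeState(curlies=1),  #     return a;
    JSLikeState(),           # }
]
print(classify(states))  # ['continue', 'start', 'continue', 'end']
print(JSLikeState('/*', curlies=5) == JSLikeState())  # True: context wins
print(JSLikeState(curlies=1) > JSLikeState(parens=3))  # True: curlies dominate
```

The last two lines show the two rules discussed above. A non-empty context makes any comparison "equal", so no block boundary can fall inside a multi-line comment. A single extra curly bracket outweighs any number of parens.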

*The scanner*

The javascript scanner is JS_Scanner.  It must be a subclass of LineScanner so 
that it can init and access the infrastructure. For V2, the ctor just inits 
the base class.

The javascript scanner defines the v2_scan_line method, and its helper, 
skip_possible_regex. v2_scan_line computes a new state, given the 
previous state and the next input line.
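To make v2_scan_line concrete, here is a simplified, self-contained sketch with these assumptions:

- ScanState and ScannerSketch are hypothetical stand-ins; a real scanner subclasses LineScanner.
- The (s, prev_state) argument order is assumed.
- Quoted strings are assumed never to span lines.

The sketch tracks /* */ comments across lines, skips // comments and single-line quoted strings, and counts curly brackets and parens. Unlike the real javascript scanner, it has nothing like skip_possible_regex, so regex literals are not handled.

```python
class ScanState:
    """Minimal state: a context plus curly-bracket and paren counts."""

    def __init__(self, context='', curlies=0, parens=0):
        self.context = context  # '' or '/*' (inside a block comment)
        self.curlies = curlies
        self.parens = parens


class ScannerSketch:
    """Hypothetical stand-in for a LineScanner subclass such as JS_Scanner."""

    def v2_scan_line(self, s, prev_state):
        """Return the state after scanning line s, given prev_state."""
        context = prev_state.context
        curlies, parens = prev_state.curlies, prev_state.parens
        delim = None  # quote character of the string being scanned, if any
        i = 0
        while i < len(s):
            ch = s[i]
            if context == '/*':
                if s.startswith('*/', i):
                    context = ''
                    i += 2
                else:
                    i += 1
            elif delim:
                if ch == '\\':
                    i += 1  # also skip the escaped character
                elif ch == delim:
                    delim = None
                i += 1
            elif s.startswith('//', i):
                break  # the rest of the line is a comment
            elif s.startswith('/*', i):
                context = '/*'
                i += 2
            elif ch in '"\'':
                delim = ch
                i += 1
            else:
                if ch == '{':
                    curlies += 1
                elif ch == '}':
                    curlies -= 1
                elif ch == '(':
                    parens += 1
                elif ch == ')':
                    parens -= 1
                i += 1
        # Assumption: strings never span lines, so delim is simply dropped.
        return ScanState(context, curlies, parens)


scanner = ScannerSketch()
s1 = scanner.v2_scan_line('function f(a) { /* start\n', ScanState())
s2 = scanner.v2_scan_line('  } still comment */ }\n', s1)
print(s1.context, s1.curlies, s2.context == '', s2.curlies)  # /* 1 True 0
```

Only the block-comment case leaves a non-empty context. That is what lets the state's __eq__ keep a multi-line comment inside a single node.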

*Summary*

- importers/javascript.py 
<https://github.com/leo-editor/leo-editor/blob/master/leo/plugins/importers/javascript.py> 
is the javascript importer.

- importers/basescanner.py 
<https://github.com/leo-editor/leo-editor/blob/master/leo/plugins/importers/basescanner.py> 
contains all the import infrastructure.

- To create an importer for a new language, *use the classes from an 
existing importer as a template*.

- All importers define three classes: a controller, a scanner, and a state.

- The *controller* class tells the infrastructure what scanner to use.

- The *scanner* class inits the infrastructure. This class must define the 
all-important v2_scan_line method.  This method returns a new state, given 
the previous state and the next input line.

- The *state* class defines a context and one or more counts.  *The context 
is non-empty whenever a line ends inside a multi-line comment or string or 
starts a continued line*.

The state class must also define a full set of rich comparison operators 
and the starts_block and continues_block methods. The 
starts/continues_block methods are usually defined in terms of rich 
comparisons.

- Recent work has simplified the interfaces between the infrastructure and 
the controller, scanner and state classes.  That work seems complete, so 
the interfaces (and these docs) are not likely to change significantly.

- The *code* will collapse further once the changeover to the V2 base is 
complete. Comments starting with ### mark code that will disappear after the 
transition to V2.

We have come a long, long way since the legacy character-based parser code. 
Again, I'm writing this in the middle of the night.  It's hard to contain 
my excitement.

And that's it. All questions and comments are welcome. 

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
Visit this group at https://groups.google.com/group/leo-editor.