On 27/06/11 07:26, Jean-Philippe Bernardy wrote:
On Sun, Jun 26, 2011 at 5:34 PM, Reiner Pope<reiner.p...@gmail.com> wrote:
On a similar note, would it be acceptable to join the 'hlRun' and 'hlFocus'
fields? Currently they are:
hlRun :: Scanner Point Char -> Point -> cache -> cache,
hlFocus :: Map WindowRef Region -> cache -> cache
I am not sure... However, maybe you would enjoy the following
explanation of what "focusing" is all about -- I don't think I have
documented it anywhere. Focusing on a particular point allows better
asymptotic performance.
The idea is that focusing is called when a window is "scrolled", so
that the AST does not have to be re-traversed from the root every time
a display is needed. Otherwise, the AST has to be traversed in
pre-order to find the portion that corresponds to the windows, at
every refresh.
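To illustrate the asymptotics with a toy sketch of my own (not the actual Yi code): keeping a per-window focus is like keeping a list zipper over the buffer's lines. Moving the focus by k lines on a scroll costs O(k), whereas re-finding line n from the root on every refresh costs O(n).

```haskell
-- Toy zipper sketch of the "focus" idea (hypothetical, not Yi's AST).
-- 'Zipper before after': lines before the focus (nearest first) and the
-- focused line onward.
data Zipper a = Zipper [a] [a]

-- Build a focus at line n by walking from the start: O(n), the cost we
-- pay only once instead of at every refresh.
focusAt :: Int -> [a] -> Zipper a
focusAt n xs = let (b, f) = splitAt n xs in Zipper (reverse b) f

-- Move the focus by k lines: O(|k|), independent of the focus position.
scroll :: Int -> Zipper a -> Zipper a
scroll 0 z = z
scroll k (Zipper before (x:after)) | k > 0 = scroll (k - 1) (Zipper (x:before) after)
scroll k (Zipper (x:before) after) | k < 0 = scroll (k + 1) (Zipper before (x:after))
scroll _ z = z

-- The h lines visible in the window, read off from the focus: O(h).
visible :: Int -> Zipper a -> [a]
visible h (Zipper _ after) = take h after
```
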
You may retort: "Why don't you cache the beginning/end of the regions
corresponding to each node in the AST?" The answer is that doing so
would break the "online" property of the parser. (See
www.cse.chalmers.se/~bernardy/FunctionalIncrementalParsing.pdf for an
explanation of online parsing.)
This makes sense. I haven't really understood the code that implements
this, though.
I read your paper a little while ago and it was very interesting :)
It seems that we could allow simpler parsers (more on this later)
I dare say that a sensible overall plan about parsers should be well
thought out before putting the current code under the knife :). I am
curious to hear your plans! I could also let you know about mine if
you are interested.
Ok, I'll explain in some more detail. Let me first say that my thoughts
on this are only thoughts, and I haven't yet written much code to back
them up. (On first reading, you might want to skip to the closing "---")
---
I'm interested in making a better LaTeX mode for yi, and for this I need
better parsing. Some things I'd eventually like are:
1. Syntax checking with error reporting. For example, I'd like to
   report if \frac{}{} is given too few arguments, or if it is used
   outside maths mode.
2. Reference tracking, like RefTeX in Emacs. To support this, we would
need to know where all the \label{} macros are in the document, both
before and after the point.
3. Graphical rendering of equations.
While incremental parsing seems a good fit for syntax checking, it
doesn't seem such a good fit for the other two applications:
* incremental parsing doesn't provide efficient support for "looking
into the future", as is needed for tracking references
* the rendering chain I've played with for equations is
TeX->MathML->(SVG->)screen. Some basic experiments show that this is
expensive, and it could be significantly faster to cache results
somewhere in the process (either the SVG, or directly in the MathML
rendering library). As far as I'm aware, the incremental parsing
framework can only cache parsing progress, not the results of it.
I thought it might be fun to try something along the lines of Edward
Kmett's talks on monoidal parsing[1], in particular his Bitonic
datatype. The key point about LaTeX (at least the way I write it) is
that it is "almost" context free: roughly speaking, the only non-local
context you need is the stack of \begin{}-\end{} environments you are
inside, which is typically quite shallow.
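A toy sketch of the monoidal idea (my own illustration, not Kmett's actual Bitonic type): summarise each chunk of text by its unmatched \end{}s on the left and unmatched \begin{}s on the right. Because the merge is associative, chunk summaries can be cached and recombined in any grouping after an edit.

```haskell
-- 'Balance c o': c closers with no opener to their left, o openers
-- still awaiting a closer to their right (environment names elided
-- for brevity; a real version would carry them and check matching).
data Balance = Balance !Int !Int deriving (Eq, Show)

instance Semigroup Balance where
  Balance c1 o1 <> Balance c2 o2 =
    let m = min o1 c2   -- opens on the left absorb closes on the right
    in Balance (c1 + (c2 - m)) ((o1 - m) + o2)

instance Monoid Balance where
  mempty = Balance 0 0

-- Summarise a single token.
token :: String -> Balance
token t
  | take 7 t == "\\begin{" = Balance 0 1
  | take 5 t == "\\end{"   = Balance 1 0
  | otherwise              = Balance 0 0

-- Since (<>) is associative, this foldMap can be computed over any
-- chunking of the token stream, with per-chunk summaries cached.
balanced :: [String] -> Bool
balanced ts = foldMap token ts == mempty
```
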
To handle the above, I would design the parser cache along the following
lines:
-- the tree of \begin{}s and \end{}s, annotated with positions
data ContextTree = ...
-- splittable sequence of all \label{}s in the document
data LabelList = ...
-- splittable sequence of cached renders of maths for the current
-- window region
data MathRenders = ...

type ParserCache = (ContextTree, LabelList, Map WindowRef MathRenders)
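To make the split-and-merge idea concrete, here is a hypothetical LabelList sketch using Data.Map keyed by buffer position. (A real implementation would likely be a finger tree measured by offsets, and would also need to shift positions to the right of the edit; both are elided here.)

```haskell
import qualified Data.Map.Strict as M

type Point = Int
type LabelList = M.Map Point String  -- position -> label name

-- Splice freshly parsed labels for the dirty region into the old cache:
-- keep labels strictly outside the region, drop stale ones inside it,
-- and union in the new ones. (Position shifting after the edit omitted.)
updateLabels :: (Point, Point)  -- dirty region (start, end)
             -> LabelList       -- labels reparsed from the dirty region only
             -> LabelList       -- old cache
             -> LabelList
updateLabels (lo, hi) fresh old =
  let (before, _) = M.split lo old  -- keys < lo survive
      (_, after)  = M.split hi old  -- keys > hi survive
  in M.unions [before, fresh, after]
```
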
If the user has modified a region, then here's how we update the parser
cache (note that the parser needs to know both the beginning and the end
of the user's modification). All these computations are done strictly:
1. Update the ContextTree. This can be done quickly: by splitting the
tree, we can reuse the parts of the tree from both before and after
the changes. We just need to look for \begin{}s and \end{}s in the
dirty region, and then call "mergeContextTree leftTree dirtyTree
rightTree". Merging trees takes time proportional to the depth of
the trees, which I assume is quite small (<10, typically).
2. Using the now-correct ContextTree, update the LabelList and
MathRenders. These both use the same split-and-merge strategy as the
ContextTree, but note that for these, parsing the dirty region
requires the (correct) ContextTree. (For instance, when generating
MathRenders, we need to know whether we are in maths mode)
(It might be possible to fuse the three traversals of the dirty region
into one.)
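On fusing the passes: since a product of monoids is again a monoid, a single foldMap over the dirty region's tokens can accumulate all three summaries at once. A toy sketch with stand-in summary types (not the real ContextTree/LabelList/MathRenders):

```haskell
import Data.Monoid (Sum (..))

type Ctx    = Sum Int   -- stand-in: \begin/\end nesting delta
type Labels = [String]  -- stand-in: labels seen, in order
type Math   = Sum Int   -- stand-in: count of inline-math delimiters

-- Summarise one token into all three monoids at once.
summarise :: String -> (Ctx, Labels, Math)
summarise t
  | take 7 t == "\\begin{" = (Sum 1,    [],  mempty)
  | take 5 t == "\\end{"   = (Sum (-1), [],  mempty)
  | take 7 t == "\\label{" = (mempty,   [t], mempty)
  | t == "$"               = (mempty,   [],  Sum 1)
  | otherwise              = mempty

-- One traversal of the dirty region yields all three results, because
-- the 3-tuple of monoids is itself a monoid.
scanDirty :: [String] -> (Ctx, Labels, Math)
scanDirty = foldMap summarise
```
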
Note that I've made the MathRenders cache a per-window cache. I'm not
sure if it's necessary (it would be feasible to maintain an
entire-document splittable-sequence of MathRenders, just like the
LabelList), but it seems a reasonable optimisation to perform: by only
caching the MathRenders for the visible portion of the document, we can
save memory and a little bit of time when splitting the sequences.
I realise such an implementation would be quite complicated and would
have to implement a lot of parsing from scratch, but I'd like to try it
first and then see if I can generalise/abstract.
---
The above is an (unimplemented!) example of a realistic parser which:
* doesn't use the incremental parsing framework;
* doesn't use the Scanner framework, and would perform at least a
little worse if it were forced to use the scanner;
* needs to know the window focuses during parsing;
* needs to know the end of the dirty region, not just the beginning.
I would like the Highlighter datatype to support a parser like this. The
Highlighter datatype is already mostly abstracted from the incremental
parsing framework; I'm just looking to go a little bit further.
I am definitely not proposing changes to the implementation of
incremental parsing (I don't fully understand the implementation), I'm
just proposing a change to the interface. Indeed, I can automatically
convert "old" syntax highlighters into "new" ones:
data HighlighterOld cache syntax =
SynHLOld { hlStartCacheOld :: cache,
hlRunOld :: Scanner Point Char -> Point -> cache -> cache,
hlGetTreeOld :: cache -> WindowRef -> syntax,
hlFocusOld :: Map WindowRef Region -> cache -> cache
}
data HighlighterNew cache syntax =
SynHLNew { hlStartCacheNew :: cache,
hlRunNew :: Rope -> -- contents of the whole buffer
Region -> -- dirty region
Map WindowRef Region -> -- visible regions
cache -> cache,
hlGetTreeNew :: cache -> WindowRef -> syntax
}
-- (uses the RecordWildCards extension)
convert :: HighlighterOld cache syntax -> HighlighterNew cache syntax
convert SynHLOld{..} = SynHLNew {
hlStartCacheNew = hlStartCacheOld,
hlGetTreeNew = hlGetTreeOld,
hlRunNew = \rope rgn focusMap cache -> hlFocusOld focusMap .
hlRunOld (rope2scanner rope) (regionStart rgn) $ cache
}
-- this is currently defined in Yi.Buffer.Implementation under some other name:
rope2scanner :: Rope -> Scanner Point Char
I hope that makes sense.
I'd certainly be interested in your ideas for how I could implement the
LaTeX parser, and also your general plans for parsers that you mentioned.
This would allow merging hlRun and hlFocus as described above. In fact, I
tested this by moving the calls to 'pureM clearAllSyntaxAndHideSelection'
and 'pureM focusAllSyntax' next to each other, and nothing appears to have
been broken.
Did you also test asymptotic performance, as hinted above?
(Quite a pain to do, I'll grant you that much.)
No, it was just a very quick test. I don't see why the performance
should change, though.
Cheers,
Reiner
[1] http://comonad.com/reader/2009/iteratees-parsec-and-monoid/
--
Yi development mailing list
yi-devel@googlegroups.com
http://groups.google.com/group/yi-devel