On 27/06/11 07:26, Jean-Philippe Bernardy wrote:
On Sun, Jun 26, 2011 at 5:34 PM, Reiner Pope<reiner.p...@gmail.com>  wrote:
On a similar note, would it be acceptable to join the 'hlRun' and 'hlFocus'
fields? Currently they are:

hlRun   :: Scanner Point Char -> Point -> cache -> cache,
hlFocus :: Map WindowRef Region -> cache -> cache
I am not sure... However, maybe you would enjoy the following
explanation on what "focusing" is all about -- I don't think I have
documented it anywhere. Focusing on a particular point allows better
asymptotic performance.

The idea is that focusing is called when a window is "scrolled", so
that the AST does not have to be re-traversed from the root every time
a display is needed. Otherwise, the AST has to be traversed in
pre-order to find the portion that corresponds to the windows, at
every refresh.

You may retort: "Why don't you cache the beginning/end of the regions
corresponding to each node in the AST?" Because that would break the
"online" property of the parser. (See
www.cse.chalmers.se/~bernardy/FunctionalIncrementalParsing.pdf for an
explanation of online parsing.)
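To make the "online" property concrete: an online parser can yield a prefix of its result after consuming only a prefix of the input, which is exactly what caching absolute positions in every node would destroy. A toy illustration (the names and the tokenizer are mine, not Yi's) is a lazily produced token stream, where demanding the first few tokens only forces the first few characters of the input:

```haskell
-- A toy "online" tokenizer: the result list is produced lazily, so
-- forcing a prefix of the output only consumes a prefix of the input.
data Token = Word String | Space deriving (Eq, Show)

tokens :: String -> [Token]
tokens [] = []
tokens (c:cs)
  | c == ' '  = Space : tokens cs
  | otherwise = let (w, rest) = span (/= ' ') (c:cs)
                in Word w : tokens rest

-- Because of laziness, even an input whose tail is undefined can be
-- partially tokenized:
--   take 2 (tokens ("ab cd" ++ undefined))  ==  [Word "ab", Space]
```

If every `Word` carried its absolute end position, computing the first token would already require scanning to the end of the word's region, and any cached position would be invalidated by edits earlier in the buffer.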
This makes sense. I haven't really understood the code that implements this, though.

I read your paper a little while ago and it was very interesting :)
It seems that we could allow simpler parsers (more on this later).
I dare say that a sensible overall plan about parsers should be well
thought out before putting the current code under the knife :). I am
curious to hear your plans! I could also let you know about mine if
you are interested.
Ok, I'll explain in some more detail. Let me first say that my thoughts on this are only thoughts, and I haven't yet written much code to back them up. (On first reading, you might want to skip to the closing "---")
---
I'm interested in making a better LaTeX mode for yi, and for this I need better parsing. Some things I'd eventually like are:

1. Syntax checking with error reporting. For example, I'd like to
   report when \frac{}{} is given too few arguments, or when it is used
   outside maths mode.
2. Reference tracking, like RefTeX in Emacs. To support this, we would
   need to know where all the \label{} macros are in the document, both
   before and after the point.
3. Graphical rendering of equations.

While incremental parsing seems a good fit for syntax checking, it doesn't seem such a good fit for the other two applications:

 * incremental parsing doesn't provide efficient support for "looking
   into the future", as is needed for tracking references
 * the rendering chain I've played with for equations is
   TeX->MathML->(SVG->)screen. Some basic experiments show that this is
   expensive, and it could be significantly faster to cache results
   somewhere in the process (either svg or directly in the mathml
   rendering library). As far as I'm aware, the incremental parsing can
   only cache the parsing progress, and not the results of it.
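The kind of render caching I have in mind is nothing deeper than a map from maths source to rendered output, consulted before invoking the expensive TeX->MathML->SVG chain. A minimal sketch, where renderMath is a stand-in for the real rendering pipeline and Rendered for whatever handle it produces:

```haskell
import qualified Data.Map as M

type MathSource  = String
type Rendered    = String               -- stand-in for an SVG/pixmap handle
type RenderCache = M.Map MathSource Rendered

-- Placeholder for the expensive TeX -> MathML -> SVG pipeline.
renderMath :: MathSource -> Rendered
renderMath src = "<svg>" ++ src ++ "</svg>"

-- Look the source up in the cache; render and insert only on a miss.
cachedRender :: MathSource -> RenderCache -> (Rendered, RenderCache)
cachedRender src cache =
  case M.lookup src cache of
    Just r  -> (r, cache)
    Nothing -> let r = renderMath src
               in (r, M.insert src r cache)
```

This is exactly the kind of cache the incremental parsing framework cannot hold for us, since it only caches parsing progress, not downstream results of the parse.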

I thought it might be fun to try something along the lines of Edward Kmett's talks on monoidal parsing[1], in particular his Bitonic datatype. The key point about LaTeX (at least the way I write it) is that it is "almost" context free: roughly speaking, the only non-local context you need is the stack of \begin{}-\end{} environments you are inside, which is typically quite shallow.
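The key idea behind the monoidal approach is that summaries of adjacent substrings form a monoid, so two chunks can be parsed independently and their results merged without reparsing either. The classic instance, which (as I understand it) Bitonic generalises, is bracket matching: summarise a chunk by its unmatched closing and opening delimiters. A sketch, using '(' and ')' as stand-ins for \begin{} and \end{}:

```haskell
-- Summary of a chunk of input: the number of unmatched closing
-- delimiters, then the number of unmatched opening ones.
data Balance = Balance !Int !Int deriving (Eq, Show)

-- 'combine' is associative with 'Balance 0 0' as identity, so Balance
-- forms a monoid: adjacent chunks can be summarised independently and
-- their summaries merged, which is what makes the scheme incremental.
combine :: Balance -> Balance -> Balance
combine (Balance c1 o1) (Balance c2 o2)
  | o1 >= c2  = Balance c1 (o1 - c2 + o2)   -- opens on the left absorb closes on the right
  | otherwise = Balance (c1 + c2 - o1) o2

-- '(' / ')' stand in for \begin{...} / \end{...}.
summarise :: Char -> Balance
summarise '(' = Balance 0 1
summarise ')' = Balance 1 0
summarise _   = Balance 0 0

balanced :: String -> Bool
balanced s = foldr combine (Balance 0 0) (map summarise s) == Balance 0 0
```

After an edit, only the dirty region needs to be re-summarised; its new summary is then combined with the cached summaries of the untouched text on either side.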

To handle the above, I would design the parser cache along the following lines:

-- the tree of \begin{}s and \end{}s, annotated with positions
data ContextTree = ...

-- splittable sequence of all \label{}s in the document
data LabelList = ...

-- splittable sequence of cached renders of maths for the current window region
data MathRenders = ...

type ParserCache = (ContextTree, LabelList, Map WindowRef MathRenders)

If the user has modified a region, then here's how we update the parser cache (note that the parser needs to know both the beginning and the end of the user's modification). All these computations are done strictly:

1. Update the ContextTree. This can be done quickly: by splitting the
   tree, we can reuse the parts of the tree from both before and after
   the changes. We just need to look for \begin{}s and \end{}s in the
   dirty region, and then call "mergeContextTree leftTree dirtyTree
   rightTree". Merging trees takes time proportional to the depth of
   the trees, which I assume is quite small (<10, typically).
2. Using the now-correct ContextTree, update the LabelList and
   MathRenders. These both use the same split-and-merge strategy as the
   ContextTree, but note that for these, parsing the dirty region
   requires the (correct) ContextTree. (For instance, when generating
   MathRenders, we need to know whether we are in maths mode)

(It might be possible to fuse the three traversals of the dirty region into one.)
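As a concrete sketch of the split-and-merge step, here is how the LabelList update might look (all the names and types here are mine, and scanLabels is a deliberately naive stand-in for a real LaTeX lexer). We keep the labels before the dirty region, rescan only the dirty text, and shift and keep the labels after it; with a finger-tree-backed sequence the splits and concatenations are logarithmic:

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (><))
import Data.Foldable (toList)

type Point     = Int
type Label     = (Point, String)    -- position and \label{...} argument
type LabelList = Seq Label

-- Naive stand-in scanner: find \label{...} macros in the dirty text,
-- reporting positions relative to the given buffer offset.
scanLabels :: Point -> String -> LabelList
scanLabels off s = Seq.fromList (go off s)
  where
    go _ [] = []
    go i xs@(_:rest)
      | Just arg <- labelAt xs = (i, arg) : go (i + 1) rest
      | otherwise              = go (i + 1) rest
    labelAt xs = case splitAt 7 xs of
      ("\\label{", more) -> Just (takeWhile (/= '}') more)
      _                  -> Nothing

-- Split-and-merge update: 'shift' is the change in length of the dirty
-- region (new length minus old length), applied to trailing labels.
updateLabels :: Point -> Point -> Int -> String -> LabelList -> LabelList
updateLabels dirtyStart dirtyEnd shift dirtyText old =
  let before = Seq.takeWhileL (\(p, _) -> p < dirtyStart) old
      after  = Seq.dropWhileL (\(p, _) -> p < dirtyEnd) old
      after' = fmap (\(p, l) -> (p + shift, l)) after
  in before >< scanLabels dirtyStart dirtyText >< after'
```

The ContextTree and MathRenders updates would follow the same shape, except that their dirty-region rescans also consult the (already updated) ContextTree for context.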

Note that I've made the MathRenders cache a per-window cache. I'm not sure if it's necessary (it would be feasible to maintain an entire-document splittable-sequence of MathRenders, just like the LabelList), but it seems a reasonable optimisation to perform: by only caching the MathRenders for the visible portion of the document, we can save memory and a little bit of time when splitting the sequences.

I realise such an implementation would be quite complicated and would have to implement a lot of parsing from scratch, but I'd like to try it first and then see if I can generalise/abstract.
---

The above is an (unimplemented!) example of a realistic parser which:

 * doesn't use the incremental parsing framework;
 * doesn't use the Scanner framework, and would perform at least a
   little worse if it were forced to use the scanner;
 * needs to know the window focuses during parsing;
 * needs to know the end of the dirty region, not just the beginning

I would like the Highlighter datatype to support a parser like this. The Highlighter datatype is already mostly abstracted from the incremental parsing framework; I'm just looking to go a little bit further.

I am definitely not proposing changes to the implementation of incremental parsing (I don't fully understand the implementation), I'm just proposing a change to the interface. Indeed, I can automatically convert "old" syntax highlighters into "new" ones:

data HighlighterOld cache syntax =
 SynHLOld { hlStartCacheOld :: cache,
            hlRunOld :: Scanner Point Char -> Point -> cache -> cache,
            hlGetTreeOld :: cache -> WindowRef -> syntax,
            hlFocusOld :: Map WindowRef Region -> cache -> cache
          }

data HighlighterNew cache syntax =
  SynHLNew { hlStartCacheNew :: cache,
             hlRunNew :: Rope ->    -- contents of the whole buffer
                         Region ->  -- dirty region
                         Map WindowRef Region -> -- visible regions
                         cache -> cache,
             hlGetTreeNew :: cache -> WindowRef -> syntax
  }

-- (using the RecordWildCards extension)
convert :: HighlighterOld cache syntax -> HighlighterNew cache syntax
convert SynHLOld{..} = SynHLNew {
    hlStartCacheNew = hlStartCacheOld,
    hlGetTreeNew    = hlGetTreeOld,
    hlRunNew        = \rope rgn focusMap cache ->
      hlFocusOld focusMap (hlRunOld (rope2scanner rope) (regionStart rgn) cache)
  }

-- this is currently defined in Yi.Buffer.Implementation under some other name:
rope2scanner :: Rope -> Scanner Point Char

I hope that makes sense.

I'd certainly be interested in your ideas for how I could implement the LaTeX parser, and also your general plans for parsers that you mentioned.

which would allow merging hlRun and hlFocus as described above. In fact, I
tested this by moving the calls to 'pureM clearAllSyntaxAndHideSelection'
and 'pureM focusAllSyntax' next to each other, and nothing appears to have
been broken.
Did you also test asymptotic performance as hinted above?
(Quite a pain to do, I'll grant you that much).

No, it was just a very quick test. I don't see why the performance should change, though.

Cheers,
Reiner

[1] http://comonad.com/reader/2009/iteratees-parsec-and-monoid/

--
Yi development mailing list
yi-devel@googlegroups.com
http://groups.google.com/group/yi-devel
