On 27/06/11 07:26, Jean-Philippe Bernardy wrote:
On Sun, Jun 26, 2011 at 5:34 PM, Reiner Pope<reiner.p...@gmail.com> wrote:
On a similar note, would it be acceptable to join the 'hlRun' and 'hlFocus'
fields? Currently they are:
hlRun :: Scanner Point Char -> Point -> cache -> cache,
hlFocus :: Map WindowRef Region -> cache -> cache
I am not sure... However, maybe you would enjoy the following
explanation of what "focusing" is all about -- I don't think I have
documented it anywhere. Focusing on a particular point allows better
asymptotic performance.
The idea is that focusing is called when a window is "scrolled", so
that the AST does not have to be re-traversed from the root every time
a display is needed. Otherwise, the AST has to be traversed in
pre-order to find the portion that corresponds to the windows, at
every refresh.
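To illustrate the asymptotics with a toy sketch of my own (not the actual Yi code): keeping a per-window focus is like keeping a list zipper over the buffer's lines. Moving the focus by k lines on a scroll costs O(k), whereas re-finding line n from the root on every refresh costs O(n).

```haskell
-- Toy zipper sketch of the "focus" idea (hypothetical, not Yi's AST).
-- 'Zipper before after': lines before the focus (nearest first) and the
-- focused line onward.
data Zipper a = Zipper [a] [a]

-- Build a focus at line n by walking from the start: O(n), the cost we
-- pay only once instead of at every refresh.
focusAt :: Int -> [a] -> Zipper a
focusAt n xs = let (b, f) = splitAt n xs in Zipper (reverse b) f

-- Move the focus by k lines: O(|k|), independent of the focus position.
scroll :: Int -> Zipper a -> Zipper a
scroll 0 z = z
scroll k (Zipper before (x:after)) | k > 0 = scroll (k - 1) (Zipper (x:before) after)
scroll k (Zipper (x:before) after) | k < 0 = scroll (k + 1) (Zipper before (x:after))
scroll _ z = z

-- The h lines visible in the window, read off from the focus: O(h).
visible :: Int -> Zipper a -> [a]
visible h (Zipper _ after) = take h after
```
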
You may retort: "Why don't you cache the beginning/end of the regions
corresponding to each node in the AST?" The answer is that doing so
would break the "online" property of the parser. (See
www.cse.chalmers.se/~bernardy/FunctionalIncrementalParsing.pdf for an
explanation of online parsing.)
This makes sense. I haven't really understood the code that implements
this, though.
I read your paper a little while ago and it was very interesting :)
It seems that we could allow simpler parsers (more on this later)
I dare say that a sensible overall plan about parsers should be well
thought out before putting the current code under the knife :). I am
curious to hear your plans! I could also let you know about mine if
you are interested.
Ok, I'll explain in some more detail. Let me first say that my thoughts
on this are only thoughts, and I haven't yet written much code to back
them up. (On first reading, you might want to skip to the closing "---")
---
I'm interested in making a better LaTeX mode for yi, and for this I need
better parsing. Some things I'd eventually like are:
1. Syntax checking with error reporting. For example, I'd like to
   report if \frac{}{} is given too few arguments, or if it is used
   outside maths mode.
2. Reference tracking, like RefTeX in Emacs. To support this, we would
need to know where all the \label{} macros are in the document, both
before and after the point.
3. Graphical rendering of equations.
While incremental parsing seems a good fit for syntax checking, it
doesn't seem such a good fit for the other two applications:
* incremental parsing doesn't provide efficient support for "looking
into the future", as is needed for tracking references
* the rendering chain I've played with for equations is
TeX->MathML->(SVG->)screen. Some basic experiments show that this is
expensive, and it could be significantly faster to cache results
somewhere in the process (either the SVG, or directly in the MathML
rendering library). As far as I'm aware, the incremental parsing
framework can only cache parsing progress, not the results of it.
I thought it might be fun to try something along the lines of Edward
Kmett's talks on monoidal parsing[1], in particular his Bitonic
datatype. The key point about LaTeX (at least the way I write it) is
that it is "almost" context free: roughly speaking, the only non-local
context you need is the stack of \begin{}-\end{} environments you are
inside, which is typically quite shallow.
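A toy sketch of the monoidal idea (my own illustration, not Kmett's actual Bitonic type): summarise each chunk of text by its unmatched \end{}s on the left and unmatched \begin{}s on the right. Because the merge is associative, chunk summaries can be cached and recombined in any grouping after an edit.

```haskell
-- 'Balance c o': c closers with no opener to their left, o openers
-- still awaiting a closer to their right (environment names elided
-- for brevity; a real version would carry them and check matching).
data Balance = Balance !Int !Int deriving (Eq, Show)

instance Semigroup Balance where
  Balance c1 o1 <> Balance c2 o2 =
    let m = min o1 c2   -- opens on the left absorb closes on the right
    in Balance (c1 + (c2 - m)) ((o1 - m) + o2)

instance Monoid Balance where
  mempty = Balance 0 0

-- Summarise a single token.
token :: String -> Balance
token t
  | take 7 t == "\\begin{" = Balance 0 1
  | take 5 t == "\\end{"   = Balance 1 0
  | otherwise              = Balance 0 0

-- Since (<>) is associative, this foldMap can be computed over any
-- chunking of the token stream, with per-chunk summaries cached.
balanced :: [String] -> Bool
balanced ts = foldMap token ts == mempty
```
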
To handle the above, I would design the parser cache along the following
lines:
-- the tree of \begin{}s and \end{}s, annotated with positions
data ContextTree = ...
-- splittable sequence of all \label{}s in the document
data LabelList = ...
-- splittable sequence of cached renders of maths for the current
-- window region
data MathRenders = ...

type ParserCache = (ContextTree, LabelList, Map WindowRef MathRenders)
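To make the split-and-merge idea concrete, here is a hypothetical LabelList sketch using Data.Map keyed by buffer position. (A real implementation would likely be a finger tree measured by offsets, and would also need to shift positions to the right of the edit; both are elided here.)

```haskell
import qualified Data.Map.Strict as M

type Point = Int
type LabelList = M.Map Point String  -- position -> label name

-- Splice freshly parsed labels for the dirty region into the old cache:
-- keep labels strictly outside the region, drop stale ones inside it,
-- and union in the new ones. (Position shifting after the edit omitted.)
updateLabels :: (Point, Point)  -- dirty region (start, end)
             -> LabelList       -- labels reparsed from the dirty region only
             -> LabelList       -- old cache
             -> LabelList
updateLabels (lo, hi) fresh old =
  let (before, _) = M.split lo old  -- keys < lo survive
      (_, after)  = M.split hi old  -- keys > hi survive
  in M.unions [before, fresh, after]
```
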
If the user has modified a region, then here's how we update the parser
cache (note that the parser needs to know both the beginning and the end
of the user's modification). All these computations are done strictly:
1. Update the ContextTree. This can be done quickly: by splitting the
tree, we can reuse the parts of the tree from both before and after
the changes. We just need to look for \begin{}s and \end{}s in the
dirty region, and then call "mergeContextTree leftTree dirtyTree
rightTree". Merging trees takes time proportional to the depth of
the trees, which I assume is quite small (<10, typically).
2. Using the now-correct ContextTree, update the LabelList and
MathRenders. These both use the same split-and-merge strategy as the
ContextTree, but note that for these, parsing the dirty region
requires the (correct) ContextTree. (For instance, when generating
MathRenders, we need to know whether we are in maths mode)
(It might be possible to fuse the three traversals of the dirty region
into one.)
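On fusing the passes: since a product of monoids is again a monoid, a single foldMap over the dirty region's tokens can accumulate all three summaries at once. A toy sketch with stand-in summary types (not the real ContextTree/LabelList/MathRenders):

```haskell
import Data.Monoid (Sum (..))

type Ctx    = Sum Int   -- stand-in: \begin/\end nesting delta
type Labels = [String]  -- stand-in: labels seen, in order
type Math   = Sum Int   -- stand-in: count of inline-math delimiters

-- Summarise one token into all three monoids at once.
summarise :: String -> (Ctx, Labels, Math)
summarise t
  | take 7 t == "\\begin{" = (Sum 1,    [],  mempty)
  | take 5 t == "\\end{"   = (Sum (-1), [],  mempty)
  | take 7 t == "\\label{" = (mempty,   [t], mempty)
  | t == "$"               = (mempty,   [],  Sum 1)
  | otherwise              = mempty

-- One traversal of the dirty region yields all three results, because
-- the 3-tuple of monoids is itself a monoid.
scanDirty :: [String] -> (Ctx, Labels, Math)
scanDirty = foldMap summarise
```
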
Note that I've made the MathRenders cache a per-window cache. I'm not
sure if it's necessary (it would be feasible to maintain an
entire-document splittable-sequence of MathRenders, just like the
LabelList), but it seems a reasonable optimisation to perform: by only
caching the MathRenders for the visible portion of the document, we can
save memory and a little bit of time when splitting the sequences.
I realise such an implementation would be quite complicated and would
have to implement a lot of parsing from scratch, but I'd like to try it
first and then see if I can generalise/abstract.
---
The above is an (unimplemented!) example of a realistic parser which:
* doesn't use the incremental parsing framework;
* doesn't use the Scanner framework, and would perform at least a
little worse if it were forced to use the scanner;
* needs to know the window focuses during parsing;
* needs to know the end of the dirty region, not just the beginning.
I would like the Highlighter datatype to support a parser like this. The
Highlighter datatype is already mostly abstracted from the incremental
parsing framework; I'm just looking to go a little bit further.
I am definitely not proposing changes to the implementation of
incremental parsing (I don't fully understand the implementation), I'm
just proposing a change to the interface. Indeed, I can automatically
convert "old" syntax highlighters into "new" ones:
data HighlighterOld cache syntax =
SynHLOld { hlStartCacheOld :: cache,
hlRunOld :: Scanner Point Char -> Point -> cache -> cache,
hlGetTreeOld :: cache -> WindowRef -> syntax,
hlFocusOld :: Map WindowRef Region -> cache -> cache
}
data HighlighterNew cache syntax =
SynHLNew { hlStartCacheNew :: cache,
hlRunNew :: Rope -> -- contents of the whole buffer
Region -> -- dirty region
Map WindowRef Region -> -- visible regions
cache -> cache,
hlGetTreeNew :: cache -> WindowRef -> syntax
}
-- (uses the RecordWildCards extension)
convert :: HighlighterOld cache syntax -> HighlighterNew cache syntax
convert SynHLOld{..} = SynHLNew {
hlStartCacheNew = hlStartCacheOld,
hlGetTreeNew = hlGetTreeOld,
hlRunNew = \rope rgn focusMap cache -> hlFocusOld focusMap .
hlRunOld (rope2scanner rope) (regionStart rgn) $ cache
}
-- this is currently defined in Yi.Buffer.Implementation under some other name:
rope2scanner :: Rope -> Scanner Point Char
I hope that makes sense.
I'd certainly be interested in your ideas for how I could implement the
LaTeX parser, and also your general plans for parsers that you mentioned.
This would allow merging hlRun and hlFocus as described above. In fact, I
tested this by moving the calls to 'pureM clearAllSyntaxAndHideSelection'
and 'pureM focusAllSyntax' next to each other, and nothing appears to have
been broken.
Did you also test asymptotic performance, as hinted above?
(Quite a pain to do, I'll grant you that much.)
No, it was just a very quick test. I don't see why the performance
should change, though.
Cheers,
Reiner
[1] http://comonad.com/reader/2009/iteratees-parsec-and-monoid/
--
Yi development mailing list
yi-devel@googlegroups.com
http://groups.google.com/group/yi-devel