How is peasy (https://github/chaosim/peasy) simpler and more powerful than 
other parser tools?

Simpler and more powerful? Maybe it is a bit contrary to common sense. True or 
false? To affirm, please give this project (https://github.com/chaosim/peasy) a 
glimpse at first. Because of being simple , you can comprehend it in a little 
while; because of being powerful, it will deserve your every minute.

All along, the design of programming languages ​​is a complex task. Because we 
need to understand the esoteric compiler theory and technology, and one of the 
most critical and very difficult part is to define the rules of the new 
language and to parse with them.To solve this problem, there have been many 
theories , techniques and tools . These tools can be roughly divided into two 
categories: one is parser generator, another is to write the parser by hand 
with or without a parser library.

The parser generator generally based on some type of formal languages ​​, by 
which the generator make the lexer and parser from some rules. Strong aspect of 
these tools is that they can produce the most efficient parser support for 
their own form of language types , according to the characteristics of formal 
language , generally ensures linear time complexity. Commonly divided into 
three types : LR technology , LL technology , PEG technology. LR technology 
language has shift/reduction , from the bottom up , the most right derived and 
other technical features , they can only support LR languages ​​, more 
accurately, mainly LALR variants , plus some special treatment for the priority 
of operator. One of the most famous was undoubtedly the lex / yacc and its 
numerous subsequent variants ( such as bison, jison (javascript language ), ply 
(python lex / yacc) , etc. ) . LL technology has a top-down form of the 
language , recursive descent , the most left-dereived and other technical 
features , although the concept is easier to understand than the LR language , 
but because LL covers less languages, so not used as universal as lex / yacc. 
Representative products are antlr. PEG technology refers to parsing expression 
grammar parser generator. peg is a kind of formal grammar particularly suited 
to be parsed, currently there are already many tools appear. The most common 
method is packrat  algorithm based implementations. Such as rats, ometa (there 
are many versions , such as ometa / squeak, ometa-js, pyMeta etc. ), pyPEG 
under python, peg.js  under javascript, treetop and citrus under ruby, and so 
on.

No matter what type of parser generator , the method to use them is to design 
rules for these tools , and then the tools generate parser. This is the 
difficult aspects. First, the object program is generated based on the state 
transfer and stack process, it is difficult to understand, debug and modify. 
Second, we must understand parser theory and technology of these tools is based 
on, the rules used by these tools is actually a domain-specific language , 
their expressivenes is very limited, while we must understand and become 
familiar with the new language in order to design rules. Third, we must embed 
certain processing ( abstract syntax tree structure , semantics, error 
reporting and handling , etc) in the form of object parser language  into 
lex/parser rules. Difficulties in these areas prevent the most programmers to 
easily use this type of tool.
    
Meanwhile, people have also been pursuing a more relaxed and flexible approach 
to write parsers. Most of these methods produce tools can be classified as 
combinational parser library , or peg grammar-based peg library . Use of such 
libraries, programmers can use their own language in daily use to design a new 
universal language , parsing text, so they are used more freguently. 
Representative works is parsec under haskell language  which maybe is the most 
mature and powerful. However, because haskell is too academic, and parsec is 
not universally popular. c + + language has boost phoenix library, probably 
because it depends on the c++ template, an advanced language features, it has 
not been widely used too. Python has pyparsing, which is used by many users. 
For specific questions , there are many applications do not use any tools or 
libraries, but manually write the entire parser , for example: cpython 
implementation, esprima for javascript. However, unfortunately,  in their 
grammar and parsing of these tools, there are two obvious difficulties not been 
solved: the first is left recursive grammar, because left-recursive function 
will lead to an unconditional infinite recursion. The second is the parsing 
efficiency. In order to obtain a linear time complexity , we should only parse 
same syntax component only once at any position. To achieve the latter, the 
result of parsing is required. Meanwhile , the parser also need to be 
considered in conjunction to produce abstract syntax trees , semantic 
processing , error handling , etc., and these libraries maybe try to keep away 
from some of the problems (in the combined libraries left recursion are 
generally fobidden), or in order to solve these problems the program becomes 
very complex , difficult to understand , use, modify and extend.

Can be seen from the above description , parsing is a very common programming 
needs , but also has a very high technical difficulty , resulting in a large 
number of theories, technique, and tools you have seen above, in this regard a 
variety of technical papers, code and tools can be used simply to describe the 
voluminous , but still did not achieve the desired results .


with peasy, a complete and elegant solution to all these problems emerge for 
the first time. On one hand you can say peasy the most simple and elegant , but 
also has the most powerful and adaptability , and does not need to sacrifice 
speed and efficiency.
 Its characteristics are as follows:

* for the first time, peasy provides a simple and complete solution to the 
problem of left recursion in the hand-written recursive descent parser manner, 
no matter direct or indirect left-recursive rule . To handle left recursive in 
peasy, only one function is needed (only 20 lines of code in coffeescript).
* peasy allows programmers to write the parser with common programming 
language, and the parser and the input to the grammar rules are similar to the 
parser generator , both concise and readable , easy to modify and extend.
* the simple and flexible caching mechanism in peasy can improve the time 
efficiency of parsing, we can maintain a linear time for liner time complexity 
grammar by caching. For complex grammar and data , we can also improve time 
efficiency by caching mechanism, and avoide the exponential, cube or square 
time complexity like some other common parsing algorithm. peasy's cache 
implementation is very simple, only one function (only ten lines of code under 
coffeescript  ) .
* the entire peasy implementation is very simple. only two hundred lines of 
code under cofeescript. The concept is simple. Only two kind of components: 
some match functions which operate on the parsed data and parsing pointer and 
some combinational functions which generate match function. all of these 
functions are very simple , usually only a few lines of code , And they exists 
in the code of peasy for demonstration objective, can be deleted, modified and 
replaced, according to the specific needs.

In fact, instead of to be considered as a library or a tool, maybe it's better 
to view peasy as a method, a way or some example to write parsers mannually. 
This will allow for a greater degree of power.

peasy provide specific examples to support the above conclusion. The most 
obvious example is the common programming languages ​​in order to adapt and 
modify their parser tool to prevent left- recursive grammar . For example 
python grammar (http://docs.python.org/2/reference/grammar.html), javascript 
grammar (http://www-archive.mozilla.org/js/language/grammar14.html), while by 
using peasy, all binary arithmetic expression can be condensed into a simple 
left -recursive rules , left recursion elimination is not needed any more. For 
expression parsing, in order to further improve time efficiency , peasy also 
provides an example , has many different priorities for computing the call 
stack example can skip directly to the highest priority constant direct 
expression resolves to the final rather than, as is currently common practice, 
the rules of writing as follows : or -> (or | | and) | and; ...; add -> add + 
mul | mul; mul -> mul * atom | atom. Rare is , peasy while achieving enhanced 
capabilities of these expressions can also ensure that the linear time 
complexity of the parsing process. Want to know how peasy achieve these 
results? Please give a glimpse to peasy on 
github(https://github.com/chaosim/peasy).

the first two of following links is the messages I posted when I started to get 
the inspiration to start writing peasy. Because there are some of other tasks 
to finish, intermediate interrupted for some time. Not long ago, I use peasy to 
parse something in another projects , and I further optimize the api of peasy, 
and which is now very simple and elegant, and I am satisfied :) As for the most 
critical left recursive , and caching algorithms , withstood the subsequent 
examples of the test , there is no any change. Together, these examples also 
continues to demonstrate the advantages of peasy .

Finally, I need to mention the impact of language on thought: Without 
coffeescript language, maybe it's difficult for me to have these ideas. in 
coffeescript,  the -> function notation and "assignment is an expression" 
making special concise readable grammar. In python similar results can be 
achieved, need to go through some twists and turns, and that is not so natural 
as in coffeescript.There should exists no big obstacle in other dynamic 
languages, but they is not as elegant as coffeescript tool. This point is 
obvious with the difference between the javascript code of peasy and its 
coffeescript source.

messages I posted in google groups before:
https://groups.google.com/forum/#!search/left$20recursive/comp.lang.javascript/GN7b9Tr6j98/DoiHzi9i77oJ
https://groups.google.com/forum/#!search/%E5%B7%A6%E9%80%92%E5%BD%92/python-cn/Uqr-Xf3I3LM/oUd3Sj4HvxYJ


lr grammar and lex/yacc, bison, ply
http://en.wikipedia.org/wiki/LR_parser
http://dinosaur.compilertools.net/
http://en.wikipedia.org/wiki/Yacc
http://www.dabeaz.com/ply/
http://www.gnu.org/software/bison/
http://zaach.github.io/jison/

LL grammar
http://en.wikipedia.org/wiki/LL_parser
http://www.antlr.org/


some one asked for parser to handle left recursion on SO:
http://stackoverflow.com/questions/4397039/any-peg-parser-capable-to-handle-left-recursion


peg and packrat
http://bford.info/packrat/
http://en.wikipedia.org/wiki/Parsing_expression_grammar
http://cs.nyu.edu/rgrimm/xtc/rats-intro.html
http://java-source.net/open-source/parser-generators/rats!

phoenix 
http://www.boost.org/doc/libs/1_55_0/libs/spirit/phoenix/doc/html/index.html
http://www.ontolinux.com/community/phoenix/Phoenix_Manual.pdf

ometa:
Alessandro Warth describe a method to handle left recursion in peg grammar in 
his PhD paper firstly.
After I have my idea about Peasy and algorith to handle left recursion, I 
searched and found his paper, and found my algorithm is actully similar to his 
method.
http://tinlizzie.org/ometa/
Memoization in Top-Down Parsing
http://www.tinlizzie.org/~awarth/johnson.html

https://github.com/alexwarth/ometa-js

parsing under python
https://wiki.python.org/moin/LanguageParsing

parser for javascript or in javascript
http://esprima.org/
http://zaach.github.io/jison/
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to