On Feb 23, 11:37 am, rjf <fate...@gmail.com> wrote:
> On Feb 23, 9:17 am, "Dr. David Kirkby" <david.kir...@onetel.net>
> wrote:
>
> > On 02/22/11 10:57 PM, Dr. David Kirkby wrote:
>
> > > On 02/22/11 03:49 PM, rjf wrote:
> [snip]. The real difficulty is
> > >> to implement a Mathematica language parser, since the language
> > >> fails to fit the standard expectations for computer languages.

It does?  It is a context-free language, therefore parsable by
any parser capable of parsing context-free languages.

> > > I know you said that, but I've heard different from another source. See
> > >http://groups.google.com/group/comp.compilers/msg/8c4e6ccad3c40599
> > > The person there, who is the CTO of a company producing this
> > >http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
> > > which has an option for a Mathematica parser

It does.

> > > He says Mathematica is not a particularly difficult language to parse,
> > > and a GLR parser is a bit over the top.

It isn't, and GLR is (capable of parsing a context free language)
but AFAICT, isn't really needed to parse MMa.

> > Here you can see a Mathematica parser is listed for the DMS toolkit
> >http://www.semanticdesigns.com/Products/FrontEnds/index.html?Home=DMS...
>
> > So I don't know what to believe Richard. You are saying the Mathematica
> > language can't be parsed with a conventional parser, so (you?) had to
> > hand-write the parser for MockMMA,

Our parser for MMa consists of a relatively conventional lexical
definition for tokens, and a very straightforward grammar for the
language itself.

> > yet someone from a commercial company selling this DMS toolkit claims
> > the language is not particularly difficult to parse, and have a front end
> > for their toolkit (a GLR parser) able to parse Mathematica.
>
> Here are my suggestions:
>
> 1. The guy is lying. He doesn't really have a Mathematica parser that
> works.

Hmph.  For your example r[s[]] below, which you claim is *so* hard to
parse, here's the output of DMS parsing it using our Mathematica grammar:

C:\DMS\Domains\Mathematica\Tools\Parser\Source>run ../domainparser +
+AST "C:\DMS\Domains\Mathematica\Examples\multiply.m"
Domain Parser for Mathematica 2.3.3
Copyright (C) Semantic Designs 1996-2010; All Rights Reserved
17 tree nodes in tree.
(Mathematica@Mathematica=1#481c320^0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
 (Commands@Mathematica=3#481c300 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
  (Commands@Mathematica=3#481c2c0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   (Commands@Mathematica=2#4819dc0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m)Commands
   (Command@Mathematica=5#481c2a0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |(ExpressionSequence@Mathematica=17#481c280 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   | (Rule@Mathematica=29#4819f80 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |  (Disjunction@Mathematica=34#4819fc0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   (Conjunction@Mathematica=36#481c040 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |(EqualitySequence@Mathematica=38#481c080 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   | (Sum@Mathematica=56#481c0e0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |  (Product@Mathematica=60#481c220 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |   (QualifiedIdentifier@Mathematica=203#4819e40 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |   |(IDENTIFIER@Mathematica=206#4819da0[`a'] Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m)IDENTIFIER
   |   |   )QualifiedIdentifier
   |   |   (QualifiedIdentifier@Mathematica=203#481c260 Line 1 Column 3 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |   |(IDENTIFIER@Mathematica=206#4819e20[`b'] Line 1 Column 3 File C:/DMS/Domains/Mathematica/Examples/multiply.m)IDENTIFIER
   |   |   )QualifiedIdentifier
   |   |  )Product
   |   | )Sum
   |   |)EqualitySequence
   |   )Conjunction
   |  )Disjunction
   | )Rule
   |)ExpressionSequence
   )Command
  )Commands
  (Command@Mathematica=4#481c2e0 Line 2 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m)Command
 )Commands
)Mathematica
Exiting with final status 0

Yes, it parses much bigger, much more complex examples.
JPL has used it internally.  Does it parse all of current 2011 MMa
syntax?  Probably not, we haven't used it much recently.  But I spent
4 years working on an 80,000-line MMa program so I think I understand
the basics of the language, and given its Lisp-like syntax,
I don't think I'll be surprised.   Wolfram could be crazy, though.

> 2. The company has a really neat parser generating tool and a lot of
> engineering
> to go with it and Mathematica can be easily parsed with it.

It is indeed the case that we have a neat parser generator and
a lot of engineering.

DMS parses much, much harder languages, such as C++
(famously hard to parse, ask the GNU guys who cracked their skulls
on it) and COBOL (not famously so hard, but wait till
you try to parse the data declarations using COBOL's integer nesting
levels instead of brackets, and get the nesting right), as well
as Fortran with nested and shared DO-continues
(getting the loop nesting right coming directly
out of the parser).  DMS has been applied to carrying out massive
automated transformations on C and C++ systems.
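
To make the COBOL point concrete: data declarations carry their nesting
in integer level numbers rather than brackets, so the structure has to be
recovered by comparing levels. Below is a minimal Python sketch of that
recovery. The record is made up, special levels such as 66, 77 and 88 are
ignored, and this is an illustration of the problem, not a description of
how DMS solves it.

```python
def nest_by_level(items):
    """Turn (level, name) pairs into a tree using a stack of open groups."""
    root = {"level": 0, "name": "<root>", "children": []}
    stack = [root]
    for level, name in items:
        # Close every open group whose level is not lower than this item's.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "name": name, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def show(nodes, depth=0):
    """Render the tree as indented lines, one per data item."""
    lines = []
    for n in nodes:
        lines.append("  " * depth + f'{n["level"]:02d} {n["name"]}')
        lines.extend(show(n["children"], depth + 1))
    return lines


# Hypothetical record layout, flattened the way the source text presents it.
record = [
    (1, "CUSTOMER"), (5, "NAME"), (10, "FIRST-NAME"), (10, "LAST-NAME"),
    (5, "ADDRESS"), (10, "CITY"), (1, "ORDER"),
]
print("\n".join(show(nest_by_level(record))))
```

The stack plays the role that matching brackets would play in a
bracketed language: an item closes every open group at the same or a
deeper level before attaching itself to whatever remains on top.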

DMS is the system I built after I decided Mathematica was a piece
of junk as a transformation system.

> 3. The company has nothing much beyond a good term project in a
> compiler-technology
> course (perhaps at a graduate level) plus a bunch of engineering and
> marketing.

> My guess is 1 + 3

These are very generous guesses at the facts, Mr. Fateman.
You might have wasted a few minutes of your time and checked
the web site.

> A VERY simple example.
> r[s[]]
>
> is legal in mathematica.

Yes.

> A traditional lexical analyzer, including the one apparently used by
> mathematica,
> typically looks for the longest string of characters that makes a
> token.   Hence
> a===b    has a token ===   which is  "SameQ"  even though there are
> tokens =  and ==.
> So the longest one is found, in general.

True.
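
For readers who want to see the longest-match rule concretely, here is a
minimal Python sketch. It is neither the DMS lexer nor Mathematica's; the
operator table is a small assumed subset, and it deliberately includes
`]]` as a token so the behavior under discussion can be observed.

```python
# Illustrative longest-match ("maximal munch") scanner. The operator table is
# an assumed subset for demonstration, not Mathematica's full token set.
OPERATORS = ["===", "==", "=", "[", "]", "]]"]


def tokenize(text, operators=OPERATORS):
    """Split text into tokens, always taking the longest matching operator."""
    by_length = sorted(operators, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum():
            # Identifier or number: consume the whole alphanumeric run.
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            # Try operators longest first, so "===" wins over "==" and "=".
            for op in by_length:
                if text.startswith(op, i):
                    tokens.append(op)
                    i += len(op)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


print(tokenize("a===b"))   # ['a', '===', 'b']
print(tokenize("r[s[]]"))  # ['r', '[', 's', '[', ']]']
```

The second result only comes out that way because `]]` is in the operator
table; remove it and the same scanner yields six single-character tokens.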

> anyway, how does one do lexical analysis or scanning on
> r[s[]] ?
>
> The correct tokenization is  r, [, s, [, ], ] .   but the maximal
> token deal returns
> r, [, s, [, ]] .
>
> What does this mean?  It means that the conventional separation of
> lexical analysis
> and parsing must be intermixed in parsing Mathematica.

No, it doesn't.  It means you've made an assumption that
might not be true.   The lexer we have for Mma is not
one of the open source ones, but what it does is pretty
traditional: it produces a stream of lexemes.
While our lexical machinery is in fact capable of
asking about the left context, and we in fact use
that to parse other ugly languages,  and indeed we might
have done what you said, we in fact did not do so.
We don't use any special parsing tricks for this.

Since you are the parsing expert, I'll leave it to
you to figure out how.
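
One approach that keeps the lexer conventional, offered purely as an
illustration and not as a statement of what DMS actually does: never emit
`[[` or `]]` as tokens, and let the grammar recognize Part brackets as
two adjacent single brackets. In the toy grammar below this is unambiguous
because no expression starts with `[`. The names `lex`, `Parser` and
`parse`, and the `Call`/`Part` node labels, are all hypothetical.

```python
import re

# Assumption: the lexer emits only single brackets, never "[[" or "]]".
TOKEN_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|\d+|[\[\],])")


def lex(text):
    """Produce a flat token list with no knowledge of bracket context."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self, k=0):
        j = self.i + k
        return self.tokens[j] if j < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def args(self):
        """Parse a possibly empty comma-separated argument list."""
        items = []
        if self.peek() != "]":
            items.append(self.expr())
            while self.peek() == ",":
                self.i += 1
                items.append(self.expr())
        return items

    def expr(self):
        tok = self.peek()
        if tok is None or tok in {"[", "]", ","}:
            raise SyntaxError(f"expected an atom, got {tok!r}")
        self.i += 1
        node = tok
        while self.peek() == "[":
            if self.peek(1) == "[":
                # Two adjacent "[" tokens open a Part expression.
                self.i += 2
                node = ("Part", node, *self.args())
                self.expect("]")
                self.expect("]")
            else:
                self.i += 1
                node = ("Call", node, *self.args())
                self.expect("]")
        return node


def parse(text):
    p = Parser(lex(text))
    tree = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at token {p.i}")
    return tree


print(parse("r[s[]]"))     # ('Call', 'r', ('Call', 's'))
print(parse("a[[b[1]]]"))  # ('Part', 'a', ('Call', 'b', '1'))
```

The second input ends in three closing brackets, which a longest-match
lexer with a `]]` token would split the wrong way; with single-bracket
tokens the parser simply consumes them one at a time.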

> I know of no other programming language that requires this.

Try any language which has syntax to change what is considered
to be terminator characters.   This is possible in the legacy 4GL
called Natural, but you are unlikely to be familiar with that.
I believe you can also do this in Perl; worse, it
happens at runtime, so a statement containing
the terminator can actually precede the terminator-setting statement
by an arbitrary distance.  Yes,  Perl is a bitch to lex, let alone
parse.

> Oh, there are also other glitches in mathematica of this sort.

Yes. There's a beaut of an ambiguity that needs resolution
between a product term and a pattern designation.
Most of the rest of the language seems pretty straightforward
compared to most real programming languages.

Mathematica is otherwise not hard to parse, and you don't need
a hand-written parser to do it.

Ira D. Baxter, CTO
Semantic Designs, Inc.

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
