On Feb 23, 11:37 am, rjf <fate...@gmail.com> wrote:
> On Feb 23, 9:17 am, "Dr. David Kirkby" <david.kir...@onetel.net> wrote:
> > > On 02/22/11 10:57 PM, Dr. David Kirkby wrote:
> > > > On 02/22/11 03:49 PM, rjf wrote:
> > > > [snip]. The real difficulty is
> > > > to implement a Mathematica language parser, since the language
> > > > fails to fit the standard expectations for computer languages.

It does? It is a context-free language, and is therefore parsable by any parser capable of parsing context-free languages.

> > > I know you said that, but I've heard different from another source. See
> > > http://groups.google.com/group/comp.compilers/msg/8c4e6ccad3c40599
> > > The person there, who is the CTO of a company producing this
> > > http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
> > > which has an option for a Mathematica parser

It does.

> > > He says Mathematica is not a particularly difficult language to parse,
> > > and a GLR parser is a bit over the top.

It isn't, and GLR is (capable of parsing a context-free language), but AFAICT it isn't really needed to parse MMa.

> > Here you can see a Mathematica parser is listed for the DMS toolkit
> > http://www.semanticdesigns.com/Products/FrontEnds/index.html?Home=DMS...
> >
> > So I don't know what to believe Richard. You are saying the Mathematica language
> > can't be parsed with a conventional parser, so (you?) had to hand-write the parser for
> > MockMMA,

Our parser for MMa consists of a relatively conventional lexical definition for tokens, and a very straightforward grammar for the language itself.

> > yet someone from a commercial company selling this DMS toolkit claims
> > the language is not particularly difficult to parse, and have a front end for
> > their toolkit (a GLR parser) able to parse Mathematica.
>
> Here are my suggestions:
>
> 1. The guy is lying. He doesn't really have a Mathematica parser that
> works.

Hmph. For your example r[s[]] below, which you claim is *so* hard to parse, here's the output of DMS parsing it using our Mathematica grammar:

C:\DMS\Domains\Mathematica\Tools\Parser\Source>run ../domainparser + +AST "C:\DMS\Domains\Mathematica\Examples\multiply.m"
Domain Parser for Mathematica 2.3.3
Copyright (C) Semantic Designs 1996-2010; All Rights Reserved
17 tree nodes in tree.
(Mathematica@Mathematica=1#481c320^0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
 (Commands@Mathematica=3#481c300 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
  (Commands@Mathematica=3#481c2c0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   (Commands@Mathematica=2#4819dc0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m)Commands
   (Command@Mathematica=5#481c2a0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |(ExpressionSequence@Mathematica=17#481c280 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   | (Rule@Mathematica=29#4819f80 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |  (Disjunction@Mathematica=34#4819fc0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   (Conjunction@Mathematica=36#481c040 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |(EqualitySequence@Mathematica=38#481c080 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   | (Sum@Mathematica=56#481c0e0 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |  (Product@Mathematica=60#481c220 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |   (QualifiedIdentifier@Mathematica=203#4819e40 Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |   |(IDENTIFIER@Mathematica=206#4819da0[`a'] Line 1 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m)IDENTIFIER
   |   |   )QualifiedIdentifier
   |   |   (QualifiedIdentifier@Mathematica=203#481c260 Line 1 Column 3 File C:/DMS/Domains/Mathematica/Examples/multiply.m
   |   |   |(IDENTIFIER@Mathematica=206#4819e20[`b'] Line 1 Column 3 File C:/DMS/Domains/Mathematica/Examples/multiply.m)IDENTIFIER
   |   |   )QualifiedIdentifier
   |   |  )Product
   |   | )Sum
   |   |)EqualitySequence
   |   )Conjunction
   |  )Disjunction
   | )Rule
   |)ExpressionSequence
   )Command
  )Commands
  (Command@Mathematica=4#481c2e0 Line 2 Column 1 File C:/DMS/Domains/Mathematica/Examples/multiply.m)Command
 )Commands
)Mathematica
Exiting with final status 0

Yes, it parses much bigger, much more complex examples. JPL has used it internally. Does it parse all of current 2011 MMa syntax? Probably not; we haven't used it much recently. But I spent 4 years working on an 80,000 line MMa program, so I think I understand the basics of the language, and given its Lisp-like syntax, I don't think I'll be surprised. Wolfram could be crazy, though.

> 2. The company has a really neat parser generating tool and a lot of
> engineering
> to go with it and Mathematica can be easily parsed with it.

It is indeed the case that we have a neat parser generator and a lot of engineering. DMS parses much, much harder languages, such as C++ (famously hard to parse; ask the GNU guys who cracked their skulls on it) and COBOL (not famously so hard, but wait till you try to parse the data declarations using COBOL's integer nesting levels instead of brackets, and get the nesting right), as well as Fortran with nested and shared DO-continues (getting the loop nesting right coming directly out of the parser). DMS has been applied to carrying out massive automated transformations on C and C++ systems. DMS is the system I built after I decided Mathematica was a piece of junk as a transformation system.

> 3. The company has nothing much beyond a good term project in a
> compiler-technology
> course (perhaps at a graduate level) plus a bunch of engineering and
> marketing.
> My guess is 1 + 3

These are very generous guesses at the facts, Mr. Fateman. You might have wasted a few minutes of your time and checked the web site.

> A VERY simple example.
> r[s[]]
>
> is legal in mathematica.

Yes.

> A traditional lexical analyzer, including the one apparently used by
> mathematica,
> typically looks for the longest string of characters that makes a
> token. Hence
> a===b has a token === which is "SameQ" even though there are
> tokens = and ==.
> So the longest one is found, in general.

True.

> anyway, how does one do lexical analysis or scanning on
> r[s[]] ?
>
> The correct tokenization is r, [, s, [, ], ] . but the maximal
> token deal returns
> r, [, s, [, ]] .
>
> What does this mean? It means that the conventional separation of
> lexical analysis
> and parsing must be intermixed in parsing Mathematica.

No, it doesn't. It means you've made an assumption that might not be true. The lexer we have for MMa is not one of the open source ones, but what it does is pretty traditional: it produces a stream of lexemes. Our lexical machinery is in fact capable of asking about the left context, and we use that to parse other ugly languages, so we might have done what you said. We did not. We don't use any special parsing tricks for this. Since you are the parsing expert, I'll leave it to you to figure out how.

> I know of no other programming language that requires this.

Try any language which has syntax to change what is considered to be a terminator character. This is possible in the legacy 4GL called Natural, but you are unlikely to be familiar with that. I believe you can also do this in Perl; worse, it happens at runtime, so a statement containing the terminator can actually precede the terminator-setting statement by an arbitrary distance. Yes, Perl is a bitch to lex, let alone parse.

> Oh, there are also other glitches in mathematica of this sort.

Yes. There's a beaut of an ambiguity that needs resolution between a product term and a pattern designation. Most of the rest of the language seems pretty straightforward compared to most real programming languages.

Mathematica is otherwise not hard to parse, and you don't need a hand-written parser to do it.

Ira D. Baxter, CTO
Semantic Designs, Inc.
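
For readers who want to see the r[s[]] tokenization point concretely, here is a minimal Python sketch. It is an illustration only: the operator tables and the `tokenize` function are hypothetical, cover a tiny subset of the language, and are not a description of how DMS or Mathematica actually lex. The first table includes `]]` as a token, so a longest-match scanner reproduces the tokenization rjf describes. The second table simply never defines `[[` or `]]`, which is one conventional option: the scanner stays a plain longest-match scanner, and a grammar would have to recognize adjacent brackets itself.

```python
import re

# Hypothetical operator tables for illustration; not the real Mathematica token set.
GREEDY_OPS = ["===", "==", "=", "[[", "]]", "[", "]"]
SPLIT_BRACKET_OPS = ["===", "==", "=", "[", "]"]  # assumption: no double-bracket tokens


def tokenize(text, ops):
    """Longest-match ("maximal munch") scanner for identifiers and the given operators."""
    # Sort operators longest first so the regex alternation prefers the longest one.
    alternatives = "|".join(re.escape(op) for op in sorted(ops, key=len, reverse=True))
    pattern = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|" + alternatives + ")")
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("a===b", GREEDY_OPS))
    print(tokenize("r[s[]]", GREEDY_OPS))
    print(tokenize("r[s[]]", SPLIT_BRACKET_OPS))
```

With the first table, `r[s[]]` scans as r, [, s, [, ]] (the problem case quoted above). With the second table it scans as r, [, s, [, ], ], while `a===b` still yields the single `===` token. Neither table needs the scanner to consult the parser.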