Thanks for this.  It's always useful to see examples of parsers that aren't
performing as well as expected.

On my machine, using the latest version (1.2.13), your parser processes
your text file in about 8 seconds.  (The file appears to have three bogus
bytes at the front, which I needed to edit out.)  That's not
necessarily meaningful -- I could just have a faster computer.

My first thought is that by treating the ands and ors as binary connectives
only, you get a very deeply nested data structure.

(sudoku-parser "c1|c2|c3|c4|c5")
([:and [:and [:and [:and [:atom "c1"] [:atom "c2"]] [:atom "c3"]] [:atom "c4"]] [:atom "c5"]])

Perhaps it would be more efficient (and easier to process the resulting
trees) if you allowed these connectives to take multiple subexpressions.
Also, it looks like maybe you have & and | reversed from their usual
interpretation.  Was that intentional?

Here's a sample of how one might allow multiple subexpressions within & and
|:

<expr> = unary | binary
atom = #'c[0-9]*'
not = <'!'> atom
<unary> = atom | not | <'('> expr <')'>
<binary> = and | or
and = unary (<'&'> unary)+
or = unary (<'|'> unary)+

The above example puts & and | back to the usual interpretation.  Also, I
leverage the fact that in your data, ! only precedes atoms.  If you want a
more general interpretation of !, then replacing the not rule with

not = <'!'> unary

does the trick.

This gives you:

(sudoku-parser2 "c1|c2|c3|c4|c5")
([:or [:atom "c1"] [:atom "c2"] [:atom "c3"] [:atom "c4"] [:atom "c5"]])

which is cleaner to work with and cuts the parsing time in half.
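
For what it's worth, here is a minimal sketch of how sudoku-parser2 could be
defined from the grammar above, and how the flatter tree might be consumed.
The evaluate helper and the idea of passing in a set of "true" atom names are
my own illustrative assumptions, not something taken from your code; the only
instaparse calls used are insta/parser, insta/failure? and insta/transform.

```clojure
(require '[instaparse.core :as insta])

;; The grammar from this message, unchanged.
(def sudoku-parser2
  (insta/parser
    "<expr> = unary | binary
     atom = #'c[0-9]*'
     not = <'!'> atom
     <unary> = atom | not | <'('> expr <')'>
     <binary> = and | or
     and = unary (<'&'> unary)+
     or = unary (<'|'> unary)+"))

;; Hypothetical helper: evaluate a formula, given the set of atom
;; names that should be treated as true.
(defn evaluate
  [true-atoms formula]
  (let [tree (sudoku-parser2 formula)]
    (if (insta/failure? tree)
      (throw (ex-info "Parse error" {:failure tree}))
      ;; expr is hidden, so the parser returns a seq holding one tree.
      (first
        (insta/transform
          {:atom (fn [name] (contains? true-atoms name))
           :not  not
           ;; :and and :or receive any number of already-evaluated children.
           :and  (fn [& xs] (every? true? xs))
           :or   (fn [& xs] (boolean (some true? xs)))}
          tree)))))

(evaluate #{"c3"} "c1|c2|c3|c4|c5")
```

Because :and and :or take any number of children, each transform function is
called once per connective instead of once per nesting level.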

I'd be interested in seeing the grammar for the JavaCC parser that runs
faster.  Also, does the JavaCC parser build the corresponding data
structure, or simply check that the string fits the grammar?

If possible, let's follow up on the instaparse mailing list:
https://groups.google.com/forum/#!forum/instaparse
