Re: splitting words with brackets

Paul McGuire Wed, 26 Jul 2006 14:25:59 -0700

"Tim Chase" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> I'm sure there's a *much* more elegant pyparsing solution to
> this, but I don't have the pyparsing module on this machine.
> It's much better/clearer and will be far more readable when
> you come back to it later.
>
> However, the above monstrosity passes the tests I threw at
> it.
>
> -tkc


:)  Cute!  (but how come no pyparsing on your machine?)

Ok, I confess I looked at the pyparsing list parser to see how it compares.
Pyparsing's examples include a list parser that comprehends nested lists
within lists, but this is a bit different, and really more straightforward.

Here's my test program for this modified case:

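from pyparsing import Word, alphas, Combine, Optional, SkipTo, ZeroOrMore

wrd = Word(alphas)
# Combine only matches when the pieces are adjacent (no intervening
# whitespace), and it joins them into a single token
parenList = Combine( Optional(wrd) + "(" + SkipTo(")") + ")" +
                     Optional(wrd) )
brackList = Combine( Optional(wrd) + "[" + SkipTo("]") + "]" +
                     Optional(wrd) )
listExpr = ZeroOrMore( parenList | brackList | wrd )

txt = "a (b c) d [e f g] h i(j k) l [m n o]p q"
print(listExpr.parseString(txt))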

Gives:
['a', '(b c)', 'd', '[e f g]', 'h', 'i(j k)', 'l', '[m n o]p', 'q']


Comparative timing of pyparsing vs. re comes in at about 2ms for pyparsing,
vs. 0.13ms for re's, so about 15x faster for re's.  If psyco is used (and we
skip the first call, which incurs all the compiling overhead), the speed
difference drops to about 7-10x.  I did try compiling the re, but this
didn't appear to make any difference - probably user error.
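
For anyone wanting to repeat the compiled-vs-uncompiled check, here is a
minimal sketch using only the standard library.  The pattern is a stand-in
written for this example (an assumption, not the re from Tim's post), and
the function names are made up.  One plausible reason precompiling shows
little gain is that the re module keeps an internal cache of recently
compiled patterns, so repeated re.findall calls with the same pattern
string mostly skip the compile step anyway.

```python
import re
import timeit

# Stand-in pattern (an assumption, NOT the regex from the earlier post):
# optional word + bracketed group + optional word, or else a bare word.
PATTERN = (
    r"[A-Za-z]*\([^)]*\)[A-Za-z]*"      # word(...)word
    r"|[A-Za-z]*\[[^\]]*\][A-Za-z]*"    # word[...]word
    r"|[A-Za-z]+"                       # bare word
)

txt = "a (b c) d [e f g] h i(j k) l [m n o]p q"
compiled = re.compile(PATTERN)


def split_uncompiled():
    # Passes the pattern string each time; re looks it up in its cache.
    return re.findall(PATTERN, txt)


def split_compiled():
    # Uses the precompiled pattern object directly.
    return compiled.findall(txt)


tokens = split_compiled()
print(tokens)

n = 10000
t_plain = timeit.timeit(split_uncompiled, number=n)
t_comp = timeit.timeit(split_compiled, number=n)
print("re.findall(pattern, txt): %.2f us per call" % (t_plain / n * 1e6))
print("compiled.findall(txt):    %.2f us per call" % (t_comp / n * 1e6))
```

The absolute numbers will vary by machine and Python version, so only the
ratio between the two lines is worth reading.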

Since the OP indicates a concern for speed (he must be compiling a lot of
strings, I guess), it would be tough to recommend pyparsing - especially in
the face of a working re that so neatly does the trick.  But if at some
point it became necessary to add support for {}'s and <>'s, or quoted
strings, I'd rather be working with a pyparsing grammar than that crazy re
gibberish!
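
As a rough illustration of that last point, here is a sketch of how the
grammar might grow to cover {}'s and <>'s.  The helper name "bracketed"
and the sample text are invented for this example, and quoted strings are
left out.  It assumes a pyparsing version that still provides parseString.

```python
from pyparsing import (Word, alphas, Combine, Optional, SkipTo,
                       ZeroOrMore, MatchFirst)

wrd = Word(alphas)


def bracketed(opener, closer):
    # Hypothetical helper: same shape as parenList/brackList, but
    # parameterized on the delimiter pair.
    return Combine(Optional(wrd) + opener + SkipTo(closer) + closer +
                   Optional(wrd))


# One Combine expression per delimiter pair; adding a new pair is a
# one-character-pair change here rather than a rewrite of a regex.
groups = [bracketed(o, c) for o, c in ("()", "[]", "{}", "<>")]
listExpr = ZeroOrMore(MatchFirst(groups) | wrd)

sample = "a {b c} d <e f>g h(i)"
result = list(listExpr.parseString(sample))
print(result)
```

Like the original grammar, this does not handle nesting: SkipTo stops at
the first closing delimiter it finds.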

-- Paul



