Re: splitting words with brackets

2006-07-27 Thread Tim Chase
>> Hunh! I thought pyparsing was included with Debian. >> (http://packages.debian.org/stable/source/pyparsing) Yes, it's available. Laziness is the main factor here...however, it's simply an "apt-get install pyparsing" away. >> And is downloading a package really such a hardship? >> What, a

Re: splitting words with brackets

2006-07-27 Thread Paul McGuire
"Tim Chase" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > >> >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') > >> >>> r.findall(s) > >>['(a c)b(c d)', 'e'] > > > > Ah, it's exactly what I want! I thought the left and right > > sides of "|" are equal, but it is not true. > > I

Re: splitting words with brackets

2006-07-26 Thread Justin Azoff
Paul McGuire wrote: > Comparative timing of pyparsing vs. re comes in at about 2ms for pyparsing, > vs. 0.13 for re's, so about 15x faster for re's. If psyco is used (and we > skip the first call, which incurs all the compiling overhead), the speed > difference drops to about 7-10x. I did try com
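For anyone wanting to reproduce the regex side of such a timing, here is a minimal sketch using the standard timeit module. It is not the benchmark used in the thread: the sample string, the call count, and the name time_regex are assumptions made for illustration, and the pyparsing side is left out so the snippet needs only the standard library.

```python
import re
import timeit

# Pattern quoted elsewhere in this thread.
TOKEN = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

# Hypothetical sample input, chosen only for illustration.
SAMPLE = "a (b c) d [e f g] h i(j k) l [m n o]p q"

def time_regex(number=10000):
    """Return the average time of one findall() call, in milliseconds."""
    total_seconds = timeit.timeit(lambda: TOKEN.findall(SAMPLE), number=number)
    return total_seconds / number * 1000.0

if __name__ == "__main__":
    print("%.4f ms per call" % time_regex())
```

Absolute numbers depend on the machine and Python version, so only the ratio between two approaches measured the same way is meaningful.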

Re: splitting words with brackets

2006-07-26 Thread Tim Chase
>> >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') >> >>> r.findall(s) >>['(a c)b(c d)', 'e'] > > Ah, it's exactly what I want! I thought the left and right > sides of "|" are equal, but it is not true. In theory, they *should* be equal. I was baffled by the nonparity of the situation. Yo
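One concrete way the two sides of "|" differ in Python's re module: alternatives are tried left to right, and the first one that matches at the current position is taken. The sketch below illustrates that rule only; it may not be the exact case the posters had in mind, and the reordered pattern and the names GROUPS_FIRST and CHAR_FIRST are invented for the comparison.

```python
import re

s = "(a c)b(c d) e"

# Pattern from the thread: the bracketed-group branches come before \S.
GROUPS_FIRST = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

# Hypothetical reordering (not from the thread): \S is tried first, so an
# opening bracket is consumed as an ordinary character and the group
# branches never get a chance to match.
CHAR_FIRST = re.compile(r'(?:\S|\([^\)]*\)|\[[^\]]*\])+')

groups_first = GROUPS_FIRST.findall(s)
char_first = CHAR_FIRST.findall(s)

print(groups_first)  # ['(a c)b(c d)', 'e']
print(char_first)    # ['(a', 'c)b(c', 'd)', 'e'], the same as s.split()
```

With \S first, the pattern behaves like a plain whitespace split, which is why the order of the branches matters here.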

Re: splitting words with brackets

2006-07-26 Thread Paul McGuire
Ah, I had just made the same change! from pyparsing import * wrd = Word(alphas) parenList = "(" + SkipTo(")") + ")" brackList = "[" + SkipTo("]") + "]" listExpr = ZeroOrMore( Combine( OneOrMore( parenList | brackList | wrd ) ) ) t = "a (b c) d [e f g] h i(j k) l [m n o]p q r[s] (t u)v(w) (x)(y)

Re: splitting words with brackets

2006-07-26 Thread Paul McGuire
"Tim Chase" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > I'm sure there's a *much* more elegant pyparsing solution to > this, but I don't have the pyparsing module on this machine. > It's much better/clearer and will be far more readable when > you come back to it later. > > Howeve

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Tim Chase wrote: > Ah...the picture is becoming a little more clear: > > >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') > >>> r.findall(s) > ['(a c)b(c d)', 'e'] > > It also works on my original test data, and is a cleaner regexp > than the original. > > The clearer the problem, the clearer

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Simon Forman wrote: > What are the desired results in cases like this: > > "(a b)[c d]" or "(a b)(c d)" ? ["(a b)[c d]"], ["(a b)(c d)"] -- http://mail.python.org/mailman/listinfo/python-list

Re: splitting words with brackets

2006-07-26 Thread Simon Forman
Qiangning Hong wrote: > Tim Chase wrote: > > >>> import re > > >>> s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i' > > >>> r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+') > > >>> r.findall(s) > > ['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd', > > '[e f g]', 'h

Re: splitting words with brackets

2006-07-26 Thread Tim Chase
> but it can't pass this one: "(a c)b(c d) e" the above regex > gives out ['(a c)b(c', 'd)', 'e'], but the correct one should > be ['(a c)b(c d)', 'e'] Ah...the picture is becoming a little more clear: >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+') >>> r.findall(s) ['(a c)b(c d)', 'e'] I
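A small sketch comparing the earlier pattern with the revised one on the inputs mentioned in this thread. Both patterns are copied from the messages; the wrapper split_words and the constant names are assumptions added for illustration.

```python
import re

# Earlier attempt quoted in the thread: only one bracket group per word
# may contain spaces.
EARLIER = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')

# Revised pattern from the message above: any run of bracket groups and
# non-space characters forms one word.
REVISED = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

def split_words(s, pattern=REVISED):
    """Return the words of s, keeping bracketed groups intact."""
    return pattern.findall(s)

print(split_words("(a c)b(c d) e", EARLIER))  # ['(a c)b(c', 'd)', 'e']
print(split_words("(a c)b(c d) e"))           # ['(a c)b(c d)', 'e']
print(split_words("a (b c) d [e f g] h i"))   # ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
print(split_words("(a b)[c d]"))              # ['(a b)[c d]']
```

Neither pattern handles nested brackets, since each group ends at the first closing bracket it finds.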

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Simon Forman wrote:
> def splitup(s):
>     return re.findall('''
>         \S*\( [^\)]* \)\S* |
>         \S*\[ [^\]]* \]\S* |
>         \S+
>         ''', s, re.VERBOSE)
Yours is the same as Tim's; it can't handle a word with two or more bracket pairs either. I tried to change the "\S*\([^\)]*

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
Tim Chase wrote: > >>> import re > >>> s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i' > >>> r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+') > >>> r.findall(s) > ['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd', > '[e f g]', 'h', 'i'] > [...] > However, the above

Re: splitting words with brackets

2006-07-26 Thread Simon Forman
Qiangning Hong wrote: > faulkner wrote: > > re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) > > sorry i forgot to give a limitation: if a letter is next to a bracket, > they should be considered as one word. i.e.: > "a(b c) d" becomes ["a(b c)", "d"] > because there is no blank between "a" and "(". This

Re: splitting words with brackets

2006-07-26 Thread Tim Chase
> "a (b c) d [e f g] h i" > should be splitted to > ["a", "(b c)", "d", "[e f g]", "h", "i"] > > As speed is a factor to consider, it's best if there is a > single line regular expression can handle this. I tried > this but failed: > re.split(r"(?![\(\[].*?)\s+(?!.*?[\)\]])", s). It work

Re: splitting words with brackets

2006-07-26 Thread Justin Azoff
faulkner wrote:
> er,
> ...|\[[^\]]*\]|...
> ^_^
That's why it is nice to use re.VERBOSE:
def splitup(s):
    return re.findall('''
        \( [^\)]* \) |
        \[ [^\]]* \] |
        \S+
        ''', s, re.VERBOSE)
Much less error prone this way
--
- Justin
-- http://mail.python.org/mai
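To make the dropped "\]" visible, the sketch below runs the one-line pattern as first posted next to a verbose version. The verbose function follows the one quoted above, with two assumed changes: a raw string, so that current Python versions do not warn about the backslash escapes, and inline comments, which re.VERBOSE permits.

```python
import re

s = "a (b c) d [e f g] h i"

# As first posted in the thread: the closing \] is missing, so the
# bracket group stops short and "]" becomes a word of its own.
first_posted = re.findall(r'\([^\)]*\)|\[[^\]]*|\S+', s)

def splitup(s):
    # Raw string so the backslashes reach the re module unchanged.
    return re.findall(r'''
        \( [^\)]* \) |   # a parenthesised group
        \[ [^\]]* \] |   # a square-bracketed group
        \S+              # any other run of non-space characters
        ''', s, re.VERBOSE)

print(first_posted)  # ['a', '(b c)', 'd', '[e f g', ']', 'h', 'i']
print(splitup(s))    # ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
```

This version still splits "a(b c) d" into 'a(b', 'c)' and 'd', which is the adjacency case raised in other messages of the thread.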

Re: splitting words with brackets

2006-07-26 Thread Qiangning Hong
faulkner wrote: > re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) sorry i forgot to give a limitation: if a letter is next to a bracket, they should be considered as one word. i.e.: "a(b c) d" becomes ["a(b c)", "d"] because there is no blank between "a" and "(". -- http://mail.python.org/mailman/listi

Re: splitting words with brackets

2006-07-26 Thread faulkner
er, ...|\[[^\]]*\]|... ^_^ faulkner wrote: > re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) > > Qiangning Hong wrote: > > I've got some strings to split. They are main words, but some words > > are inside a pair of brackets and should be considered as one unit. I > > prefer to use re.split, but haven'

Re: splitting words with brackets

2006-07-26 Thread faulkner
re.findall('\([^\)]*\)|\[[^\]]*|\S+', s) Qiangning Hong wrote: > I've got some strings to split. They are main words, but some words > are inside a pair of brackets and should be considered as one unit. I > prefer to use re.split, but haven't written a working one after hours > of work. > > Exam

splitting words with brackets

2006-07-26 Thread Qiangning Hong
I've got some strings to split. They are main words, but some words are inside a pair of brackets and should be considered as one unit. I prefer to use re.split, but haven't written a working one after hours of work. Example: "a (b c) d [e f g] h i" should be splitted to ["a", "(b c)", "d", "[e