>> Hunh! I thought pyparsing was included with Debian.
>> (http://packages.debian.org/stable/source/pyparsing)
Yes, it's available. Laziness is the main factor
here...however, it's simply an "apt-get install pyparsing"
away.
>> And is downloading a package really such a hardship?
>> What, a
"Tim Chase" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> >> >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
> >> >>> r.findall(s)
> >>['(a c)b(c d)', 'e']
> >
> > Ah, it's exactly what I want! I thought the left and right
> > sides of "|" are equal, but it is not true.
>
> I
Paul McGuire wrote:
> Comparative timing of pyparsing vs. re comes in at about 2ms for pyparsing,
> vs. 0.13 for re's, so about 15x faster for re's. If psyco is used (and we
> skip the first call, which incurs all the compiling overhead), the speed
> difference drops to about 7-10x. I did try com
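For anyone wanting to reproduce this kind of measurement, here is a rough sketch of timing the regex side with the standard timeit module. The sample string and the run count are made up for illustration, and absolute numbers will differ from machine to machine, so none are claimed here.

```python
import re
import timeit

# The single-alternation pattern posted in this thread.
pattern = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

# Made-up sample input, not the exact string used for the figures quoted above.
sample = 'a (b c) d [e f g] h i(j k) l [m n o]p q'

runs = 10000  # arbitrary repeat count
total = timeit.timeit(lambda: pattern.findall(sample), number=runs)
per_call_ms = total / runs * 1000.0
print('re: %.4f ms per call' % per_call_ms)
```

A pyparsing expression could presumably be timed the same way by swapping the callable passed to timeit.timeit.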
>> >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
>> >>> r.findall(s)
>>['(a c)b(c d)', 'e']
>
> Ah, it's exactly what I want! I thought the left and right
> sides of "|" are equal, but it is not true.
In theory, they *should* be equal. I was baffled by the nonparity
of the situation. Yo
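One thing that may explain the asymmetry: Python's re module tries the alternatives of "|" from left to right and keeps the first one that lets the match succeed, not the longest one. A minimal sketch with made-up patterns and input:

```python
import re

# Same two alternatives, opposite order.
short_first = re.compile(r'a|ab')
long_first = re.compile(r'ab|a')

# With 'a' listed first, the 'ab' branch is never reached at position 0.
print(short_first.findall('ab'))
# With 'ab' listed first, the longer branch is tried and wins.
print(long_first.findall('ab'))
```

So the order of the branches can change the result even though the set of alternatives is the same.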
Ah, I had just made the same change!
from pyparsing import *
wrd = Word(alphas)
parenList = "(" + SkipTo(")") + ")"
brackList = "[" + SkipTo("]") + "]"
listExpr = ZeroOrMore( Combine( OneOrMore( parenList | brackList | wrd ) ) )
t = "a (b c) d [e f g] h i(j k) l [m n o]p q r[s] (t u)v(w) (x)(y)
"Tim Chase" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> I'm sure there's a *much* more elegant pyparsing solution to
> this, but I don't have the pyparsing module on this machine.
> It's much better/clearer and will be far more readable when
> you come back to it later.
>
> Howeve
Tim Chase wrote:
> Ah...the picture is becoming a little more clear:
>
> >>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
> >>> r.findall(s)
> ['(a c)b(c d)', 'e']
>
> It also works on my original test data, and is a cleaner regexp
> than the original.
>
> The clearer the problem, the clearer
Simon Forman wrote:
> What are the desired results in cases like this:
>
> "(a b)[c d]" or "(a b)(c d)" ?
["(a b)[c d]"], ["(a b)(c d)"]
--
http://mail.python.org/mailman/listinfo/python-list
Qiangning Hong wrote:
> Tim Chase wrote:
> > >>> import re
> > >>> s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i'
> > >>> r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
> > >>> r.findall(s)
> > ['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd',
> > '[e f g]', 'h
> but it can't pass this one: "(a c)b(c d) e" the above regex
> gives out ['(a c)b(c', 'd)', 'e'], but the correct one should
> be ['(a c)b(c d)', 'e']
Ah...the picture is becoming a little more clear:
>>> r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
>>> r.findall(s)
['(a c)b(c d)', 'e']
I
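To make the difference concrete, here is a small side-by-side run of the earlier pattern and the revised one on the failing input reported above. The variable names are only for illustration.

```python
import re

# Earlier pattern: allows at most one bracket pair per word.
earlier = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
# Revised pattern: any run of bracket groups and non-space characters.
revised = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

s = '(a c)b(c d) e'
print(earlier.findall(s))  # ['(a c)b(c', 'd)', 'e']
print(revised.findall(s))  # ['(a c)b(c d)', 'e']

# The revised pattern still handles the original example.
print(revised.findall('a (b c) d [e f g] h i'))
```

The revised pattern works because each repetition of the group consumes either a whole bracketed chunk (spaces included) or one non-space character, so a word may contain any number of bracket pairs.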
Simon Forman wrote:
> def splitup(s):
>     return re.findall('''
>         \S*\( [^\)]* \)\S* |
>         \S*\[ [^\]]* \]\S* |
>         \S+
>         ''', s, re.VERBOSE)
Yours has the same problem as Tim's: it can't handle a word with two or
more bracket pairs either.
I tried to change the "\S*\([^\)]*
Tim Chase wrote:
> >>> import re
> >>> s ='a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i'
> >>> r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
> >>> r.findall(s)
> ['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd',
> '[e f g]', 'h', 'i']
>
[...]
> However, the above
Qiangning Hong wrote:
> faulkner wrote:
> > re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
>
> sorry i forgot to give a limitation: if a letter is next to a bracket,
> they should be considered as one word. i.e.:
> "a(b c) d" becomes ["a(b c)", "d"]
> because there is no blank between "a" and "(".
This
> "a (b c) d [e f g] h i"
> should be split into
> ["a", "(b c)", "d", "[e f g]", "h", "i"]
>
> As speed is a factor to consider, it would be best if a
> single-line regular expression could handle this. I tried
> this but failed:
> re.split(r"(?![\(\[].*?)\s+(?!.*?[\)\]])", s). It work
faulkner wrote:
> er,
> ...|\[[^\]]*\]|...
> ^_^
That's why it is nice to use re.VERBOSE:
import re

def splitup(s):
    # re.VERBOSE makes the engine ignore whitespace inside the pattern
    return re.findall(r'''
        \( [^\)]* \) |
        \[ [^\]]* \] |
        \S+
        ''', s, re.VERBOSE)
Much less error prone this way
--
- Justin
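A further point in favour of re.VERBOSE is that each branch can carry its own comment. The sketch below uses the same three-branch pattern. The name WORDS is hypothetical, and the last two lines illustrate a caveat: a literal space has to be escaped or put in a character class, because bare whitespace in the pattern is ignored.

```python
import re

# WORDS is a hypothetical name; the branches are those of the pattern above.
WORDS = re.compile(r'''
      \( [^\)]* \)    # a parenthesised group, spaces allowed inside
    | \[ [^\]]* \]    # a square-bracketed group, spaces allowed inside
    | \S+             # any other run of non-space characters
    ''', re.VERBOSE)

def splitup(s):
    return WORDS.findall(s)

print(splitup('a (b c) d [e f g] h i'))

# Caveat: in verbose mode 'a b' means 'ab'; use [ ] or '\ ' for a real space.
print(re.findall(r'a b', 'a b', re.VERBOSE))
print(re.findall(r'a[ ]b', 'a b', re.VERBOSE))
```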
faulkner wrote:
> re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
sorry i forgot to give a limitation: if a letter is next to a bracket,
they should be considered as one word. i.e.:
"a(b c) d" becomes ["a(b c)", "d"]
because there is no blank between "a" and "(".
er,
...|\[[^\]]*\]|...
^_^
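For reference, a quick check of what the missing "\]" does to the result, using the example string from the original question. The variable names are only for illustration.

```python
import re

s = 'a (b c) d [e f g] h i'

# As first posted: the square-bracket branch never consumes the closing "]".
first_try = r'\([^\)]*\)|\[[^\]]*|\S+'
# With the correction applied.
corrected = r'\([^\)]*\)|\[[^\]]*\]|\S+'

print(re.findall(first_try, s))  # ['a', '(b c)', 'd', '[e f g', ']', 'h', 'i']
print(re.findall(corrected, s))  # ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
```

In the first version the stray "]" is picked up by the \S+ branch as a separate item.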
faulkner wrote:
> re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
>
> Qiangning Hong wrote:
> > I've got some strings to split. They are mainly words, but some words
> > are inside a pair of brackets and should be considered as one unit. I
> > prefer to use re.split, but haven'
re.findall('\([^\)]*\)|\[[^\]]*|\S+', s)
Qiangning Hong wrote:
> I've got some strings to split. They are mainly words, but some words
> are inside a pair of brackets and should be considered as one unit. I
> prefer to use re.split, but haven't written a working one after hours
> of work.
>
> Exam
I've got some strings to split. They are mainly words, but some words
are inside a pair of brackets and should be considered as one unit. I
prefer to use re.split, but haven't written a working one after hours
of work.
Example:
"a (b c) d [e f g] h i"
should be split into
["a", "(b c)", "d", "[e