> "a (b c) d [e f g] h i"
> should be split to
> ["a", "(b c)", "d", "[e f g]", "h", "i"]
>
> As speed is a factor to consider, it's best if a
> single-line regular expression can handle this. I tried
> this but failed:
> re.split(r"(?![\(\[].*?)\s+(?!.*?[\)\]])", s). It works
> for "(a b) c" but not for "a (b c)" :(
>
> Any hint?
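
A hint on why that attempt misbehaves, as a small sketch rather than a full diagnosis: the trailing lookahead rejects any whitespace that has a closing bracket *anywhere* later in the string, whether or not the whitespace is actually inside that bracket. The leading lookahead never fires, because it is tested at the spot where \s+ starts, and the next character there is always whitespace. The name `attempt` is just for illustration.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"(?![\(\[].*?)\s+(?!.*?[\)\]])")

# Only the blank after ")" has no closing bracket to its right,
# so that is the only place the string is split.
print(attempt.split("(a b) c"))   # ['(a b)', 'c']

# Every blank here has a ")" somewhere to its right,
# so nothing is split at all.
print(attempt.split("a (b c)"))   # ['a (b c)']
```

So splitting on "whitespace not followed by a closing bracket" can't work in general; matching the words themselves is easier than describing the gaps between them.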
[and later added]
> sorry i forgot to give a limitation: if a letter is next
> to a bracket, they should be considered as one word. i.e.:
> "a(b c) d" becomes ["a(b c)", "d"] because there is no
> blank between "a" and "(".

>>> import re
>>> s = 'a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i'
>>> r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
>>> r.findall(s)
['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd', '[e f g]', 'h', 'i']

I'm sure there's a *much* more elegant pyparsing solution to
this, but I don't have the pyparsing module on this machine.
A pyparsing version would be much clearer and far more readable
when you come back to it later. However, the above monstrosity
passes the tests I threw at it.

-tkc
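
P.S. Short of pyparsing, one way to make the regex easier to come back to is to spell it out with re.VERBOSE. This is only a sketch of the same pattern as in the session above, with comments; the names `WORD` and `split_words` are made up for illustration.

```python
import re

# Same pattern as the one-liner, laid out with re.VERBOSE so each
# piece can carry a comment. Blanks in the pattern are ignored.
WORD = re.compile(r"""
    \S*                 # optional non-blank prefix glued to the bracket
    (?:
        \( [^)]* \)     # a parenthesised group, blanks allowed inside
      |
        \[ [^\]]* \]    # a square-bracketed group, blanks allowed inside
    )
    \S*                 # optional non-blank suffix glued to the bracket
  |
    \S+                 # otherwise: a plain run of non-blank characters
""", re.VERBOSE)


def split_words(text):
    """Return the blank-separated words, keeping (...) and [...] whole."""
    return WORD.findall(text)


print(split_words("a (b c) d [e f g] h i"))
# ['a', '(b c)', 'd', '[e f g]', 'h', 'i']
print(split_words("a(b c) d"))
# ['a(b c)', 'd']
```

Like the one-liner, this assumes brackets are balanced and not nested, and that a word has at most one bracketed group containing blanks; anything beyond that is where a real parser starts to pay off.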