A S wrote:

I think I've seen this question before ;)

> I am trying to extract all strings in nested parentheses (along with the
> parentheses itself) in my .txt file. Please see the sample .txt file that
> I have used in this example here:
> (https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr).
>
> I have tried and done up three different codes but none of them seems to
> be able to extract all the nested parentheses. They can only extract a
> portion of the nested parentheses. Any advice on what I've done wrong
> could really help!
>
> Here are the three codes I have done so far:
>
> 1st attempt:
>
> import re
> from os.path import join
>
> def balanced_braces(args):
>     parts = []
>     for arg in args:
>         if '(' not in arg:
>             continue

There could still be a ")" that you miss

>         chars = []
>         n = 0
>         for c in arg:
>             if c == '(':
>                 if n > 0:
>                     chars.append(c)
>                 n += 1
>             elif c == ')':
>                 n -= 1
>                 if n > 0:
>                     chars.append(c)
>                 elif n == 0:
>                     parts.append(''.join(chars).lstrip().rstrip())
>                     chars = []
>             elif n > 0:
>                 chars.append(c)
>     return parts

It's probably easier to understand and implement when you process the
complete text at once. Then arbitrary splits don't get in the way of your
quest for ( and ). You just have to remember the position of the first
opening ( and number of opening parens that have to be closed before you
take the complete expression:

level: 00011112222100
text:  abc(def(gh))ij
  when we are here^
   we need^

A tentative implementation:

$ cat parse.py
import re

NOT_SET = object()

def scan(text):
    level = 0
    start = NOT_SET
    for m in re.compile("[()]").finditer(text):
        if m.group() == ")":
            level -= 1
            if level < 0:
                raise ValueError("underflow: more closing than opening parens")
            if level == 0:
                # outermost closing parenthesis:
                # deliver enclosed string including parens.
                yield text[start:m.end()]
                start = NOT_SET
        elif m.group() == "(":
            if level == 0:
                # outermost opening parenthesis: remember position.
                assert start is NOT_SET
                start = m.start()
            level += 1
        else:
            assert False
    if level > 0:
        raise ValueError("unclosed parens remain")

if __name__ == "__main__":
    with open("lan sample text file.txt") as instream:
        text = instream.read()
    for chunk in scan(text):
        print(chunk)
$ python3 parse.py
("xE'", PUT(xx.xxxx.),"'")
("TRUuuuth")
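
If you want to try the idea without the sample file, here is a minimal sketch of the same depth-counting approach run on an in-memory string. The function name outer_groups and the sample text are made up for illustration; they are not from the original script. The sample includes a group that spans two lines, which is the case a line-by-line approach tends to miss.

```python
import re


def outer_groups(text):
    """Yield each outermost parenthesised chunk of text, parens included.

    Hypothetical helper for illustration; it follows the same
    depth-counting idea as scan() in parse.py.
    """
    level = 0     # current nesting depth
    start = None  # index of the "(" that opened the current outermost chunk
    for m in re.finditer(r"[()]", text):
        if m.group() == "(":
            if level == 0:
                # outermost opening parenthesis: remember its position
                start = m.start()
            level += 1
        else:
            level -= 1
            if level < 0:
                raise ValueError("underflow: more closing than opening parens")
            if level == 0:
                # outermost closing parenthesis: deliver the whole chunk
                yield text[start:m.end()]
    if level > 0:
        raise ValueError("unclosed parens remain")


# Made-up sample: one nested group, one flat group, one group spanning lines.
sample = "abc(def(gh))ij (k)\n(spans\ntwo lines)"
chunks = list(outer_groups(sample))
for chunk in chunks:
    print(repr(chunk))
```

Because outer_groups is a generator, the ValueError for unbalanced input only surfaces once you iterate over it, for example with list(...).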