On Tuesday, 3 December 2019 01:01:25 UTC+8, Peter Otten wrote:
> A S wrote:
>
> I think I've seen this question before ;)
>
> > I am trying to extract all strings in nested parentheses (along with the
> > parentheses itself) in my .txt file. Please see the sample .txt file that
> > I have used in this example here:
> > (https://drive.google.com/open?id=1UKc0ZgY9Fsz5O1rSeBCLqt5dwZkMaQgr).
> >
> > I have tried and done up three different codes but none of them seems to
> > be able to extract all the nested parentheses. They can only extract a
> > portion of the nested parentheses. Any advice on what I've done wrong
> > could really help!
> >
> > Here are the three codes I have done so far:
> >
> > 1st attempt:
> >
> > import re
> > from os.path import join
> >
> > def balanced_braces(args):
> >     parts = []
> >     for arg in args:
> >         if '(' not in arg:
> >             continue
>
> There could still be a ")" that you miss
>
> >         chars = []
> >         n = 0
> >         for c in arg:
> >             if c == '(':
> >                 if n > 0:
> >                     chars.append(c)
> >                 n += 1
> >             elif c == ')':
> >                 n -= 1
> >                 if n > 0:
> >                     chars.append(c)
> >                 elif n == 0:
> >                     parts.append(''.join(chars).lstrip().rstrip())
> >                     chars = []
> >             elif n > 0:
> >                 chars.append(c)
> >     return parts
>
> It's probably easier to understand and implement when you process the
> complete text at once. Then arbitrary splits don't get in the way of your
> quest for ( and ). You just have to remember the position of the first
> opening ( and number of opening parens that have to be closed before you
> take the complete expression:
>
> level: 00011112222100
> text:  abc(def(gh))ij
>   when we are here^
>    we need^
>
> A tentative implementation:
>
> $ cat parse.py
> import re
>
> NOT_SET = object()
>
> def scan(text):
>     level = 0
>     start = NOT_SET
>     for m in re.compile("[()]").finditer(text):
>         if m.group() == ")":
>             level -= 1
>             if level < 0:
>                 raise ValueError("underflow: more closing than opening parens")
>             if level == 0:
>                 # outermost closing parenthesis:
>                 # deliver enclosed string including parens.
>                 yield text[start:m.end()]
>                 start = NOT_SET
>         elif m.group() == "(":
>             if level == 0:
>                 # outermost opening parenthesis: remember position.
>                 assert start is NOT_SET
>                 start = m.start()
>             level += 1
>         else:
>             assert False
>     if level > 0:
>         raise ValueError("unclosed parens remain")
>
>
> if __name__ == "__main__":
>     with open("lan sample text file.txt") as instream:
>         text = instream.read()
>     for chunk in scan(text):
>         print(chunk)
> $ python3 parse.py
> ("xE'", PUT(xx.xxxx.),"'")
> ("TRUuuuth")

Hello Peter! I tried this on my actual working files and it raised this error: "unclosed parens remain". In that case, how can I keep parsing my text files, extracting only the expressions with balanced parentheses and ignoring the incomplete ones?

-- 
https://mail.python.org/mailman/listinfo/python-list
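
One possible way to skip the incomplete expressions, offered as a minimal sketch rather than a definitive answer: instead of raising, keep a stack of the positions of open parentheses, record every pair that does get closed, and at the end keep only the outermost recorded pairs. The name scan_lenient is hypothetical (it is not part of parse.py above), and the choice of which parenthesis counts as "unmatched" is an assumption: a stray ")" is dropped, and any "(" that never gets closed is dropped.

```python
import re


def scan_lenient(text):
    """Yield the outermost balanced "(...)" groups in text.

    Hypothetical helper, not from the original parse.py: unmatched
    parentheses are skipped instead of raising ValueError.
    """
    stack = []  # positions of "(" still waiting for their ")"
    spans = []  # (start, end) slice bounds of every matched pair
    for m in re.finditer(r"[()]", text):
        if m.group() == "(":
            stack.append(m.start())
        elif stack:
            # This ")" closes the most recent open "(".
            spans.append((stack.pop(), m.end()))
        # else: stray ")" with nothing to close, ignore it.

    # Any "(" left on the stack was never closed and is ignored.
    # Matched pairs are either nested or disjoint, so after sorting by
    # start position a pair is outermost exactly when it begins at or
    # after the end of the previously yielded pair.
    spans.sort()
    last_end = 0
    for start, end in spans:
        if start >= last_end:
            yield text[start:end]
            last_end = end


if __name__ == "__main__":
    for chunk in scan_lenient("abc(def(gh))ij x) (y) (z (w) q"):
        print(chunk)
```

With this approach an unclosed "(" does not swallow the rest of the file: in "(z (w) q" the inner "(w)" is still reported. Like the original scan(), it does not know about quoting, so a parenthesis inside a string literal in the data is counted like any other.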