> i've spent couple of hours trying to figure out the correct regular
> expression to catch a VisualLisp
[snipped]
> "(defun foo", but it is hard to find the ")" at the end of code block.
> if eventually i can't come up with the solution using regular
> expression only, what i was thinking is after finding the beginning
> part, which is "(defun foo" in this case, i can count the parenthesis,
> ignoring anything inside "" and any line for comment, until i find the
> closing ")".

""" Some people, when confronted with a problem, think "I know, I'll use regular expressions!" Now they have two problems """ Regular expressions are a wonderful tool when the domain is correct. However, when your domain involves processing arbitrarily nested syntax, regexps are not your friend. It is sometimes feasible to mung them into a fixed-depth-nesting parser, but it's always fairly painful, and the fixed-depth is an annoying limitation. Use a parsing lib. I've tinkered a bit with PyParsing[1] which is fairly easy to pick up, but powerful enough that you're not banging your head against limitations. There are a number of other parsing libraries[2] with various domain-specific features and audiences, but I'd go browsing through them only if PyParsing doesn't fill the bill. As you don't detail what you want to do with the content or how pathological the input can be, but you might be able to get away with just skimming through the input and counting open-parens and close-parens, stopping when they've been balanced, skipping lines with comments. -tkc [1] http://pyparsing.wikispaces.com/ [2] http://nedbatchelder.com/text/python-parsers.html -- http://mail.python.org/mailman/listinfo/python-list