jodawi wrote:
> I need to find a bunch of C function declarations by searching
> thousands of source or html files for thousands of known function
> names. My initial simple approach was to do this:
>
> rxAllSupported = re.compile(r"\b(" + "|".join(gAllSupported) + r")\b")
> # giving a regex of \b(AAFoo|ABFoo| (uh... 88kb more...) |zFoo)\b
>
> for root, dirs, files in os.walk( ... ):
>     ...
>     for fileName in files:
>         ...
>         filePath = os.path.join(root, fileName)
>         file = open(filePath, "r")
>         contents = file.read()
>         ...
>         result = re.search(rxAllSupported, contents)
>
> but this happens:
>
>     result = re.search(rxAllSupported, contents)
>   File "C:\Python24\Lib\sre.py", line 134, in search
>     return _compile(pattern, flags).search(string)
> RuntimeError: internal error in regular expression engine
>
> I assume it's hitting some limit, but don't know where the limit is to
> remove it. I tried stepping into it repeatedly with Komodo, but didn't
> see the problem.
>
> Suggestions?

One workaround may be as easy as

    import re

    # The known function names; a set makes each membership test cheap.
    wanted = set(["foo", "bar", "baz"])
    file_content = "foo bar-baz ignored foo()"

    # Match every word, then filter, instead of building one huge alternation.
    r = re.compile(r"\w+")
    found = [name for name in r.findall(file_content) if name in wanted]

    print found

Peter
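
To connect the workaround back to the directory walk in the original question, here is a minimal sketch of how the two might fit together. It is written for Python 3 rather than the Python 2.4 used in the thread, and the names `WORD`, `find_supported_names` and `scan_tree` are hypothetical, made up for illustration. It assumes every wanted name consists only of word characters (as C identifiers do), in which case filtering `\w+` tokens should find the same names as the `\b(...)\b` alternation.

```python
import os
import re

# Hypothetical helper names; nothing here comes from the original thread.
WORD = re.compile(r"\w+")


def find_supported_names(text, wanted):
    """Return the wanted names that occur in text, in order of first appearance."""
    seen = set()
    found = []
    for name in WORD.findall(text):
        if name in wanted and name not in seen:
            seen.add(name)
            found.append(name)
    return found


def scan_tree(top, wanted):
    """Map each file path under top to the wanted names found in that file."""
    results = {}
    for root, dirs, files in os.walk(top):
        for file_name in files:
            path = os.path.join(root, file_name)
            # errors="replace" is an assumption: it keeps undecodable bytes
            # from aborting the scan of a mixed source/html tree.
            with open(path, "r", errors="replace") as handle:
                contents = handle.read()
            names = find_supported_names(contents, wanted)
            if names:
                results[path] = names
    return results
```

The compiled pattern stays tiny no matter how many names are in `wanted`, so the size of the name list never reaches the regex engine; the cost moves to one set lookup per word in each file.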