I've run into this problem a couple of times. Say I have a piece of text that I want to test against a large number of regular expressions, where a different action is taken based on which regex successfully matched. The naive approach is to loop through each regex and stop when one succeeds. However, I am finding this too slow for my application -- currently 30% of the run time is being taken up in the regex matching.
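For concreteness, the naive loop looks something like this (the patterns and action names are made up for illustration):

```python
import re

# Hypothetical dispatch table: each compiled regex is paired with an action.
patterns = [
    (re.compile(r"\d+"), "number"),
    (re.compile(r"[a-z]+"), "word"),
]

def classify(text):
    # Naive approach: try each regex in turn, stop at the first match.
    for regex, action in patterns:
        if regex.match(text):
            return action
    return None
```

With many patterns, every non-matching regex still pays the full cost of a failed match attempt, which is where the time goes.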
I thought of a couple of approaches, but I am not sure how to make them work:

1) Combine all of the regular expressions into one massive regex, and let the regex state machine do all the discriminating. The problem with this is that it gives you no way to determine which regex was the one that matched.

2) Use the first character of the text string to cut down the search space. The problem here is that since I don't know the regex patterns in advance, I would need to inspect each regex and determine the possible set of starting characters it could match. That would require writing my own regex parser.

3) Use a lexical scanner. This is probably overkill for what I want to do.

4) Write my own regex system that does what I want. This is more work than I want to do.

Any thoughts?

--
http://mail.python.org/mailman/listinfo/python-list
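For approach (1), Python's re module can in fact report which alternative matched: wrap each sub-pattern in a named group `(?P<name>...)` and read `match.lastgroup` after a successful match. A minimal sketch (pattern names are hypothetical; note this breaks down if a sub-pattern contains backreferences to its own numbered groups):

```python
import re

# Hypothetical patterns; each alternative gets a named group so that
# match.lastgroup tells us which one matched.
parts = [
    ("number", r"\d+"),
    ("word", r"[a-z]+"),
]
combined = re.compile("|".join(
    "(?P<%s>%s)" % (name, pat) for name, pat in parts
))

def classify(text):
    # One pass through the combined state machine instead of N separate tries.
    m = combined.match(text)
    return m.lastgroup if m else None
```

The alternatives are tried left to right, so ordering the list by priority preserves the same semantics as the naive loop.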