Thomas Dybdahl Ahle wrote: > Hi, I'm writing a program with a large data stream to which modules can > connect using regular expressions. > > Now I'd like to not have to test all expressions every time I get a line, > as most of the time, one of them having a match means none of the others > can have so.
Not what you want, and would be a LOT of work, but if I remember correctly (from university more than 20 years ago) there is an algorithm that could be adapted to return a list of all the regular expressions that match a string. I thing the base algorithms were documented in Aho and Ullman ("The Dragon book" if I remember correctly). There was one for converting a regular expression into a Non-deterministic Finite-state Automaton, and another for converting the NFA to a Deterministic Finite-state Automaton. The book mentioned that this also handles multiple regular expressions which can be treated as the sub-expressions combined using the or operator. Other, newer books on compiler design would probably contain similar information in their sections on lexical analysis. The algorithm given (if my memory is correct) only handled the case where a true/false match/no-match result was required, but mentioned how to generalise it to return which of several expressions separated by or operators was matched. I think an additional generalisation to return a set of matches rather than one would be relatively straight-forward. I believe these algorithms are the basis of lex/flex and similar utilities, as well as the regular expression handling of many languages. Of course, I would like to emphasise that all this is based on memory of university courses more than 20 years ago, so the details could be wrong. Charles -- http://mail.python.org/mailman/listinfo/python-list