This is very similar to some common problems in developing pyparsing grammars. One of my early attempts at parsing CORBA IDL listed the valid parameter passing mechanisms as ('in' | 'out' | 'inout'). As you can imagine, I never successfully matched an 'inout', since the 'in' match came first. (Pyparsing offers an alternative greedy alternation operator '^', which tests all alternatives and then selects the longest match - however, '^' can have performance issues.) Of course, the simplest recourse is just to change the definition to ('inout' | 'in' | 'out'), but I wanted to provide a more general mechanism (since I was getting e-mails from pyparsing users also having this same problem). The simplest approach is to just sort the list by order, longest to shortest, but this removes any option the developer may want to try to front-load the list with options that are expected to occur more frequently (for example, if defining a list of potential relational operators, forcing ('=' | '<' | '>' | '<=' | '>=' | '!=') to become ('<=' | '>='| '!=' | '=' | '<' | '>') may cause unnecessary tests of infrequently used operators).
I ended up adding the oneOf helper to pyparsing, which takes a single string of space-delimited literals, and then returns a list of literals separated by '|' operators, with literals minimally reordered, that is, only reorderded if a longer literal masks some shorter literal. Also, accidental duplicates are stripped. Here is the implementation of oneOf ('|' operators generate a MatchFirst object): .def oneOf( strs, caseless=False ): . """Helper to quickly define a set of alternative Literals, and makes sure to do . longest-first testing when there is a conflict, regardless of the input order, . but returns a MatchFirst for best performance. . """ . if caseless: . isequal = ( lambda a,b: a.upper() == b.upper() ) . parseElementClass = CaselessLiteral . else: . isequal = ( lambda a,b: a == b ) . parseElementClass = Literal . . symbols = strs.split() . i = 0 . while i < len(symbols)-1: . cur = symbols[i] . for j,other in enumerate(symbols[i+1:]): . if ( isequal(other, cur) ): . del symbols[i+j+1] . break . elif ( isequal(other[:len(cur)],cur) ): . del symbols[i+j+1] . symbols.insert(i,other) . cur = other . break . else: . i += 1 . . return MatchFirst( [ parseElementClass(sym) for sym in symbols ] ) . You could pre-process your regex with something similar, so you can avoid having to do this (error-prone) checking yourself. -- Paul -- http://mail.python.org/mailman/listinfo/python-list