<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> We process a lot of messages in a file based on some regex pattern(s)
> we have in a db.
> If I compile the regex using re.I, the processing time is substantially
> more than if I don't, i.e. using re.I is slow.
>
> However, more surprisingly, if we do something along the lines of:
>
> s = <regex string>
> s = s.lower()
> t = dict([(k, '[%s%s]' % (k, k.upper())) for k in
> string.ascii_lowercase])
> for k in t: s = s.replace(k, t[k])
> re.compile(s)
> ......
>
> it's much better than plainly using re.I.
>
> So the qns are:
> a) Why is re.I so slow in general?
> b) What is the underlying implementation used, what is wrong, if any,
> with the above method, and why is it not used instead?
>
> Thanks
> Vikram

Can't tell you why re.I is slow, but perhaps this expression will make
your RE transform a little plainer (no need to create that dictionary
of uppers and lowers):
s = <regex string>
makeReAlphaCharLowerOrUpper = lambda c: c.isalpha() and "[%s%s]" % (c.lower(), c.upper()) or c
s_optimized = "".join( makeReAlphaCharLowerOrUpper(k) for k in s )

or, equivalently:

s_optimized = "".join( map( makeReAlphaCharLowerOrUpper, s ) )

Just curious, but what happens if your RE contains something like this
spelling-check error finder: "[^c]ei" (looking for violations of "i
before e except after c")? Can []'s nest in an RE?

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list
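[Editor's note: a quick sketch answering Paul's question. The helper name
`expand_case` is made up here, but the body is the same character-by-character
transform as above. It works fine on plain literal patterns, but it mangles a
pattern that already contains a character class, because the letters inside
the class get wrapped in their own brackets. That also answers the nesting
question: in Python's re module, a '[' appearing inside [...] is just a
literal character, so character classes do not nest.]

```python
import re

# Same transform as the lambda above: wrap each alphabetic character
# in a two-character class like [aA]; pass everything else through.
def expand_case(pattern):
    return "".join(
        "[%s%s]" % (c.lower(), c.upper()) if c.isalpha() else c
        for c in pattern
    )

# Fine for literal text:
print(expand_case("cat"))                                # [cC][aA][tT]
print(re.match(expand_case("cat"), "CaT") is not None)   # True

# But a pattern that already has a class gets corrupted:
print(expand_case("[^c]ei"))                             # [^[cC]][eE][iI]

# "[^c]ei" matches "aei", yet the transformed pattern does not,
# because [^[cC] now closes at the first ']' and the second ']'
# becomes a required literal character:
print(re.match("[^c]ei", "aei") is not None)                 # True
print(re.match(expand_case("[^c]ei"), "aei") is not None)    # False
```

So the naive replace is only safe when the pattern contains no regex
metacharacters; a robust version would have to skip over [...] groups
and escapes, which is presumably part of why re.I handles this inside
the engine instead.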