On Sep 16, 10:18 am, "Dotan Cohen" <[EMAIL PROTECTED]> wrote:
> I'd like to filter spam from a certain company. Here are examples of
> strings found in their spam:
> Mega Dik
> Mega D1k
> MegaDik
> Mega. Dik
> M eg ad ik
> M E _G_A_D_ IK
> M_E_G. ADI. K
>
> I figured that this regex would match all but the second example, yet
> it matches none:
> |[^a-z]m[^a-z]e[^a-z]g[^a-z]a[^a-z]d[^a-z]i[^a-z]k[^a-z]|i
>
> What would be the regex that matches "megadik" regardless of whatever
> characters are sprinkled throughout?
>
> Thanks in advance.
>
> Dotan

In your regex, every occurrence of "[^a-z]" requires exactly one character outside the a-z range. So what you have *should* match " M*E*G*A*D*I*K " (an unfortunate pr0n sequel to "M*A*S*H"?), with a non-letter on each side as well, but not any of your examples. You will need to add a '*' after each of your [^a-z]'s, as in:

[^a-z]*m[^a-z]*e[^a-z]*g[^a-z]*a[^a-z]*d[^a-z]*i[^a-z]*k[^a-z]*

to indicate "0 or more" repetitions of [^a-z]. Also, I would omit the leading and trailing "[^a-z]*"s. Since they are allowed to match nothing at all, they don't change whether the regex matches, and I think they will significantly slow it down.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
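
P.S. A quick way to check the above, sketched with Python's re module. This is only an illustration under a few assumptions: your |...|i delimiters and flag look like PCRE syntax, so re.IGNORECASE stands in for the trailing "i", and the names strict, loose, samples and matched are mine, not anything from your code.

```python
import re

# The original pattern: exactly one non-letter is required before,
# between, and after the letters.
strict = re.compile(
    r"[^a-z]m[^a-z]e[^a-z]g[^a-z]a[^a-z]d[^a-z]i[^a-z]k[^a-z]",
    re.IGNORECASE,
)

# The suggested pattern: zero or more non-letters between the letters,
# with the leading and trailing [^a-z]* dropped.
loose = re.compile(
    r"m[^a-z]*e[^a-z]*g[^a-z]*a[^a-z]*d[^a-z]*i[^a-z]*k",
    re.IGNORECASE,
)

samples = [
    "Mega Dik",
    "Mega D1k",
    "MegaDik",
    "Mega. Dik",
    "M eg ad ik",
    "M E _G_A_D_ IK",
    "M_E_G. ADI. K",
]

# Collect the samples that the loosened pattern finds.
matched = [s for s in samples if loose.search(s)]

for s in samples:
    print(repr(s), "strict:", bool(strict.search(s)), "loose:", bool(loose.search(s)))
```

The loose pattern should pick up every sample except "Mega D1k", where the "i" has been replaced by a digit; catching that one would need something like a character class in place of the "i", which is a separate decision.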