On Mon, 20 Nov 2006, twofers wrote:

> I would like to know what local rule I could invoke to tag email that the
> subject is not in english.
>
> header NOT_IN_ENGLISH Subject !~ /English/i
> describe NOT_IN_ENGLISH Subject Contains Non English Characters
> score NOT_IN_ENGLISH 3.5
>
> What regexp could I use?
I haven't tested this, but it may work:

header NOT_IN_ENGLISH Subject =~ /[\x80-\xFF]{3}/

That should hit on any run of at least three consecutive characters with the high bit set (bytes 0x80-0xFF). If that misses too much, you may need to drop it down to {2}. Don't score it very high.

-- 
John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  A sword is never a killer, it is but a tool in the killer's hands.
                   -- Lucius Annaeus Seneca (Martial) 4BC-65AD
-----------------------------------------------------------------------
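To get a feel for how the {3} and {2} thresholds behave, here is a small Python sketch that runs the same byte class over a few subjects. It assumes the rule sees the subject as already-decoded bytes in the sender's charset. What SpamAssassin actually matches against (MIME-decoded or raw, converted to UTF-8 or not) depends on its version and settings, so treat this as an illustration of the regex only, not of SpamAssassin's behaviour. The sample subjects and the `hits` helper are hypothetical.

```python
import re

# Same character class as the rule: bytes with the high bit set (0x80-0xFF).
# An unanchored search for {3} hits any run of three or more such bytes;
# {2} hits any run of two or more.
RUN_OF_3 = re.compile(rb"[\x80-\xff]{3}")
RUN_OF_2 = re.compile(rb"[\x80-\xff]{2}")


def hits(subject: bytes, pattern: re.Pattern) -> bool:
    """Hypothetical helper: True if the subject bytes contain a matching run.

    Assumes `subject` is already decoded from any MIME encoded-word, which
    may not match what a given SpamAssassin setup actually passes in.
    """
    return pattern.search(subject) is not None


# Hypothetical sample subjects, encoded as a sender might encode them.
SAMPLES = {
    "English (ASCII)": "Meeting moved to Tuesday".encode("ascii"),
    "accented word (Latin-1)": "Café opening".encode("latin-1"),
    "accented word (UTF-8)": "Café opening".encode("utf-8"),
    "Russian (KOI8-R)": "Привет".encode("koi8_r"),
    "Chinese (GB2312)": "你好".encode("gb2312"),
}

if __name__ == "__main__":
    for label, raw in SAMPLES.items():
        print(f"{label:25} {{3}}={hits(raw, RUN_OF_3)!s:5} {{2}}={hits(raw, RUN_OF_2)}")
```

Under these assumptions the UTF-8 "Café" subject is the interesting case: one accented letter becomes two high-bit bytes, so {2} flags an otherwise English subject while {3} does not. That is one reason to keep the score low, especially at {2}.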