On Mon, 20 Nov 2006, twofers wrote:

> I would like to know what local rule I could invoke to tag email that the 
> subject is not in english.
>    
>   header       NOT_IN_ENGLISH     Subject !~ /English/i
>   describe     NOT_IN_ENGLISH     Subject Contains Non English Characters
>   score         NOT_IN_ENGLISH     3.5
>    
>   What regexp could I use?

I haven't tested this, but it may work:

header       NOT_IN_ENGLISH     Subject =~ /[\x80-\xFF]{3}/

That should hit on a run of at least three consecutive bytes with the
high bit set.

You may need to drop it down to {2} to get good detection.

Don't score it very high; legitimate mail with accented characters in
the subject will hit it too.
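
Put together, a complete rule might look something like the sketch
below. It is untested like the regex above, and the describe text and
the 0.5 score are placeholders I made up for illustration, not tuned
values:

```
# Sketch only: description and score are assumed values, adjust to taste
header       NOT_IN_ENGLISH     Subject =~ /[\x80-\xFF]{3}/
describe     NOT_IN_ENGLISH     Subject has a run of high-bit characters
score        NOT_IN_ENGLISH     0.5
```

Run spamassassin --lint after adding it to check that the rule parses,
then watch what it actually hits before raising the score.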

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  A sword is never a killer, it is but a tool in the killer's hands.
                          -- Lucius Annaeus Seneca (Martial) 4BC-65AD
-----------------------------------------------------------------------
