Have you enabled TextCat in v310.pre?

IMHO SA really can't detect languages. It has a TextCat plugin, but
that's too old and basically hasn't worked since Unicode was
invented (it relies on the old charset definitions).

i.e. these days most non-ASCII email is in Unicode, and SA cannot parse
it to figure out which languages are used.

(please, oh please, someone prove me wrong! :-)

Here's some evidence. This is from the current blocked-spam folder on
one of our servers. It's flushed quite often and so isn't big, but
it's a data point (basically, among the non-ASCII charsets iso-8859-1
dominates, with utf-8 a close second).


grep -i charset * | sed 's/^.*charset.//i' | awk '{print tolower($1)}' |
  sed -e 's/"//g' -e 's/>//g' -e 's/;//g' | sort | uniq -c | sort -n
      1 3dgb2312=
      1 3dgb2312=3e</head=3e<body
      1 3dus-ascii
      1 3dutf-8=
      1 iso-8859-1</head<bodydo
      1 shift_jis
      3 windows-1251
      4 gb2312
      8 windows-1250
     13 iso-8859-2
     14 windows-1252
     62 utf-8
     70 iso-8859-1
    208 us-ascii
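
The odd entries above (3dgb2312= and friends) come from the grep also matching charset= inside quoted-printable HTML bodies. If anyone wants a cleaner count, here is a rough Python sketch that tallies only the charset declared in each MIME part's Content-Type header. It assumes one raw message per file in the folder; the function names are made up for illustration, and parts that declare no charset are simply skipped rather than counted as us-ascii.

```python
import email
from collections import Counter
from pathlib import Path


def count_declared_charsets(folder):
    """Tally the charset parameter of each MIME part's Content-Type header.

    Assumption: every regular file in `folder` holds one raw RFC 822 message.
    """
    counts = Counter()
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        # Parse from bytes so odd body encodings cannot break the read.
        msg = email.message_from_bytes(path.read_bytes())
        for part in msg.walk():
            # Returns the lowercased charset, or None if none is declared.
            charset = part.get_content_charset()
            if charset:
                counts[charset] += 1
    return counts


def format_report(counts):
    """Render the counts smallest first, like `sort | uniq -c | sort -n`."""
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return [f"{n:7d} {charset}" for charset, n in ordered]
```

Something like print("\n".join(format_report(count_declared_charsets("/path/to/spam")))) would print the table (the path is a placeholder). The numbers won't match the grep exactly, since charsets mentioned only in HTML meta tags are no longer counted.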


Jason

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
