On Thu, 21 Mar 2019, Martin Gregorie wrote:

On Thu, 2019-03-21 at 09:23 -0700, John Hardin wrote:
On Thu, 21 Mar 2019, Savvas Karagiannidis wrote:

What should be considered is the message's language. All messages that
were false positives had the following MIME encoding (messages were
actually in Greek):

Content-Type: text/[plain|html]; charset="windows-1253" or
Content-Type: text/[plain|html]; charset="iso-8859-7"

while all messages that were actual spam and were properly detected
had:

Content-Type: text/[plain|html]; charset="utf-8"

It should be fairly easy to add an exclusion based on that information.
However, that information may well be leveraged by spammers who are
using that obfuscation...
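
As a rough illustration of the exclusion idea, here is a minimal Python
sketch (not SpamAssassin rule syntax) that checks whether any text part
of a message declares one of the two Greek legacy charsets quoted above.
The function name and the charset set are hypothetical, chosen only for
this example, and a declared charset is trivially forgeable, so this is
a hint at best.

```python
from email import message_from_string

# Assumption: the two legacy Greek charsets reported in the false positives.
GREEK_LEGACY_CHARSETS = {"windows-1253", "iso-8859-7"}


def uses_greek_legacy_charset(raw_message: str) -> bool:
    """Return True if any text/* part declares a Greek legacy charset."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        # get_content_charset() returns the lowercased charset, or None.
        if part.get_content_charset() in GREEK_LEGACY_CHARSETS:
            return True
    return False
```

A filter could use such a check to skip or down-weight the rule that is
misfiring, keeping in mind the caveat above that spammers can set the
same header.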

FWIW roughly 10% of my spam corpus uses <font> tags to set white text.

...wrong thread? :)

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  ...the Constitution and the Bill of Rights exists to protect
  the individual, not the mob.                      -- Matt Pickering
-----------------------------------------------------------------------
 721 days since the first commercial re-flight of an orbital booster (SpaceX)
