Re: Absurd mail headers in new spam

jdow Wed, 31 May 2017 17:10:46 -0700


On 2017-05-31 16:59, Kim Roar Foldøy Hauge wrote:

On Wed, 31 May 2017, John Hardin wrote:
On Thu, 1 Jun 2017, Benny Pedersen wrote:
 John Hardin skrev den 2017-06-01 00:29:

>   That sort of thing has happened before, and there are rules to *try*
>   to catch nonsense headers in my sandbox, but IIRC they never worked
>   well enough in masscheck to actually get published.

 would it be possible to make list of non nonsense headers, and count based
 on that how many other headers is in mail ?
Define "nonsense".
There are a fairly limited number of headers explicitly defined by the various RFCs which could be used to restrict the hits, but the number of *valid* headers is unbounded - any header that begins with "X-" is permitted.
 and thus based on how many other headers a mail have say its more spammy
 by to many no nonsense headers ?

 anyway food for bayes training
Potentially.
The headers' randomness could be a clue. Perhaps a plugin that records headers in a database with a "seen" count, and if a message has more than a half-dozen or so low-seen-count headers then it would earn a point or two. The risk there is FP on messages with a bunch of unusual but not-spammy headers.
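A rough sketch of that "seen" count idea, in Python rather than as a real SpamAssassin plugin. The class name, the in-memory Counter standing in for the database, and all three thresholds are assumptions made for illustration; the post only says "a half-dozen or so" and "a point or two".

```python
from collections import Counter

# Hypothetical tuning values, not taken from any published rule.
LOW_SEEN_THRESHOLD = 3    # header names seen fewer times than this are "low-seen"
RARE_HEADER_LIMIT = 6     # "more than a half-dozen or so"
RARE_HEADER_SCORE = 1.5   # "a point or two"


class HeaderSeenCounter:
    """In-memory stand-in for the database of header names and seen counts."""

    def __init__(self):
        self.seen = Counter()

    def record(self, header_names):
        # Count each header name once per message, case-insensitively.
        self.seen.update({name.lower() for name in header_names})

    def score(self, header_names):
        # Collect the distinct header names that have rarely been seen before.
        rare = {
            name.lower()
            for name in header_names
            if self.seen[name.lower()] < LOW_SEEN_THRESHOLD
        }
        return RARE_HEADER_SCORE if len(rare) > RARE_HEADER_LIMIT else 0.0


db = HeaderSeenCounter()
for _ in range(10):
    db.record(["From", "To", "Subject", "Date"])

normal = ["From", "To", "Subject", "Date", "X-Mailer"]
junk = normal + [f"X-Qzv{i}" for i in range(7)]
print(db.score(normal), db.score(junk))
```

A single unusual header stays under the limit, so the FP risk is confined to legitimate mail that carries many unfamiliar headers at once.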
To me, this sounds like an excellent candidate for some sort of bayes filtering. Use the headers to make tokens. Tokens that are only in spam, or never seen before, should lead to a slightly higher score.
Regular headers should be scored 0 or an extremely low negative score.
Since headers are somewhat more limited than the body, there should be less room for false negatives if there is a decent default set of headers already in the database.
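The scoring rule described here could look roughly like the Python below. This is not a real Bayes classifier and not how SpamAssassin's Bayes works; it only counts header-name tokens per class and applies the rule as stated. The class name and both score values are made up for the example.

```python
from collections import Counter

# Hypothetical per-token scores ("slightly higher"), chosen only for illustration.
SPAM_ONLY_SCORE = 0.25   # token seen in spam but never in ham
UNSEEN_SCORE = 0.125     # token never seen in training at all


class HeaderTokenScorer:
    """Counts header-name tokens separately for spam and ham."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, header_names, is_spam):
        target = self.spam if is_spam else self.ham
        target.update({name.lower() for name in header_names})

    def score(self, header_names):
        total = 0.0
        for token in {name.lower() for name in header_names}:
            if self.ham[token] > 0:
                continue  # regular header seen in ham: scores 0
            total += SPAM_ONLY_SCORE if self.spam[token] > 0 else UNSEEN_SCORE
        return total


scorer = HeaderTokenScorer()
scorer.train(["From", "To", "Subject"], is_spam=False)
scorer.train(["From", "X-Bogus-Widget"], is_spam=True)
print(scorer.score(["From", "To"]))
print(scorer.score(["From", "X-Bogus-Widget", "X-Never-Seen"]))
```

Seeding the ham counter with a default set of common headers is what would keep ordinary mail at zero from the start.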
Legitimate mail with a lot of odd headers is, hopefully, a very rare occurrence.
If I were to guess, adding such headers is done to confuse tools that compute hashes based on headers or use bayes filtering on the entire mail, since it adds innocent words to the mail without showing them to most end-users.

That's basically the "Bayes Poison" argument. It should be possible to do better. I'm also finding here that a Bayes that remembered two-word phrases could go a long way to killing off spam. (In this context a, and, the, his, and other such words would be ignored in gathering the two-word phrases.) I suspect it would be a nasty piece of code to write; but, I do think it could produce some nice results. Specifically results on the random headers might be pretty good, too.
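For what it's worth, the phrase-gathering half is the easy part. A minimal Python sketch, assuming a tiny stop-word list (just the four words named above; a real one would be much longer) and a simple regex notion of a "word". The function name is made up, and the hard part, feeding these phrases into the Bayes database and keeping its size sane, is not shown.

```python
import re

# Only the words named in the post; a real stop-word list is an open choice.
STOP_WORDS = {"a", "and", "the", "his"}


def two_word_phrases(text):
    """Return adjacent word pairs after dropping stop words."""
    words = [
        word
        for word in re.findall(r"[a-z0-9']+", text.lower())
        if word not in STOP_WORDS
    ]
    return [f"{first} {second}" for first, second in zip(words, words[1:])]


print(two_word_phrases("Claim the prize and his money now"))
```

Because stop words are dropped before pairing, "prize and his money" still yields the phrase "prize money".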


{^_^}
