On Wed, 31 May 2017, John Hardin wrote:
On Thu, 1 Jun 2017, Benny Pedersen wrote:
John Hardin skrev den 2017-06-01 00:29:
> That sort of thing has happened before, and there are rules to *try*
> to catch nonsense headers in my sandbox, but IIRC they never worked
> well enough in masscheck to actually get published.
Would it be possible to make a list of non-nonsense headers, and based on
that count how many other headers are in a mail?
Define "nonsense".
There are a fairly limited number of headers explicitly defined by the
various RFCs which could be used to restrict the hits, but the number of
*valid* headers is unbounded - any header that begins with "X-" is permitted.
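
To make the distinction concrete, here is a rough Python sketch that sorts a message's header names into "known", "X-" and "other". The KNOWN_HEADERS set is only an illustrative subset I picked by hand, not a complete registry of RFC-defined headers, and classify_headers is a made-up helper name.

```python
from email import message_from_string

# Illustrative subset of RFC-defined header names; NOT a complete registry.
KNOWN_HEADERS = {
    "from", "to", "cc", "bcc", "subject", "date", "message-id",
    "reply-to", "sender", "in-reply-to", "references", "received",
    "return-path", "mime-version", "content-type",
    "content-transfer-encoding",
}

def classify_headers(raw_message):
    """Count header names as known, X- prefixed, or other."""
    msg = message_from_string(raw_message)
    counts = {"known": 0, "x": 0, "other": 0}
    for name in msg.keys():
        lowered = name.lower()
        if lowered in KNOWN_HEADERS:
            counts["known"] += 1
        elif lowered.startswith("x-"):
            counts["x"] += 1
        else:
            counts["other"] += 1
    return counts

sample = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: hello\n"
    "X-Mailer: demo\n"
    "Qzvkpw-Lrrt: 83hf\n"
    "\n"
    "body\n"
)
print(classify_headers(sample))
```

The "other" bucket is the only one a fixed list can flag; the "x" bucket stays open-ended, which is the limitation described above.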
and thus, based on how many other headers a mail has, say it is more
spammy when it has too many headers outside that list?
anyway food for bayes training
Potentially.
The headers' randomness could be a clue. Perhaps a plugin that records
headers in a database with a "seen" count, and if a message has more than a
half-dozen or so low-seen-count headers then it would earn a point or two.
The risk there is FP on messages with a bunch of unusual but not-spammy
headers.
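
A minimal sketch of that seen-count idea, in Python with SQLite rather than as an actual SpamAssassin plugin. The thresholds (LOW_SEEN, MAX_RARE, RARE_SCORE), the table layout and the function names are all assumptions chosen for illustration.

```python
import sqlite3

LOW_SEEN = 3      # assumed: a header seen fewer times than this counts as rare
MAX_RARE = 6      # "a half-dozen or so"
RARE_SCORE = 1.5  # "a point or two"

def open_db(path=":memory:"):
    """Open (or create) the seen-count table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS seen "
        "(name TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    return db

def score_and_record(db, header_names):
    """Score a message by its rare header names, then record what was seen."""
    names = {h.lower() for h in header_names}
    rare = 0
    for name in names:
        row = db.execute(
            "SELECT n FROM seen WHERE name = ?", (name,)
        ).fetchone()
        if row is None or row[0] < LOW_SEEN:
            rare += 1
    # Update the counts only after scoring, so a message does not
    # influence its own score.
    for name in names:
        db.execute("INSERT OR IGNORE INTO seen (name, n) VALUES (?, 0)", (name,))
        db.execute("UPDATE seen SET n = n + 1 WHERE name = ?", (name,))
    db.commit()
    return RARE_SCORE if rare > MAX_RARE else 0.0

db = open_db()
common = ["From", "To", "Subject", "Date"]
for _ in range(5):
    score_and_record(db, common)
odd = common + ["X-Qa%d" % i for i in range(7)]
print(score_and_record(db, odd))
```

Note that a freshly created database treats every header as rare, so it would need seeding from known-good mail before the score means anything.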
To me, this sounds like an excellent candidate for some sort of bayes
filtering. Use the headers to make tokens. Tokens that are only in
spam, or never seen before, should lead to a slightly higher score.
Regular headers should be scored 0 or an extremely low negative score.
Since headers are somewhat more limited than the body, there should be
less room for false negatives if there is a decent default set of headers
already in the database.
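
Roughly what I have in mind, as a Python sketch. This is a much simplified per-token spam probability built from training counts, not a full Bayes classifier and not how SpamAssassin's own Bayes works. The class name, the unseen_prob of 0.6 and the weight of 0.5 are assumptions picked for illustration.

```python
from collections import Counter

class HeaderTokenModel:
    """Per-header-name spam probability from training counts."""

    def __init__(self, unseen_prob=0.6):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0
        # Assumed: never-seen header names lean slightly spammy.
        self.unseen_prob = unseen_prob

    def train(self, header_names, is_spam):
        tokens = {h.lower() for h in header_names}
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        s, h = self.spam[token], self.ham[token]
        if s + h == 0:
            return self.unseen_prob
        # Fraction of spam and of ham messages that carried this header.
        ps = s / max(self.n_spam, 1)
        ph = h / max(self.n_ham, 1)
        return ps / (ps + ph)

    def score(self, header_names, weight=0.5):
        # Only spam-leaning tokens add to the score; regular headers add 0.
        tokens = {h.lower() for h in header_names}
        return sum(
            weight * max(0.0, self.token_prob(t) - 0.5) for t in tokens
        )

model = HeaderTokenModel()
model.train(["From", "To", "Subject"], is_spam=False)
model.train(["From", "To", "Subject"], is_spam=False)
model.train(["From", "To", "Subject", "X-Zzq"], is_spam=True)
print(model.score(["From", "X-Zzq", "X-New"]))
```

Headers that appear equally often in ham and spam come out at 0.5 and add nothing, spam-only headers add the most, and never-seen headers add a little.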
Legitimate mail with a lot of odd headers is, hopefully, a very rare
occurrence.
If I were to guess, adding such headers is done to confuse tools that
compute hashes based on headers or use bayes filtering on the entire mail,
since it adds innocent words to the mail without showing them to most
end-users.
----
Kim Roar Foldøy Hauge
Drift @ SysRq, Narvik Studentersamfunn
Creative @ TG17
Root@B,JH,LZ,MS,VH