On 02/02/2017 at 15:50, RW wrote:
On Thu, 2 Feb 2017 05:43:24 -0500
Kevin A. McGrail wrote:
...
I will score it much higher since it is in the wild. Can you throw a
spample up on pastebin?
Perhaps text/html makes a big difference, but base64 encoded utf-8
text is not uncommon these days - particularly outside North America.
To score it higher you might want to include a "full" rule that checks
for base64 encoding in the headers followed by illegal whitespace near
the beginning of what should be the base64 text.
Indeed. In my (very small) corpus, I see lots of base64-encoded utf-8
text/html parts of multipart messages, but very few non-multipart examples.
All of the latter really are base64-encoded, rather than plain text
labelled as base64, but that may simply be due to the small size of my
corpus. As it happens they are all spam, but I'm not convinced that
hitting on any utf-8 text/html message that purports to be
base64-encoded, regardless of whether it is actually base64 or not, is a
good idea.
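
To make the distinction concrete, here is a rough Python sketch of the
narrower check RW describes: a non-multipart message whose headers claim
base64 but whose body starts with text that cannot be base64. This is
not a SpamAssassin rule, and the function name and the probe_lines
parameter are my own assumptions for illustration only.

```python
import re
from email import message_from_string

# A legal base64 line: only alphabet characters, optional trailing padding.
_B64_LINE = re.compile(r'^[A-Za-z0-9+/]+={0,2}$')


def claims_base64_but_is_not(raw_message: str, probe_lines: int = 5) -> bool:
    """Hypothetical helper: True if a non-multipart message is labelled
    base64 but its first few body lines are not valid base64 text."""
    msg = message_from_string(raw_message)
    if msg.is_multipart():
        return False  # only the non-multipart case is considered here

    cte = (msg.get('Content-Transfer-Encoding') or '').strip().lower()
    if cte != 'base64':
        return False

    # get_payload() without decode=True returns the body exactly as sent.
    body = msg.get_payload()
    lines = [ln.rstrip() for ln in body.splitlines() if ln.strip()]

    # Interior whitespace or any non-alphabet character near the start
    # means the body is plain text merely labelled as base64.
    return any(not _B64_LINE.match(ln) for ln in lines[:probe_lines])
```

A message whose body really is base64 returns False here, so it would
not be hit merely for being utf-8 text/html with a base64 header.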
FWIW,
John