On Fri, 28 Jul 2023, Jared Hall wrote:
On 7/27/2023 12:08 PM, Ken D'Ambrosio wrote:
Hey, all. I've recently started getting spam that's really hard to deal
with, and I'm open to suggestions as to how to approach it. Superficially,
[snip..]
The damn body's been encoded! And there's so little in there that it's not
triggering on many rules (e.g., Bayesian doesn't go over 20%). If anyone
has a bright idea -- maybe a way to decode the attachments and run a regex
against _that_? -- I'm all ears.
1. There are milters/content filters that decode Base64 message parts
(amavisd-new, MIMEDefang, etc.) for processing by SA.
2. There are still sufficiently unique items: First-Name-Only, a Mixed-Case
word in the Subject (NLP modeling), and a Base64-encoded HTML attachment (w/
UTF-8 encoding, no less). Combined in a meta rule, these innocuous items will
likely hit with good accuracy even without Base64 decoding.
Umm, unless I'm really missing something here, the usual SA processing already
decodes such body parts (QP, Base64, etc.) and feeds the "cleaned" text to the
rule-processing engine.
If anything, it's the other way around: you have to work hard to get matches
against the raw, un-decoded body if you want to do special rule matching there.
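To illustrate the decoded-versus-raw distinction, here is a minimal sketch
using Python's standard email module. It is not SA's actual code, just the
same idea in miniature: undo the transfer encoding, then run the regex. The
sample message, the pattern, and the names RAW_MESSAGE, PATTERN and
decoded_text_parts are all made up for the example.

```python
import base64
import re
from email import message_from_string

# Hypothetical sample: a single Base64-encoded UTF-8 HTML part.
HTML = '<html><body><a href="http://example.com/">Click here to claim</a></body></html>'
RAW_MESSAGE = (
    "From: Alice <alice@example.com>\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(HTML.encode("utf-8")).decode("ascii")
    + "\n"
)

# Hypothetical pattern standing in for a body rule.
PATTERN = re.compile(r"click here", re.IGNORECASE)


def decoded_text_parts(msg):
    """Yield the decoded text of every text/* part in the message."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes the Content-Transfer-Encoding (Base64 or QP).
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        yield payload.decode(charset, errors="replace")


msg = message_from_string(RAW_MESSAGE)

# The raw payload is still Base64 text, so the pattern cannot match it.
raw_hit = bool(PATTERN.search(msg.get_payload()))

# After decoding, the same pattern sees the actual HTML.
decoded_hit = any(PATTERN.search(text) for text in decoded_text_parts(msg))

print("raw body match:", raw_hit)
print("decoded body match:", decoded_hit)
```

With this sample the pattern misses the raw Base64 text and hits once the part
is decoded, which is the behaviour you'd want from a body rule.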
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{