On Fri, 28 Jul 2023, Jared Hall wrote:

On 7/27/2023 12:08 PM, Ken D'Ambrosio wrote:
Hey, all. I've recently started getting spam that's really hard to deal with, and I'm open to suggestions as to how to approach it. Superficially,
[snip..]
The damn body's been encoded!  And there's so little in there that it's not triggering on many rules (e.g., Bayesian doesn't go over 20%).  If anyone has a bright idea -- maybe a way to decode the attachments and run a regex against _that_? -- I'm all ears.


1.  There are milters/content-filters that decode Base64 message parts (amavisd-new, mimedefang, etc) for processing by SA.

2.  There are still sufficiently unique items: First-Name-Only, Mixed-Case word in the Subject (NLP modeling), and a Base-64 encoded HTML attachment (w/ UTF-8 encoding no less).  Combined in a Meta rule, these innocuous items will likely hit with good accuracy even without Base64 decoding.

Umm, unless I'm really missing something here, the usual SA processing decodes such body content (QP, Base64, etc.) and feeds the "cleaned" text to the rule processing engine.

If you want to do special rule matching on the un-decoded body, you have to work hard to get matches against the raw stuff.
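
To illustrate the difference between matching on decoded text and matching on the raw transfer-encoded body, here is a minimal sketch using Python's standard `email` package. It is not SpamAssassin's own code, only the same idea in miniature; the sample message, the addresses, and the "claim your prize" pattern are all made up for the example.

```python
import base64
import re
from email import message_from_string, policy

# Hypothetical sample: an HTML body carried as Base64 (not taken from the thread).
html = "<html><body>Click here to claim your prize</body></html>"
encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
raw_message = (
    "From: sender@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{encoded}\n"
)

msg = message_from_string(raw_message, policy=policy.default)

# Illustrative pattern only; a real rule would be tuned to the actual spam.
pattern = re.compile(r"claim your prize", re.IGNORECASE)

# get_content() undoes the transfer encoding and the charset for text parts.
decoded_text = msg.get_content()

# The body exactly as it travelled on the wire, still Base64.
raw_body = raw_message.split("\n\n", 1)[1]

decoded_hit = bool(pattern.search(decoded_text))
raw_hit = bool(pattern.search(raw_body))
print(decoded_hit, raw_hit)
```

The pattern matches the decoded text but not the Base64 blob, which is why a rule written against readable text only works if the filter decodes the part first.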


--
Dave Funk                               University of Iowa
<dbfunk (at) engineering.uiowa.edu>     College of Engineering
319/335-5751   FAX: 319/384-0549        1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin         Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
