Re: sanitizing/normalizing messages for feeding sa-learn

Matus UHLAR - fantomas Wed, 27 Aug 2014 23:19:33 -0700

On 27.08.14 17:06, btb wrote:

we have a system [zimbra] where users can select a message in the mua interface and click a spam or not spam button. this generates a message [containing the selected message] which is ultimately delivered to a mailbox. i intend on retrieving these messages via imap and feeding sa-learn, but they've been a bit adulterated by the time they're retrieved, and i believe some cleanup is probably necessary prior to feeding sa-learn.


That should not be necessary. Hopefully Zimbra does not alter messages as
badly as Outlook/Exchange does (what should I tell you? I was trying to
block spam with a specific address in From: ... after I blocked according to
the Subject, I found out that the real From: was very different)

here are two samples:

http://dpaste.com/0B6S3FN.txt [claimed to be spam]
http://dpaste.com/3ZZ733Z.txt [claimed to be not spam]
the original message is encapsulated as an attachment, so i was planning on extracting this and discarding the rest of the message - unless sa-learn is magical enough to handle this?


It is not, but extracting the original message should be enough.
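
A minimal sketch of that extraction, assuming the report carries the original as a message/rfc822 MIME part (the function name extract_original is made up for illustration, and I have not checked it against the two samples above). It uses only Python's standard email package:

```python
import email
from email import policy


def extract_original(raw_bytes):
    """Return the first message/rfc822 attachment as bytes, or None.

    Hypothetical helper: assumes the wrapper message encapsulates the
    reported mail as a message/rfc822 part.
    """
    wrapper = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the encapsulated message object.
            inner = part.get_payload(0)
            return inner.as_bytes()
    return None
```

The returned bytes can be written to a file and passed to sa-learn --spam or sa-learn --ham, depending on which button the user clicked. Note that as_bytes() re-serializes the parsed message, so the output may differ from the original in line endings or header folding.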

aside from that, i've read https://wiki.apache.org/spamassassin/BayesInSpamAssassin and man 1 sa-learn about spamassassin markup/headers, but would appreciate any feedback for the above samples that might be pertinent - particular headers that i may not have considered removing, etc.


I would remove no headers, SA should handle that properly.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Your mouse has moved. Windows NT will now restart for changes to take
effect. [OK]
