On 14.03.22 20:15, Alex wrote:
I'm seeing a lot of DCC/pyzor mail being marked as spam that shouldn't
be, and want to see what can be done to prevent that.
DCC contains fuzzy checksums of bulk messages, meaning messages that have
been seen on the internet many times. This includes common notifications
from big sites such as social networks.
It is also possible to report a message to DCC as bulk.
Pyzor contains fuzzy checksums of messages that have been reported multiple
times.
Neither of these means a message is spam, but both indicate that it might be.
Unfortunately, short messages often hit, because the fuzzy checksums of
short messages frequently match.
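
To illustrate why short messages collide, here is a toy sketch. It is NOT
the real DCC or Pyzor algorithm; toy_fuzzy_digest and its normalisation
steps are made up for this example. It only assumes that a fuzzy checksum
throws away the variable parts of a body before hashing what is left:

```python
import hashlib
import re


def toy_fuzzy_digest(body: str) -> str:
    """Toy stand-in for a fuzzy checksum; NOT the real DCC or Pyzor algorithm."""
    text = body.lower()
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"[^a-z]+", "", text)       # drop digits, punctuation, whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Two "empty" bodies from unrelated senders
empty_from_alice = "\n\n"
empty_from_bob = "   \r\n"

# Two short bodies that differ only in the parts the normalisation removes
short_a = "See attached. Invoice 1041"
short_b = "see attached - invoice 2287\n"

# Two longer, unrelated bodies
long_a = "Hi Bob, the quarterly numbers are attached; call me about the Lisbon trip."
long_b = "Hi Eve, lunch on Friday? I found a new place near the station."

print(toy_fuzzy_digest(empty_from_alice) == toy_fuzzy_digest(empty_from_bob))  # True
print(toy_fuzzy_digest(short_a) == toy_fuzzy_digest(short_b))                  # True
print(toy_fuzzy_digest(long_a) == toy_fuzzy_digest(long_b))                    # False
```

The less text is left after normalisation, the more likely it is that
unrelated senders end up with the same checksum. An empty body is the
extreme case.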
For example, many emails with just an image attachment and an empty
body are hitting DCC. I thought I recalled a way to create a checksum
of these empty messages and add them to an allow list, but it seems it
is specific to the sender, based on /var/lib/dcc/testmsg-whitelist:
# empty Exchange
ok hex fuz1 e038b933 6003e07e 8e990536 110cfa90
How do I generate that signature? I've been unable to find any
instructions on how to do it. Same with pyzor?
Another example is an email I received from Pizza Hut. Their marketing
emails hit DCC and pyzor and sendgrid, making it very difficult for
that email to be delivered unless it also hits some negative bayes or
is allowlisted. Do people add them to the welcomelist? Do you train
marketing emails for bayes?
I usually train many kinds of marketing messages so they don't hit BAYES_00
(BAYES_50 is usually OK). Marketing messages are very similar to typical
spam, and letting them hit BAYES_00 may lower the score of real spam.
* 1.5 KAM_SENDGRID Sendgrid being exploited by scammers
* 0.3 DIGEST_MULTIPLE Message hits more than one network digest check
* 1.0 DCC_REPUT_95_98 DCC reputation between 95 and 98 % (mostly spam)
* 0.5 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously
* huge http urls
* 1.4 PYZOR_CHECK Listed in Pyzor
* 3.0 BAYES_95 BODY: Bayes spam probability is 95 to 99%
* [score: 0.9668]
* 0.1 POISEN_SPAM_PILL_3 BODY: random spam to be learned in bayes
Is sendgrid still as big of a problem as it was a year ago?
If marketing messages you want hit BAYES_[89]*, simply train them as ham.
There are a few negative rules, like TXREP and DKIMWL_WL and
RCVD_IN_SENDERSCORE_90_100, but someone really doesn't want Pizza Hut
email to be delivered.
BTW, I configured the DKIMWL rules to be ignored when training, because they
hit a lot of Outlook/Gmail spam.
Separately, is ExtractText broken? I have legitimate invoices that are
hitting multiple money rules. Is this the expected behavior? Any
advice on how to deal with it?
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/