On 02.06.2012 23:23, Matt Simerson wrote:
On Jun 2, 2012, at 11:15 AM, Jared Johnson wrote:
Yup. Part of the motivation for this plugin was to short circuit all the
intermediate plugins and handlers so I can feed the message to sa-learn
and dspam. Until dspam is trained, that's a very important step in
training it. But there's no gain in validating the HELO name, SPF, or
DomainKeys. This plugin and associated changes add that flexibility while
reducing the code and complexity of the plugins.
It might not be fair to say there's *no* gain. Our HELO validation and
SPF plugins (we don't have a DKIM plugin at the moment, for shame) now do
their lookups unconditionally and add headers to the message so that our
bayes engine can tokenize the headers themselves.
Wait until you actually run DomainKeys before you decide if it's a gain. It
requires more resources than I'd have guessed. And surprisingly (to me), the
most reliably signed messages are spam, or come from very big "mostly good"
senders. I've seen enough ham senders with broken DomainKeys that I don't
consider it reliable enough to reject or train based on. Same goes for SPF.
Spammers are far more likely to have good SPF than legit mailers. Spammers
automate their SPF records, so they don't make typos like "ip:127..." (should
be "ip4:127...") or leave out the space between the declarations and the
~all. The errors are common enough, and affect ham often enough, that I'm
tempted to fix them up in the SPF plugin before validation.
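To make the fix-up idea concrete, here is a minimal sketch in Python of repairing those two typos before validation. It is only an illustration. The function name fix_spf_record is hypothetical, it is not code from any existing SPF plugin, and the regexes are heuristics that assume the only damage is an "ip:" prefix on a dotted IPv4 address or a qualified "all" glued to the previous term.

```python
import re

def fix_spf_record(record: str) -> str:
    """Repair two common SPF typos (hypothetical helper, heuristic only)."""
    # "ip:" is not a valid SPF mechanism. If it is followed by something
    # that looks like a dotted IPv4 address, assume "ip4:" was intended.
    fixed = re.sub(r'\bip:(?=\d{1,3}\.)', 'ip4:', record)
    # If a trailing qualified "all" (~all, -all, ?all, +all) is glued to the
    # previous term, insert the missing space in front of it.
    fixed = re.sub(r'(?<=\S)([~?+-]all)\s*$', r' \1', fixed)
    return fixed

print(fix_spf_record("v=spf1 ip:127.0.0.1~all"))
```

Records that are already valid pass through unchanged. A domain name that happens to end in "-all" would be split incorrectly, so a real plugin would want to parse the terms properly rather than rely on a regex.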
And SPF breaks legit forwarding servers that don't implement SRS. So I don't
reject or train based on SPF alone.
I too have a custom HELO validation plugin (it needs more work, but I'll
contribute it eventually), and it may actually provide some gain, but I think
it's safe to say the one presently in plugins is not a gain.
How do you measure if the resources expended are worth the (likely small)
benefit you would get from the additional bayes tokens? That will determine if
it's a gain or not. I've placed my bet on the table, and I'd be pleased to be
proven wrong.
Bayes is a little bit of a black box to me, so I can't really quantify
just how useful this is, but I'd say it's greater than zero. Dspam even
treats headers in a special way to ensure that their usefulness is
maximized.
Usefulness != gain. There may be some gain, but I'm not familiar with bayes
enough either. But I know someone who is. The dspam author (Stevan Bajić)
noticed my plugin, contacted me, and will be submitting some improvements, like
talking directly to the dspam server. I'm BCC'ing him on this message, and
hopefully we'll get a more informed opinion.
I don't 100% understand what you are trying to do with bayes. Is this
'reaper' plugin adding some additional data to the header of the mail
and the other person posting is questioning if that additional header is
beneficial to the bayes engine?
Care to explain a little more to me what this is all about?
Matt
--
Kind Regards from Switzerland,
Stevan Bajić