On Sat, 14 May 2016, Reindl Harald wrote:

On 14.05.2016 at 19:10 John Hardin wrote:
 On Sat, 14 May 2016, Reindl Harald wrote:

>  On 14.05.2016 at 04:50 John Hardin wrote:
> >  On Sat, 14 May 2016, Reindl Harald wrote:
> > >  On 14.05.2016 at 04:04 John Hardin wrote:
> > > >  How would a webservice be better? That would still be
> > > >  sending customer emails to a third party for processing.
> > >
> > >  uhm you missed "and only give the rules which hit and the
> > >  spam/ham flag out"
> >
> >  Ah, OK, I misunderstood what you were suggesting.
> >
> >  That wouldn't work. That tells you the rules they hit at the time
> >  they were scanned, not which rules they would hit from the
> >  current testing rules.
>
>  on the other hand it would reflect the complete mail-flow and not just
>  hand-crafted samples

 It's not hand *crafted* samples, it's hand *classified* samples. The
 message needs to be classified by a reliable human as ham or spam for
 the analysis of the rules that it hits to have any use, or even be
 possible.

that's just nitpicking - i could correct you the same way in German for most of what you would try to express :-)

Yes, probably.

 That's why doing something like having an SA install that's based on the
 current SVN sandbox rules, and that gets a forked copy of your mail
 stream, and that captures the hits, is still not useful for anything
 other than gross "this rule didn't hit anything" analysis - you don't
 know what a given message *should* have been, so you can't say anything
 about the rules that hit it - whether they aid that result, or hinder it.

how do you imagine such a setup *in practice*?

Somewhat stream-of-consciousness:

In addition to your normal deliver-to-the-user MTA, have another MTA that is running against an SA that is configured from SVN. Note that this wouldn't be a backup MTA; it would have to get a copy of your inbound mail stream. I'm not sure how you'd fork the mail delivery process; that's probably MTA-dependent.
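For illustration only - the thread says this step is MTA-dependent - with Postfix the stock `always_bcc` parameter would copy every message traversing the normal MTA to one extra address, which could point at the shadow masscheck host (the hostname below is a placeholder):

```
# main.cf on the normal delivery MTA (shadow.example.org is hypothetical)
always_bcc = masscheck@shadow.example.org
```

Postfix's `recipient_bcc_maps` would allow restricting the copies to specific recipients, e.g. only users who have consented.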

The masscheck MTA would deliver to SA, record the rule hits and classification in the masscheck upload format, and discard the message.
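As a sketch of that record-and-discard step: the snippet below pulls the verdict, score, and rule hits out of SpamAssassin's X-Spam-Status header and emits a one-line record. The actual mass-check upload format is more detailed than this; the `flag score path tests` layout here is a simplified stand-in, and the header value is a made-up example.

```python
import re

def record_line(header_value, msg_path):
    """Turn an X-Spam-Status header value into a simplified
    'Y|. score path tests' record (assumed layout, not the
    real mass-check format)."""
    is_spam = header_value.startswith("Yes")
    score = float(re.search(r"score=(-?[\d.]+)", header_value).group(1))
    tests_m = re.search(r"tests=([A-Z0-9_,]+)", header_value)
    tests = tests_m.group(1) if tests_m else ""
    flag = "Y" if is_spam else "."
    return f"{flag} {score:g} {msg_path} {tests}"

hdr = "Yes, score=12.3 required=5.0 tests=FSL_HELO_HOME,MISSING_DATE autolearn=no"
print(record_line(hdr, "/var/mail/queue/msg1"))
# -> Y 12.3 /var/mail/queue/msg1 FSL_HELO_HOME,MISSING_DATE
```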

Normal delivery would usually be suspended so that messages queue.

When the masscheck start time is reached, update from SVN, recompile the rules, clear the log and enable MTA delivery. The queued messages would be scanned and recorded until the upload time is reached, at which time delivery is suspended again. This may or may not be long enough to clear the queue.

The results would then be uploaded.

As you noted, there would have to be some minimum score for recording the message as spam, and some maximum score for recording it as ham. Anything in between would have to be discarded as ambiguous. There might also need to be some kind of weighting on the results when they are incorporated into masscheck, to reflect that they are not hand-classified and thus their reliability isn't as good as we'd like; however, there have been misclassifications in hand-classified corpora before, so if the thresholds are well-chosen that may not be an issue.
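A minimal sketch of that thresholding, with made-up cutoffs and weight (these are illustrative values, not project settings):

```python
# Threshold auto-classifier sketch. The 12.0 / -2.0 cutoffs and the
# 0.5 trust weight are assumptions for illustration only.
SPAM_MIN = 12.0    # below this, a local "spam" verdict is not trusted
HAM_MAX = -2.0     # above this, a local "ham" verdict is not trusted
AUTO_WEIGHT = 0.5  # vs. 1.0 for hand-classified corpora

def auto_classify(score):
    """Return ('spam'|'ham', weight), or None for ambiguous scores."""
    if score >= SPAM_MIN:
        return ("spam", AUTO_WEIGHT)
    if score <= HAM_MAX:
        return ("ham", AUTO_WEIGHT)
    return None  # ambiguous: discard rather than pollute the corpus

print(auto_classify(14.2))  # ('spam', 0.5)
print(auto_classify(-4.0))  # ('ham', 0.5)
print(auto_classify(6.9))   # None - spam locally, but not trusted here
```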

But note, this would probably not help offset a high-scoring FP rule, as the message would be auto-classified as spam or, at best, ambiguous - it might actually be self-reinforcing and make the situation worse, rather than help it be self-correcting as hand-classified corpora would. It also probably won't help much with new rules.
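The feedback loop can be seen with a little arithmetic (all scores here are made up for illustration):

```python
# Made-up scores illustrating the self-reinforcement concern.
fp_rule_score = 3.5   # a misfiring high-scored rule hitting a ham
other_hits = 9.0      # other rules this ham message happens to hit
SPAM_MIN = 12.0       # illustrative auto-classification cutoff

total = fp_rule_score + other_hits  # 12.5

# The ham crosses the spam cutoff, so the automated feed labels it
# spam - masscheck then sees the FP rule "correctly" hitting spam,
# which keeps its score high instead of correcting it.
auto_label = "spam" if total >= SPAM_MIN else "ambiguous"
print(total, auto_label)  # 12.5 spam
```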

I don't really think there's any way around having hand-classified clean and complete corpora for running masschecks.

 Unless your mail stream prior to SA is *guaranteed* 100% ham (which is
 hugely unlikely or why would you be running SA at all?) or 100% spam
 (which might be the case for a clean honeypot), you need to review and
 classify the messages manually before performing the scan and reporting
 the rule hits, and that means keeping copies of the pristine messages,
 at least for a while.

 I don't know whether statutory requirements make this impossible for you
 even if you did obtain consent from some of your clients to use their
 mail stream in that manner.

i don't have access to the whole mailflow to classify it nor is there a technical way to mirror it on a different setup

OK

nor would SA or even smtpd ever see 95% of the junk because content filters are the last resort by definition

It's not too difficult for masscheck to get spam, as there are honeypots feeding masscheck. It's harder to get ham, especially non-English ham, so contributing to masscheck from a 99% clean feed is still helpful.

>  should be tied to a minimum negative score to count as ham and a
>  minimum positive score to count as spam - configurable, because it
>  depends on the local environment and adjustments which scores are
>  clear classifications; 7.0 would not be 100% spam here, but 12.0
>  would be, for example

 That's probably still not reliable enough for use in masscheck. Ham is a
 bit more important; what would you recommend as a lower limit for
 considering a message as ham? How many actual hams would meet that
 requirement? It might be a lot of work for little final benefit. What
 percentage of actual FNs would you see with that setting? Those would
 damage the masscheck analysis.

i would agree if we could call the current masscheck results reliable

>  it would at least help in the current situation, and with a rule like
>  FSL_HELO_HOME when it hits only clear ham but has a high spam-score.
>  and when it only needs to be enabled, collects the information through
>  scanning, and submits the results once per day, a lot of people running
>  milter-like setups with reject and no access to rejected mails could
>  help to improve the auto-QA without collecting whole mails

 Potentially. You'd have to be willing to set up a parallel mail
 processing stream using the current SVN sandbox rules as I described
 above. Performing analysis on the released rules provides no benefit to
 masscheck.

why would it provide no benefit, when one part of "sa-update" - which currently doesn't get any updates most of the time - is to re-score badly scored rules? that's really not only about sandbox rules

Because the rules in question may have changed since the last update was released. The analysis needs to be of the current state of the rules in SVN - take a snapshot, masscheck it and generate scores, and those rules and their scores are released as an update if the corpora are large enough for the results to be considered reliable. (Note that "reliability" is based on the *size* of the corpora. We sadly don't have any way to judge it based on broadness of content.)

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  ...much of our country's counterterrorism security spending is not
  designed to protect us from the terrorists, but instead to protect
  our public officials from criticism when another attack occurs.
                                                    -- Bruce Schneier
-----------------------------------------------------------------------
 144 days since the first successful real return to launch site (SpaceX)
