On 08/31/2014 04:08 PM, Ted Mittelstaedt wrote:
Out of the box the default decision point of 5 is too high anyway.
SA is the framework - you can tune to your need as much as you want.
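To make the tuning trade-off concrete, here is a rough sketch of what moving the decision point does. The scores below are invented for illustration (not masscheck data), and `rates` is a hypothetical helper, not part of SA. In SA itself the threshold is the `required_score` setting in local.cf.

```python
# Hypothetical (score, is_spam) pairs, made up for illustration only.
SAMPLE = [
    (1.2, False), (2.8, False), (3.6, False), (4.4, False),
    (3.9, True), (4.6, True), (5.3, True), (7.1, True), (12.0, True),
]


def rates(messages, threshold):
    """Return (capture_rate, false_positive_rate) at the given threshold.

    A message is treated as spam when its score is >= threshold.
    """
    spam = [score for score, is_spam in messages if is_spam]
    ham = [score for score, is_spam in messages if not is_spam]
    caught = sum(1 for score in spam if score >= threshold)
    false_pos = sum(1 for score in ham if score >= threshold)
    return caught / len(spam), false_pos / len(ham)


# Lowering the threshold catches more spam but starts flagging ham too.
for threshold in (5.0, 4.0, 3.5):
    capture, fp = rates(SAMPLE, threshold)
    print(f"threshold {threshold}: capture {capture:.0%}, false positives {fp:.0%}")
```

Run the same kind of count against your own scored corpus before changing anything; where the trade-off lands depends entirely on your mail.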
I think the emphasis on avoiding false positives in the stock
(non-Bayes) distribution is far too high. I suspect that over
the years many good rule submissions have been ignored because
incidence of false positives with them was too high for the
SA maintainers.
For roughly the last four years, scores have been set by the masscheck GA system.
If more people contributed masschecks and rules, detection could
be better, but the lack of volunteers doing this shows that apparently
what SA does is good enough or there is little interest in commitment.
For the same reason, SARE went belly up after volunteers drifted to new
interests, jobs, had families, etc.
The lack of general commitment and a general passive attitude expecting
"others" to do the job doesn't help at all.
For a newbie to SA it is disheartening to install SA and not
get 90% with a 2% false positive, out of the box, but rather get
50% with a 0% false positive. And I think the mistake the
maintainers are making is over-reliance on Bayes.
Maintainers do what they can, on a voluntary basis. If newbies expect SA
to be FUSP out of the box, then they didn't get enough info beforehand.
At the least the SA maintainers should maintain a separate
"highly aggressive" rule distro that was optional that would
give us a much higher success rate with a corresponding
slight increase in false positives.
"should" ? SA devs are volunteers, contributing time and resources with
little return other than some personal satisfaction of helping others.
SA's development is not funded or backed by some multimillion corp.
What are you doing to contribute ?
SA is the framework - if you wish to start a sa-update channel for extra
aggressive rules_du_jour you're welcome to do it and if you find some
volunteers to help you, even better.
Their design approach has been to rely on Bayes to be trained to go from
50% capture out of box with 0% FP to 80-90% capture with 0% FP.
An assumption, based on what?
But, the design approach could easily be relying on Bayes to go
from 90% capture with 5% FP out of the box, to 90% capture with
0% FP with Bayes, and the emphasis being on training Bayes on ham,
not spam.
Note I am pulling the percentages out of my ass, but I think you
get the idea.
By design, SA's Bayes is not FUSP, it's a small part of the arsenal -
depending on your skill to write rules, make use of other SA features,
etc, you can even run a very efficient filtering system without it.
There are simple methods to automagically feed Bayes with lots of spam
or ham - depending on what you feel you need most. It's up to you to be
creative and make use of SA's ton of features (including third party
rules/plugins).
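
As one example of feeding Bayes automatically, a minimal sketch that wraps `sa-learn` with its `--spam` and `--ham` options. The function names and folder paths are hypothetical placeholders; point them at wherever your users file their mail and run it from cron or similar.

```python
import subprocess


def sa_learn_command(kind, maildir):
    """Build the sa-learn command line for a folder of spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", maildir]


def train(kind, maildir):
    """Run sa-learn on the folder; raises CalledProcessError if it fails."""
    return subprocess.run(sa_learn_command(kind, maildir), check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with your own folders.
    folders = (("spam", "/path/to/spam-folder"), ("ham", "/path/to/ham-folder"))
    for kind, path in folders:
        # Print only, so nothing is trained until you swap in train(kind, path).
        print(" ".join(sa_learn_command(kind, path)))
```

Whether you feed it more ham or more spam is your call, as said above; the wrapper treats both the same way.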