On 6 Sep 2016, at 16:04, do...@mail.com wrote:
On Mon, 05 Sep 2016 20:17:18 "Bill Cole" wrote:
On 4 Sep 2016, at 21:11, @lbutlr wrote:
On Sep 1, 2016, at 7:41 PM, David Niklas <do...@mail.com> wrote:
Would you like to go out to lunch?
Other than your message, that phrase does not appear in 7 years of my
mail.
It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus
spread over 4 years without other obvious commonalities (other than
their use of such tactics).
It was just an example to make a point. You would need to look at your
cool database for a non-spammy string and place it in with an equally
spammy one to figure out if I have found a bug in your cool program.
BTW: You never mentioned if anyone accepted your offer yet.
You seem to have me confused with Marc Perkel. I am not Marc Perkel.
This should have been apparent from the attribution line you included in
your message.
The point I was hoping others would infer is simply that different
people get substantially different mail (ham and spam) which makes
statistical approaches of all sorts increasingly ineffective as you
increase the diversity of the recipient population. This latest FUSSP
proposal is even more fragile to that sort of breakage because all it
takes to completely burn a classifier token is a single appearance in
both classes. As one grows a source corpus across a broad enough
audience, the usable tokens trend inevitably towards zero while the
remaining usable tokens are those which simply don't occur very often
and so aren't operationally valuable.
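
To make that concrete, here is a minimal Python sketch of the failure
mode. It is not Mr. Perkel's implementation; it assumes only the
behavior described above, namely that a token is usable when it has
been seen in exactly one class. The function names, the whitespace
tokenizer, and the sample messages are all hypothetical.

```python
def tokens(messages):
    """Collect the set of lowercase, whitespace-separated tokens."""
    seen = set()
    for msg in messages:
        seen.update(msg.lower().split())
    return seen


def exclusive_tokens(ham_msgs, spam_msgs):
    """Return (ham_only, spam_only): tokens seen in exactly one class.

    Assumption: one appearance in both classes burns a token for good.
    """
    ham, spam = tokens(ham_msgs), tokens(spam_msgs)
    return ham - spam, spam - ham


def classify(message, ham_only, spam_only):
    """Judge a message only when its exclusive tokens all agree."""
    words = set(message.lower().split())
    hits_ham = words & ham_only
    hits_spam = words & spam_only
    if hits_spam and not hits_ham:
        return "spam"
    if hits_ham and not hits_spam:
        return "ham"
    return "unknown"  # no usable evidence, or conflicting evidence


# One recipient's (made-up) mail.
ham = ["lunch tomorrow at noon", "patch review attached"]
spam = ["cheap pills online", "win money now"]
ham_only, spam_only = exclusive_tokens(ham, spam)
before = classify("cheap online", ham_only, spam_only)

# A second recipient's ham happens to use two of the "spam" words.
ham_wider = ham + ["cheap lunch deal online"]
ham_only_wider, spam_only_wider = exclusive_tokens(ham_wider, spam)
after = classify("cheap online", ham_only_wider, spam_only_wider)

print(len(spam_only), len(spam_only_wider))
print(before, after)
```

In this toy corpus a single legitimate message from a second recipient
removes "cheap" and "online" from the spam-only set, and a message
built from those two words goes from a confident verdict to no verdict
at all.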
Despite Mr. Perkel's extensive insistence to the contrary, his proposal
does logically reduce to a variation on Bayesian filtering which avoids
FPs at the cost of not being able to make any judgment at all on the
actually difficult cases.