On 6 Sep 2016, at 16:04, do...@mail.com wrote:

On Mon, 05 Sep 2016 20:17:18 "Bill Cole" wrote:
On 4 Sep 2016, at 21:11, @lbutlr wrote:

On Sep 1, 2016, at 7:41 PM, David Niklas <do...@mail.com> wrote:

Would you like to go out to lunch?

Other than your message, that phrase does not appear in 7 years of my
mail.

It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus
spread over 4 years without other obvious commonalities (other than
their use of such tactics.)

It was just an example to make a point. You would need to look at your
cool database for a non-spammy string and place it in with an equally spammy
one to figure out if I have found a bug in your cool program.

BTW: You never mentioned if anyone accepted your offer yet.

You seem to have me confused with Marc Perkel. I am not Marc Perkel. This should have been apparent from the attribution line you included in your message.

The point I was hoping others would infer is simply that different people get substantially different mail (ham and spam), which makes statistical approaches of all sorts increasingly ineffective as you increase the diversity of the recipient population. This latest FUSSP proposal is even more fragile to that sort of breakage because all it takes to completely burn a classifier token is a single appearance in both classes. As one grows a source corpus across a broad enough audience, the number of usable tokens trends inevitably towards zero, while the tokens that remain usable are those which simply don't occur very often and so aren't operationally valuable.
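To make the token-burning argument concrete, here is a rough sketch in Python. It is not Mr. Perkel's implementation, which I have not seen. It assumes the simplest reading of the proposal: a token is usable only if it has appeared in exactly one class, and one appearance in the other class burns it. The function name `usable_tokens`, the whitespace tokenizer, and the two toy recipients are all made up for illustration.

```python
def usable_tokens(ham_messages, spam_messages):
    """Split tokens into (usable, burned) under a hypothetical one-class rule.

    A token is "burned" as soon as it has appeared at least once in ham
    AND at least once in spam; everything else stays usable.
    """
    ham = set()
    for msg in ham_messages:
        ham.update(msg.lower().split())  # naive whitespace tokenizer (assumption)
    spam = set()
    for msg in spam_messages:
        spam.update(msg.lower().split())
    burned = ham & spam                  # seen in both classes
    usable = (ham | spam) - burned       # seen in exactly one class
    return usable, burned


# Two invented recipients whose mail differs in kind.
ham_a = ["lunch at noon tomorrow"]
spam_a = ["cheap pills online"]
ham_b = ["order pills from the pharmacy online"]
spam_b = ["free lunch voucher"]

# Recipient A alone: nothing overlaps, so every token looks like a clean signal.
usable_a, burned_a = usable_tokens(ham_a, spam_a)

# Shared corpus across A and B: one person's ham vocabulary is another's spam.
usable_ab, burned_ab = usable_tokens(ham_a + ham_b, spam_a + spam_b)
```

In this toy data, recipient A's corpus alone burns nothing, while merging in recipient B burns "lunch", "pills" and "online", which are exactly the tokens that carried the signal for A. The tokens that survive are mostly filler that happened to show up on one side only.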

Despite Mr. Perkel's extensive insistence to the contrary, his proposal does logically reduce to a variation on Bayesian filtering that avoids FPs at the cost of not being able to make any judgment at all on the actually difficult cases.
