What you are trying to do is identify a source of messages by its
entropy... supposing the entropy of a ham source is distinguishable from
that of a spam one.
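
One possible reading of "entropy" here (an assumption, since the thread does not define it) is the Shannon entropy of the token frequency distribution of a message or source. A minimal sketch under that assumption follows. The function name token_entropy and the whitespace tokenizer are illustrative only. Whether ham and spam sources actually separate on this measure is exactly the open question.

```python
import math
from collections import Counter
from typing import Iterable


def token_entropy(tokens: Iterable[str]) -> float:
    """Shannon entropy, in bits per token, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct tokens of p * log2(1/p), where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Naive whitespace tokenization is an assumption made only for this sketch.
print(token_entropy("buy now buy now".split()))  # 1.0
```

Comparing sources rather than single messages would mean pooling tokens from many messages per source before computing the value.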

2016-08-22 13:48 GMT-03:00 Antony Stone <antony.st...@spamassassin.open.source.it>:

> On Monday 22 August 2016 at 18:00:35, Marc Perkel wrote:
>
> > On 08/22/16 07:37, Antony Stone wrote:
> > >
> > > So what makes "cheapest Viagra online" a token, such that "cheapest"
> > > and "online" are not tokens?
> >
> > They would all be tokens. Just pointing out one that would match spam
> > and not match ham. "cheapest" and "online" would likely be in both sets
> > and would be ignored.
>
> Hm, that doesn't tie up with your earlier reply:
>
> On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:
>
> > On 08/22/16 07:28, Dianne Skoll wrote:
> > > On Mon, 22 Aug 2016 07:16:41 -0700
> > >
> > > As far as I understand your algorithm, if an email contains at least
> > > one token in the "ham" set and zero tokens in the "spam" set, you
> > > classify it as ham.  And conversely, if it contains at least one spam
> > > token but zero ham tokens, you classify it as spam.
> >
> > YES! YES! YES!
>
> Er, really?  See below.
>
> > Although I look at some thousand "fingerprints" to get a more
> > significant result.
> >
> > > The other two possibilities (no tokens in either or some tokens in
> > > both) are undecidable.
> >
> > Exactly!
>
> So, it's not that "if an email contains at least one token in the 'ham' set
> and zero tokens in the 'spam' set, you classify it as ham".
>
> You in fact ignore any tokens in the email which are in both the 'ham' and
> 'spam' sets, and then - what - work out which set contains more of the
> left-over tokens?
>
>
> Antony.
>
> --
> Pavlov is in the pub enjoying a pint.
> The barman rings for last orders, and Pavlov jumps up exclaiming "Damn!  I
> forgot to feed the dog!"
>
>                                                    Please reply to the list;
>                                                          please *don't* CC me.
>
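
The two readings of the algorithm discussed above can be contrasted in a small sketch. classify_exclusive follows the rule as Dianne stated it: hits in only one set decide, while no hits or hits in both sets are undecidable. classify_leftover is a hypothetical version of what Antony asks about: drop tokens common to both sets, then compare how many remaining tokens fall in each set. The function names and the toy HAM/SPAM sets are made up for illustration. This is not Marc's actual implementation, which the thread does not show.

```python
from typing import Iterable, Optional, Set

# Toy token sets, made up for illustration only.
HAM: Set[str] = {"cheapest", "online", "meeting", "invoice"}
SPAM: Set[str] = {"cheapest", "online", "viagra"}


def classify_exclusive(tokens: Iterable[str], ham: Set[str], spam: Set[str]) -> Optional[str]:
    """Rule as stated in the thread: hits in exactly one set decide.

    Returns None for the undecidable cases (no hits, or hits in both sets).
    """
    toks = set(tokens)
    in_ham = bool(toks & ham)
    in_spam = bool(toks & spam)
    if in_ham and not in_spam:
        return "ham"
    if in_spam and not in_ham:
        return "spam"
    return None


def classify_leftover(tokens: Iterable[str], ham: Set[str], spam: Set[str]) -> Optional[str]:
    """Hypothetical variant: ignore tokens present in both sets, then
    compare how many of the remaining tokens fall in each set."""
    toks = set(tokens)
    shared = ham & spam
    ham_only = len((toks & ham) - shared)
    spam_only = len((toks & spam) - shared)
    if ham_only > spam_only:
        return "ham"
    if spam_only > ham_only:
        return "spam"
    return None  # tie, including the case where nothing is left over


msg = ["cheapest", "viagra", "online"]
print(classify_exclusive(msg, HAM, SPAM))  # None
print(classify_leftover(msg, HAM, SPAM))   # spam
```

With these toy sets, a message containing "cheapest viagra online" is undecidable under the strict rule, because "cheapest" and "online" appear in both sets. Once shared tokens are ignored, it comes out as spam. That is the discrepancy Antony points out.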
