Hi Regis,

> -----Original Message-----
> From: C. Regis Wilson

> HOWEVER!  Your graph is excellent and points out a flaw with 
> your (and my) suggestion.  The graph clearly shows when peak Ham
> flows.  It does not tell you when Spam flows!  We're looking at the
> problem backward.  :)  You see, Spam flows continuously on average,
> while Ham flows in big bursts.

I believe there is a big problem awaiting those who generalize from a single
graph.  This graph will most definitely change from organization to
organization, although the pattern may be similar.  This poses a huge problem
when trying to develop a general rule for the SA community.

> 
> One could use some frequency analysis to try to come up with 
> better rules than a timestamp-only rule which would only be riddled
> with tons of false-positives and false-negatives.  A time-stamp
> rule (by definition, since all mails have timestamps) would be
> almost worthless in terms of separating mails. How about (some
> ideas)

Spam flow is continuous and analogous to white noise.  Basing a rule only on
the times at which ham tends to peak will produce false positives unless you
grade the scores over smaller time intervals.  Looking at the graph as a
signal implies that time-based scoring will be ineffective until the ratio of
the RMS value of ham to that of spam exceeds 1.0.  Another way to look at it,
from a power perspective, is that the times at which ham is at or above 50% of
its peak, taking the natural average spam level as zero, are a good
indication of non-spam.
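A rough sketch of the RMS comparison, assuming we have per-interval message
counts for ham and spam over the same time window.  Reading "the RMS value of
the ham to spam" as RMS(ham) / RMS(spam) is my interpretation, and the
function names and sample counts are hypothetical:

```python
import math

def rms(values):
    """Root-mean-square of a sequence of per-interval message counts."""
    if not values:
        raise ValueError("need at least one interval")
    return math.sqrt(sum(v * v for v in values) / len(values))

def ham_to_spam_rms_ratio(ham_counts, spam_counts):
    """RMS of ham divided by RMS of spam over the same intervals.

    Interpretation (an assumption): time-based scoring only starts to
    separate ham from spam once this ratio exceeds 1.0.
    """
    spam_rms = rms(spam_counts)
    if spam_rms == 0:
        return math.inf  # no spam at all in the window
    return rms(ham_counts) / spam_rms

# Made-up counts for four intervals, for illustration only.
ham = [0, 2, 9, 3]
spam = [4, 4, 4, 4]
print(ham_to_spam_rms_ratio(ham, spam) > 1.0)  # prints True
```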

1) Messages arriving in the +50% power range should subtract from the score.

2) Messages arriving in the white-noise range should neither subtract nor
add to the score.

3) Messages arriving in the -50% power range should add to the score.
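The three rules could be sketched roughly like this.  Several assumptions are
mine, not settled: zero is the average spam level, "power" is the signed
square of ham's excess over that level, the +/-50% bands are measured against
the largest absolute power seen, and the score adjustments are placeholders
that would need tuning:

```python
def classify_slots(ham_counts, spam_counts):
    """Label each time slot 'ham', 'noise', or 'spam' (hypothetical helper)."""
    baseline = sum(spam_counts) / len(spam_counts)  # zero = average spam level
    power = []
    for h in ham_counts:
        excess = h - baseline
        power.append(excess * abs(excess))  # signed square keeps the direction
    peak = max(abs(p) for p in power) or 1.0  # avoid dividing by zero
    labels = []
    for p in power:
        rel = p / peak
        if rel >= 0.5:
            labels.append("ham")    # rule 1: +50% power range
        elif rel <= -0.5:
            labels.append("spam")   # rule 3: -50% power range
        else:
            labels.append("noise")  # rule 2: white-noise range
    return labels

# Placeholder adjustments: negative lowers the score, positive raises it.
ADJUST = {"ham": -1.0, "noise": 0.0, "spam": 1.0}

def time_adjustment(slot, labels):
    """Score adjustment for a message arriving in the given time slot."""
    return ADJUST[labels[slot]]
```

Since the adjustment for the noise band is zero, messages in that range are
left to the rest of the rule set, which matches rule 2.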

The question is how to apply this in practice to create three general rules.
Each organization would have to adapt the time intervals to its specific
experience, and each would need the ability to change these intervals over
time as the spam and ham signals change.  Taking the 50% points would buffer
against fluctuations in spam flow, but only within certain tolerances.
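One way a site might keep its intervals current is to average the last few
weeks of hourly counts and flag when the spam baseline drifts past a
tolerance.  This is only a sketch: the class, the 28-day window, and the 20%
tolerance are hypothetical values each organization would tune:

```python
from collections import deque

class RollingProfile:
    """Per-hour ham/spam history over the last `days` days (hypothetical)."""

    def __init__(self, days=28, hours=24):
        self.hours = hours
        self.ham = deque(maxlen=days)   # oldest day drops off automatically
        self.spam = deque(maxlen=days)

    def add_day(self, ham_counts, spam_counts):
        if len(ham_counts) != self.hours or len(spam_counts) != self.hours:
            raise ValueError("expected one count per hour")
        self.ham.append(list(ham_counts))
        self.spam.append(list(spam_counts))

    def mean_ham(self):
        """Average ham count for each hour across the stored days."""
        if not self.ham:
            raise ValueError("no days recorded yet")
        n = len(self.ham)
        return [sum(day[h] for day in self.ham) / n for h in range(self.hours)]

    def spam_baseline(self):
        """Average spam count per hour across the stored days."""
        if not self.spam:
            raise ValueError("no days recorded yet")
        total = sum(sum(day) for day in self.spam)
        return total / (len(self.spam) * self.hours)

def needs_retuning(old_baseline, new_baseline, tolerance=0.2):
    """True when the spam level has moved more than `tolerance` (a fraction
    of the old level), i.e. beyond what the 50% margin is assumed to absorb."""
    if old_baseline == 0:
        return new_baseline != 0
    return abs(new_baseline - old_baseline) / old_baseline > tolerance
```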

While I believe the solution to this problem is attainable, I do not believe
it would be in the form of three rules that would be effective over time.

--Larry



_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
