Hi Regis,

> -----Original Message-----
> From: C. Regis Wilson
> HOWEVER! Your graph is excellent and points out a flaw with
> your (and my) suggestion. The graph clearly shows when peak Ham
> flows. It does not tell you when Spam flows! We're looking at the
> problem backward. :) You see, Spam flows continuously on average,
> while Ham flows in big bursts.

I believe there is a big problem awaiting those who generalize from one graph. This graph will most definitely change from organization to organization. The pattern, however, may be similar. This poses a huge problem when trying to develop a general rule for the SA community.

> One could use some frequency analysis to try to come up with
> better rules than a timestamp-only rule which would only be riddled
> with tons of false-positives and false-negatives. A time-stamp
> rule (by definition, since all mails have timestamps) would be
> almost worthless in terms of separating mails. How about (some
> ideas)

Spam flow is continuous and analogous to white noise. Basing a rule only on the times at which ham tends to peak will produce false positives unless you graduate the scores over smaller time intervals. Looking at the graph as a signal would imply that scoring will be ineffective until the RMS ratio of ham to spam exceeds 1.0. Another way to look at it, from a power perspective, is to take the average natural spam level as the zero line and say that the times at which ham is at least 50% of its peak above that line are a good indication of non-spam.

1) Messages arriving in the +50% power range should subtract from the score.
2) Messages arriving in the white-noise range should neither subtract from nor add to the score.
3) Messages arriving in the -50% power range should add to the score.

The question is how to apply this in practice as three general rules.
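
To make the three ranges concrete, here is a minimal sketch of one possible reading of the idea above. It is not an existing SpamAssassin rule or API. The function name `band_adjustments`, the hourly buckets, and the score values of -0.5 and +0.5 are all assumptions made for illustration, as is the choice to measure the -50% range against the same ham peak.

```python
from statistics import mean


def band_adjustments(ham_per_hour, spam_per_hour, bonus=-0.5, penalty=0.5):
    """Return one score adjustment per time bucket (hypothetical helper).

    ham_per_hour / spam_per_hour: message counts per bucket for one site.
    bonus / penalty: illustrative score values, not tuned figures.
    """
    # Treat spam as a flat noise floor: its average is the zero line.
    baseline = mean(spam_per_hour)
    # Ham measured relative to that zero line.
    signal = [h - baseline for h in ham_per_hour]
    peak = max(signal)
    if peak <= 0:
        # Ham never rises above the spam floor, so time carries no information.
        return [0.0] * len(signal)

    adjustments = []
    for s in signal:
        if s >= 0.5 * peak:
            adjustments.append(bonus)      # +50% range: subtract from score
        elif s <= -0.5 * peak:
            adjustments.append(penalty)    # -50% range: add to score
        else:
            adjustments.append(0.0)        # white-noise range: no change
    return adjustments


# Made-up counts for eight time buckets at a single site.
ham = [0, 0, 10, 50, 100, 50, 10, 0]
spam = [60] * 8
print(band_adjustments(ham, spam))
```

Because the input is one site's own counts, the bands would have to be recomputed per organization and again whenever the traffic pattern shifts.
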
Each organization would have to adapt the time intervals to its own experience, and each would need the ability to change these intervals over time as the spam/ham signals change. Taking the 50% points would buffer against fluctuations in spam flow, but the buffer is only good within certain tolerances. While I believe a solution to this problem is attainable, I do not believe it would take the form of three rules that remain effective over time.

--Larry

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk