RE: [SAtalk] Mail arrival time may be a criteria

Larry Gilson Mon, 04 Aug 2003 07:17:49 -0700

Hey Justin,

Fuzzy Fox suggested a similar route.  The Bayes token is a great
possibility.  The tokens in this case would be times rather than words.
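
To make the idea of a "time token" concrete, here is a minimal sketch of how an arrival time could be reduced to a coarse token that a Bayes-style learner might count like a word.  The function name, the token format, and the weekday/weekend split are all assumptions for illustration; this is not SpamAssassin's actual tokenizer.

```python
from datetime import datetime


def arrival_time_token(arrival: datetime) -> str:
    """Map an arrival time to a coarse token (hypothetical format)."""
    # Monday is 0 and Sunday is 6, so 5 and 6 are the weekend days.
    day_type = "weekend" if arrival.weekday() >= 5 else "weekday"
    # One token per hour of day, kept separate for weekdays and weekends.
    return f"arrival:{day_type}:h{arrival.hour:02d}"


# The timestamp of this message (a Monday morning):
print(arrival_time_token(datetime(2003, 8, 4, 7, 17, 49)))  # arrival:weekday:h07
```

Hour-sized buckets are an arbitrary choice here; finer or coarser buckets would trade resolution against how much mail is needed per token.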

One way to accomplish this is to add settings to local.cf that score only
during a specific time interval.  Administrators could then adjust both the
time interval and the score.  The only thing the administrator needs to know
is their own spam/ham flow over time.
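
The logic of such a manually configured test can be sketched as follows.  This is only an illustration of the behavior described above, not local.cf syntax; the function name and its parameters (start hour, end hour, score) are hypothetical stand-ins for whatever an administrator would set.

```python
from datetime import datetime


def time_window_score(arrival: datetime, start_hour: int, end_hour: int,
                      score: float) -> float:
    """Return `score` if the message arrived inside the window, else 0."""
    hour = arrival.hour
    if start_hour <= end_hour:
        # Ordinary window within one day, e.g. 01:00 up to 05:00.
        in_window = start_hour <= hour < end_hour
    else:
        # Window that wraps past midnight, e.g. 22:00 up to 06:00.
        in_window = hour >= start_hour or hour < end_hour
    return score if in_window else 0.0


# Example: add 0.5 points to mail arriving between 01:00 and 05:00.
print(time_window_score(datetime(2003, 8, 4, 3, 30), 1, 5, 0.5))
```

The end hour is treated as exclusive so that adjacent windows do not overlap.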

However, if one tries to automate this, then the thresholds need to be
derived automatically.  So even if Bayes learns the time, a filtering engine
needs to be able to analyze the spam/ham flow over time.  A minimum time
interval needs to pass before thresholds can be derived.  I would venture a
guess that the minimum needs to be a week.  One might need to learn separate
distributions, such as hours for Mon-Fri and hours for Sat-Sun.  It is easy
for a human administrator to look at the graph
(http://www.gryzor.com/tools/spamstats-pics.html), make assumptions, and
just enable this test, set the time interval, and override the default
score.  However, to make this automatic, one needs a means of describing
the spam bandwidth (variation in message numbers over a 24-hour period),
the ham bandwidth, the average high spam count to create an upper threshold,
the average low spam count to create a lower threshold, etc.  What I am
saying is that one really needs to describe the ham/spam flow.  This quickly
becomes a signal analysis problem.  Why?  Because if we don't describe the
signals (ham/spam flow over time), then we can quickly run into false
positives.  This is especially true for organizations whose ham flow closely
resembles spam flow.  A global organization that receives ham continuously
throughout the day would have mail flow that looks very different from the
provided graph.  Does anyone agree with this, or am I out to lunch?
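
As a rough illustration of what "describing the flow" could mean, the sketch below summarizes one mail stream (spam or ham) from its arrival times: the average daily high and low hourly counts as upper and lower thresholds, and their average difference as the bandwidth.  The function name, the returned keys, and the exact definitions are assumptions made for this example, and the one-week minimum is only the guess stated above.

```python
from collections import Counter
from datetime import datetime
from statistics import mean
from typing import Iterable, Optional

MIN_DAYS = 7  # guessed minimum amount of history: one week


def describe_flow(timestamps: Iterable[datetime]) -> Optional[dict]:
    """Summarize a mail stream, or return None if there is too little data."""
    per_day = {}  # date -> Counter mapping hour of day -> message count
    for ts in timestamps:
        per_day.setdefault(ts.date(), Counter())[ts.hour] += 1
    # Note: only days with at least one message are counted here.
    if len(per_day) < MIN_DAYS:
        return None
    # Highest and lowest hourly count on each day (missing hours count as 0).
    highs = [max(c[h] for h in range(24)) for c in per_day.values()]
    lows = [min(c[h] for h in range(24)) for c in per_day.values()]
    return {
        "upper_threshold": mean(highs),
        "lower_threshold": mean(lows),
        "bandwidth": mean(hi - lo for hi, lo in zip(highs, lows)),
    }
```

To learn separate Mon-Fri and Sat-Sun distributions, the same function could be called once on weekday timestamps and once on weekend timestamps, with the minimum-history requirement adjusted accordingly.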

Please don't get me wrong.  I think that if this can be done it would really
be great!  I would really love to see this done automatically and Bayes
tokens might just be the way to do this.  I think the graph provided
probably holds for more organizations (and maybe most individuals) than not.
However, there is no guarantee as there is not enough data to support any
conclusion.  I do believe that time would be a great indicator and that the
proper implementation is crucial for success.

If you decide to pursue this I would love to help!  I remember some of my
signal analysis classes but my associated math knowledge has waned over
time.  Still, great problem and potentially a great test.

--Larry

> -----Original Message-----
> From: [EMAIL PROTECTED]

> Larry Gilson writes:
> >I believe there is a big problem awaiting those who 
> >generalize one graph.  This graph will most definitely change from
> >organization to organization.  The pattern, however may be
> >similar.  This poses a huge problem when trying to develop a
> >general rule for the SA community.
> 
> It may get good results if made into a token for the Bayes scanner.

