>>>>> "DP" == Daniel Pittman <[EMAIL PROTECTED]> writes:

>> Rule optimization is proceeding. You might find a better/faster regex
>> engine, but you'll probably have to re-optimize the rules for that
>> engine vs the perl engine. I think we're going to be focussing on

In one of my previous lives, we used a system called LMDS from a
large defense contractor called Logicon.  This system was essentially
designed for some intelligence agencies to scan large volumes of news
feeds rapidly.  Believe me, it was as fast as anything I've ever seen.

What you did was define a profile for each interest area.  The profile
was a bunch of keywords to search for with limits such as "within N
words of" or "not when this other word appears", and the like.  What
the system then did was compile all of these profiles into one big
custom search object which matched all profiles simultaneously.  When
you fed in a message, it returned a list of the names of every profile
that it matched.
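
To make the idea concrete, here is a minimal sketch in Python of a
merged profile search.  It is not how LMDS worked internally (I don't
know those details); the profile format with "all", "none" and "near"
keys and the function names compile_profiles and match_message are
made up for illustration.  It assumes single-word, lowercase keywords,
scans each message once, and then checks every profile against the
recorded word positions.

```python
import re

# Hypothetical profile format (an assumption for this sketch):
#   "all":  keywords that must all appear
#   "none": keywords that must not appear
#   "near": (word_a, word_b, n) means word_a within n words of word_b
PROFILES = {
    "pills": {"all": ["cheap", "pills"], "near": [("cheap", "pills", 3)]},
    "loans": {"all": ["mortgage"], "none": ["newsletter"]},
}

def compile_profiles(profiles):
    """Collect every keyword used by any profile into one shared set."""
    vocab = set()
    for rule in profiles.values():
        vocab.update(rule.get("all", []))
        vocab.update(rule.get("none", []))
        for a, b, _ in rule.get("near", []):
            vocab.update((a, b))
    return vocab, profiles

def match_message(compiled, text):
    """Scan the message once and return the names of all matching profiles."""
    vocab, profiles = compiled
    # Single pass over the message: record word positions of known keywords.
    positions = {}
    for index, word in enumerate(re.findall(r"\w+", text.lower())):
        if word in vocab:
            positions.setdefault(word, []).append(index)

    matched = []
    for name, rule in profiles.items():
        if not all(w in positions for w in rule.get("all", [])):
            continue
        if any(w in positions for w in rule.get("none", [])):
            continue
        # Each "near" constraint needs one pair of occurrences close enough.
        near_ok = all(
            any(abs(i - j) <= n
                for i in positions.get(a, ())
                for j in positions.get(b, ()))
            for a, b, n in rule.get("near", [])
        )
        if near_ok:
            matched.append(name)
    return sorted(matched)

compiled = compile_profiles(PROFILES)
print(match_message(compiled, "Buy cheap generic pills and refinance your mortgage today"))
```

The point of the design is that the cost of scanning the message text
does not grow with the number of profiles; only the cheap per-profile
check on the position table does.  A real engine would presumably go
further and merge the profile checks themselves.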

On a dual Pentium II 400MHz box running BSD/OS, I could cram about 300
messages averaging 2 to 4 kbytes into it to match against about 500
profiles and have the results within about 2 minutes.

It seems to me that such technology could be applied here, but I don't
know of any freely available software to do such a merged search.  It
would make for a great masters project...

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
