Nice stuff - have you got any benchmarks to prove it's all worthwhile?
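
One rough way to put numbers on this is sketched below. It is in Python rather than the Perl that SpamAssassin actually runs, so timings would not carry over directly. It times a flat alternation against its factored form over the same text and checks that both find the same matches. The two patterns come from the foo example in the quoted message. The corpus, its size and the names `make_corpus` and `bench` are made up for illustration.

```python
import random
import re
import string
import timeit

# Patterns taken from the foo example in the quoted message; the corpus is synthetic.
FLAT = re.compile(r'foobat|foobang|fooziit')
FACTORED = re.compile(r'foo(?:bat|bang|ziit)')

def make_corpus(n_words=20000, seed=1):
    """Random lowercase 'words', mostly non-matching, with a few real hits mixed in."""
    rng = random.Random(seed)
    words = [''.join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 10)))
             for _ in range(n_words)]
    # every 1000th word is replaced with a known hit
    words[::1000] = ['foobang'] * len(words[::1000])
    return ' '.join(words)

def bench(rx, text, repeat=5, number=20):
    """Best-of-`repeat` wall time, in seconds, for `number` full scans with findall."""
    return min(timeit.repeat(lambda: rx.findall(text), repeat=repeat, number=number))

if __name__ == '__main__':
    text = make_corpus()
    assert FLAT.findall(text) == FACTORED.findall(text)  # same matches either way
    for name, rx in [('flat', FLAT), ('factored', FACTORED)]:
        print(f'{name:9s} {bench(rx, text):.3f} s')
```

A real answer would need the actual URL list and real message bodies, run through Perl's regex engine.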

Cheers,

Phil

---------------------------------------------
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK 

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Scott A Crosby
> Sent: 20 January 2004 05:50
> To: [EMAIL PROTECTED]; Chris Santerre
> Subject: [SAtalk] Matching a list of strings quickly.
> 
> 
> A few weeks ago I described a technique to automatically convert a
> list of strings into a factored regexp for faster matching. 
> 
> You know, from
> 
>   foobat
>   foobang
>   fooziit
> 
> to
> 
>   foo(bat|bang|ziit)
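
Here is a minimal sketch of the idea. It assumes a plain character trie and is not the prototype's code, whose source isn't posted. It inserts every string, then walks the trie and turns each branching node into a non-capturing group. The names `build_trie`, `trie_to_regex` and `factor` are hypothetical.

```python
import re

def build_trie(words):
    """Insert each word into a nested-dict trie; the key '' marks end of word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[''] = {}  # end-of-word marker
    return root

def trie_to_regex(node):
    """Emit a regex for this subtree, keeping single-child chains as literal text."""
    ends = '' in node
    branches = [re.escape(ch) + trie_to_regex(node[ch])
                for ch in sorted(k for k in node if k != '')]
    if not branches:
        return ''
    if len(branches) == 1 and not ends:
        return branches[0]  # no branching here: just concatenate
    body = '|'.join(branches)
    # if a word ends at this node, everything after it is optional
    return '(?:' + body + ')' + ('?' if ends else '')

def factor(words):
    return trie_to_regex(build_trie(words))

print(factor(['foobat', 'foobang', 'fooziit']))
```

On the three strings above this prints `foo(?:ba(?:ng|t)|ziit)`. That factors one level deeper than the hand-written example because `bat` and `bang` share `ba`. Whether the deeper nesting is actually faster depends on the regex engine.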
> 
> Well, I've got a prototype complete and available here:
> 
>    http://www.cs.rice.edu/~scrosby/datamining/src/prefixStringFactor/
> 
> The binary is for Linux x86; I'll put the source up eventually.
> 
> Pass it ordinary strings as input, one per line; each line of output
> is a separate rule. You don't want to use escaped strings or prefixes
> and suffixes like the test file shown below, but it's what I had. If
> you're matching URLs, I suggest folding the URL list to lowercase
> first and using case-insensitive matching.
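
Preparing the input along those lines might look like the sketch below. It assumes the rules are written in the `/domain\.tld/i` form of the test file below, and it only undoes the `\.` escape. `to_plain` and `prepare` are hypothetical helpers.

```python
import re

# Hypothetical: matches a rule written as /.../i, as in the test file.
RULE = re.compile(r'^/(.*)/i$')

def to_plain(line):
    """Strip the /.../i wrapper (if present) and undo the \\. escapes."""
    m = RULE.match(line.strip())
    s = m.group(1) if m else line.strip()
    return s.replace('\\.', '.')  # only handles the escape seen in the example

def prepare(lines):
    """Lowercase, strip wrappers, drop blanks and duplicates, and sort the list."""
    return sorted({to_plain(l).lower() for l in lines if l.strip()})

plain = prepare(['/zsoftech\\.net/i', 'ZSoftech.net', '/zunoz\\.com/i'])
# data is folded to lowercase, but matching is still case-insensitive
rx = re.compile('|'.join(map(re.escape, plain)), re.IGNORECASE)
print(plain, bool(rx.fullmatch('ZUNOZ.COM')))
```

The cleaned, deduplicated list is what would be fed to the factoring step. The escaping is then applied once, to the output.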
> 
> It's fully automatic and fairly sophisticated, though the output will
> look silly on small inputs. I don't implement right-factoring (common
> suffixes) or greedy left-factoring yet.
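
Right-factoring would pull out common suffixes the same way left-factoring pulls out prefixes. Below is a minimal sketch for one group of literal alternatives, using `os.path.commonprefix` on the reversed strings. `common_suffix` and `right_factor` are hypothetical names, and this is not how the prototype would necessarily do it.

```python
import os
import re

def common_suffix(words):
    # commonprefix compares character by character, so reverse, compare, reverse back
    return os.path.commonprefix([w[::-1] for w in words])[::-1]

def right_factor(words):
    """Pull the longest shared suffix out of an alternation of literal strings."""
    suf = common_suffix(words)
    if not suf:
        return '(?:' + '|'.join(map(re.escape, words)) + ')'
    stems = [w[:len(w) - len(suf)] for w in words]
    return '(?:' + '|'.join(map(re.escape, stems)) + ')' + re.escape(suf)

print(right_factor(['bat.com', 'bang.com', 'ziit.com']))
```

For `bat.com`, `bang.com` and `ziit.com` this gives `(?:bat|bang|ziit)\.com`. Applied after left-factoring, it would catch suffixes like `\.com` that repeat across many branches of the rules below.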
> 
> For instance, a list containing these entries (an excerpt of the
> input file):
> 
> /zrowlandtzq\.com/i
> /zsoftech\.net/i
> /zsupper\.com/i
> /zui6av\.net/i
> /zunoz\.com/i
> /zuon6\.net/i
> /zvg3gc\.org/i
> /zwdsj\.org/i
> /zworg\.com/i
> /zzitq5\.net/i
> 
> 
> TO
> 
> /ze(roads\.com/i|dnet\.net/i|sty\.ws/i|belkhan\.com/i|nitzenit\.com/i|n1ado\.com/i|nmail2003\.com/i)
> /za(irmail\.com/i|ushon\.com/i|xouts\.com/i|meq\.org/i|karish\.com/i|qxsw\.biz/i)
> /zo(ontzq\.com/i|rromail\.com/i|anmail\.com/i|mnieb\.com/i|ne-net\.net/i|ningfor-best\.com/i)
> /zi(04\.com/i|m-crozer\.net/i|p-media\.com/i|yuantzq\.com/i|bxr\.com/i)
> /z(worg\.com/i|wdsj\.org/i|hupong\.com/i|hangxiaoping\.com/i|hangnian\.com/i|vg3gc\.org/i|unoz\.com/i|uon6\.net/i|ui6av\.net/i|supper\.com/i|softech\.net/i|dl\.net/i|7wmcsp\.com/i)
> /z(rowlandtzq\.com/i|re9iq\.net/i|ckzh\.net/i|qlp\.com/i|q89\.org/i|bestoffer\.com/i|ppi\.org/i|3i26up\.org/i|n8px\.com/i|nolt\.net/i|ncvma\.org/i|2p\.net/i|mqp\.net/i|m01\.net/i|kpc\.net/i|khatritzq\.com/i|zitq5\.net/i|jzm\.net/i|jwju\.org/i|jfe\.com/i)
> /yu(f7b89\.com/i|ictme1s2g5jph\.org/i|78hg\.com/i|aln38\.org/i|noz\.biz/i)
> /ye(6tj\.com/i|llowtang\.net/i|ah\.net/i|arendsaver\.com/i|smail\.com/i|smail\.net/i|ez\.org/i)
> /youn(gfaster\.biz/i|gforever22\.com/i|gandhorny\.us/i|gandthin\.biz/i|gpinkpussies\.com/i|gerfasternow\.biz/i)
> /yourf(avoritepresent\.com/i|avoritestuff\.com/i|reelunch\.com/i|reepresent\.com/i|reevitamins\.com/i)
> /yourd(omain\.biz/i|omain\.com/i|vdrentalstore\.com/i|ebt\.com/i)
> /yourb(ig\.com/i|igfun\.com/i|izinformation\.com/i|randsdirect\.net/i|argainbuddy\.com/i|estsavings\.com/i)
> /yourm(ailsource\.com/i|arketnews\.com/i|edicinecabinet\.biz/i|eds\.biz/i|edstore\.us/i)
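
Each output line is meant to be a standalone rule, so it is worth checking mechanically that every input string is still fully matched by its rule. The sketch below groups strings by a fixed two-character prefix. That grouping is a hypothetical policy; the prototype's own grouping evidently differs, since it emits more than one `z(` rule. The sketch then builds one rule per group and reports any string its rule misses. `group_by_prefix`, `rule_for_group` and `uncovered` are illustrative names.

```python
import re
from collections import defaultdict

def group_by_prefix(strings, n=2):
    """Bucket strings by their first n characters (hypothetical grouping policy)."""
    groups = defaultdict(list)
    for s in sorted(set(strings)):
        groups[s[:n]].append(s)
    return dict(groups)

def rule_for_group(prefix, members):
    """One rule per group: the shared prefix, then an alternation of the remainders."""
    rests = [re.escape(m[len(prefix):]) for m in members]
    return re.escape(prefix) + '(?:' + '|'.join(rests) + ')'

def uncovered(pattern, strings):
    """Strings the rule fails to match in full; an empty list means full coverage."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [s for s in strings if not rx.fullmatch(s)]

domains = ['zworg.com', 'zwdsj.org', 'zunoz.com', 'zuon6.net', 'yeah.net']
for prefix, members in group_by_prefix(domains).items():
    rule = rule_for_group(prefix, members)
    assert not uncovered(rule, members)  # every member still matches its rule
    print(rule)
```

The same `uncovered` check, pointed at the output above, would report every string. The `/i` suffixes from the test file ended up inside the alternations, so no plain domain matches in full.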
> 
> 
> -------------------------------------------------------
> The SF.Net email is sponsored by EclipseCon 2004
> Premiere Conference on Open Tools Development and Integration
> See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
> http://www.eclipsecon.org/osdn
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> 
