A few weeks ago I described a technique to automatically convert a list of strings into a factored regexp for faster matching.
You know, from

    foobat foobang fooziit

to

    foo(bat|bang|ziit)

Well, I've got a prototype complete and available here:

http://www.cs.rice.edu/~scrosby/datamining/src/prefixStringFactor/

The binary is for Linux x86. I'll put the source up eventually.

Pass it a bunch of ordinary strings on successive lines as input; each line of output is a separate rule. You don't want to feed it escaped strings, or strings wrapped in prefixes and suffixes, like the test file shown below, but it's what I had. If you're matching URLs, I suggest folding the URL list to lowercase first and using case-insensitive matching.

It's fully automatic and fairly sophisticated, though it will look silly on small files. I don't implement right-factoring or greedy left-factoring yet.

For instance:

/zrowlandtzq\.com/i
/zsoftech\.net/i
/zsupper\.com/i
/zui6av\.net/i
/zunoz\.com/i
/zuon6\.net/i
/zvg3gc\.org/i
/zwdsj\.org/i
/zworg\.com/i
/zzitq5\.net/i

TO

/ze(roads\.com/i|dnet\.net/i|sty\.ws/i|belkhan\.com/i|nitzenit\.com/i|n1ado\.com/i|nmail2003\.com/i)
/za(irmail\.com/i|ushon\.com/i|xouts\.com/i|meq\.org/i|karish\.com/i|qxsw\.biz/i)
/zo(ontzq\.com/i|rromail\.com/i|anmail\.com/i|mnieb\.com/i|ne-net\.net/i|ningfor-best\.com/i)
/zi(04\.com/i|m-crozer\.net/i|p-media\.com/i|yuantzq\.com/i|bxr\.com/i)
/z(worg\.com/i|wdsj\.org/i|hupong\.com/i|hangxiaoping\.com/i|hangnian\.com/i|vg3gc\.org/i|unoz\.com/i|uon6\.net/i|ui6av\.net/i|supper\.com/i|softech\.net/i|dl\.net/i|7wmcsp\.com/i)
/z(rowlandtzq\.com/i|re9iq\.net/i|ckzh\.net/i|qlp\.com/i|q89\.org/i|bestoffer\.com/i|ppi\.org/i|3i26up\.org/i|n8px\.com/i|nolt\.net/i|ncvma\.org/i|2p\.net/i|mqp\.net/i|m01\.net/i|kpc\.net/i|khatritzq\.com/i|zitq5\.net/i|jzm\.net/i|jwju\.org/i|jfe\.com/i)
/yu(f7b89\.com/i|ictme1s2g5jph\.org/i|78hg\.com/i|aln38\.org/i|noz\.biz/i)
/ye(6tj\.com/i|llowtang\.net/i|ah\.net/i|arendsaver\.com/i|smail\.com/i|smail\.net/i|ez\.org/i)
/youn(gfaster\.biz/i|gforever22\.com/i|gandhorny\.us/i|gandthin\.biz/i|gpinkpussies\.com/i|gerfasternow\.biz/i)
/yourf(avoritepresent\.com/i|avoritestuff\.com/i|reelunch\.com/i|reepresent\.com/i|reevitamins\.com/i)
/yourd(omain\.biz/i|omain\.com/i|vdrentalstore\.com/i|ebt\.com/i)
/yourb(ig\.com/i|igfun\.com/i|izinformation\.com/i|randsdirect\.net/i|argainbuddy\.com/i|estsavings\.com/i)
/yourm(ailsource\.com/i|arketnews\.com/i|edicinecabinet\.biz/i|eds\.biz/i|edstore\.us/i)
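
For anyone who wants to play with the idea without the binary, here is a minimal Python sketch of plain left-factoring through a trie. It is not the prototype's algorithm: the prototype's heuristics for splitting the output into separate rules are not described in this post, and the names build_trie, trie_to_regex and factor are made up for this illustration. It factors every shared prefix into a single pattern, so its output nests more deeply than the rules shown above.

```python
import re

def build_trie(words):
    """Insert each word into a nested-dict trie; the key '' marks end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Return a regex fragment matching exactly the suffixes stored under node."""
    branches = []
    optional = False
    for ch in sorted(node):
        if ch == "":
            optional = True  # some word ends at this node
        else:
            # Escape the literal character, then factor everything below it.
            branches.append(re.escape(ch) + trie_to_regex(node[ch]))
    if not branches:
        return ""
    if len(branches) == 1 and not optional:
        return branches[0]  # no alternation needed, keep the prefix flat
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if optional else body

def factor(words):
    """Hypothetical helper: factor a list of plain strings into one regex."""
    return trie_to_regex(build_trie(words))

if __name__ == "__main__":
    print(factor(["foobat", "foobang", "fooziit"]))
    # prints: foo(?:ba(?:ng|t)|ziit)
```

As suggested above for URLs, fold the input list to lowercase before calling factor and compile the result with re.IGNORECASE. The input should be unescaped strings, since the sketch does the escaping itself.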