From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>
> ...omitted
>
> Recently in the perl "blead" code, one of the perl hackers has added a
> trie-based regexp matcher (with Aho-Corasick optimisations) to efficiently
> match multiple regular expressions in parallel, to the perl core regexp
> matching code. That's pretty much what you're describing,

Yes, I think so too. I didn't know the name of such a beast: Aho-Corasick. It should definitely work. How could something named Aho-Corasick not work? :) Thank you for naming it.

> and I'm looking
> into rewriting bits of SpamAssassin to take advantage of that (in the
> "jm_re2c_hacks" branch).
>
> Hopefully it will run faster than the current regexp matching system,
> which is actually quite fast as it stands! (The perl regular expression
> matching engine is _very_ efficient.)
>
> There's also an re2c-based version, which already outperforms basic
> SpamAssassin by 15-20%, btw.
>
> They almost definitely will not reduce memory usage, though. ;)

Mmmmh, I had the impression that all those strings being created, cloned, used, merged and the like, in spite of being fed to the regexes, would be one of the reasons for the big memory usage. So I'm wrong about this... What's the memory-hungry piece of code, then?

giampaolo

>
> --j.
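
For anyone else meeting the name for the first time, here is a minimal sketch of the Aho-Corasick idea in Python: build a trie of all patterns, add failure links, then scan the text once and report every pattern that ends at each position. It handles literal strings only, not full regular expressions, and it is not the perl core implementation or anything from the SpamAssassin branches mentioned above. The names `build_automaton` and `find_all` are made up for this illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first walk to compute failure links.
    # Depth-1 states keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in one pass."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(find_all("ushers", automaton))
```

The point of the construction is that the text is read once, whatever the number of patterns. The price is the automaton itself, which is held in memory alongside the patterns.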