Giampaolo Tomassoni writes:
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> >
> > ...omissis
> >
> > Recently in the perl "blead" code, one of the perl hackers has added a
> > trie-based regexp matcher (with Aho-Corasick optimisations) to efficiently
> > match multiple regular expressions in parallel, to the perl core regexp
> > matching code. That's pretty much what you're describing,
>
> Yes, I think so too. I didn't know the name of such a beast. Aho-Corasick. It
> should definitely work. How could something named Aho-Corasick not to work? :)
>
> Thank you for naming it.

here's more info: http://en.wikipedia.org/wiki/Aho-Corasick . it's a nice algorithm ;)

> > and I'm looking
> > into rewriting bits of SpamAssassin to take advantage of that (in the
> > "jm_re2c_hacks" branch).
> >
> > Hopefully it will run faster than the current regexp matching system,
> > which is actually quite fast as it stands! (The perl regular expression
> > matching engine is _very_ efficient.)
> >
> > There's also an re2c-based version, which already outperforms basic
> > SpamAssassin by 15-20%, btw.
> >
> > They almost definitely will not reduce memory usage, though. ;)
>
> Mmmmh, I had the impression that all that strings being created, cloned,
> used, merged and the like in spite of being fed to the regexes would be one
> of the reasons of big memory usage. So, I'm wrong in this...
>
> What's the memory-hungry piece of code, then?

The perl interpreter -- I think the compiled code itself is quite
memory-hungry, as far as I can see.

--j.
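
For anyone following the thread who wants a feel for the idea, here is a rough sketch of textbook Aho-Corasick matching over plain substrings. It is only an illustration: it is not the perl core implementation and not what the "jm_re2c_hacks" branch does, it handles fixed strings rather than regular expressions, and the function names (build_automaton, find_all) and the sample patterns are made up for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links (hypothetical helper, fixed strings only)."""
    goto = [{}]   # goto[node][char] -> child node index
    fail = [0]    # fail[node] -> node for the longest proper suffix still in the trie
    out = [[]]    # out[node] -> patterns that end at this node
    for pat in patterns:
        if not pat:
            continue  # skip empty patterns; they would match everywhere
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)

    # Breadth-first pass: depth-1 nodes fail to the root, deeper nodes
    # derive their failure link from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

The point is that the text is scanned once no matter how many patterns are loaded. The cost is the automaton itself, which has to sit in memory.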