Re: would SA benefit from port to Java

Justin Mason Sat, 18 Nov 2006 12:45:55 -0800

Giampaolo Tomassoni writes:
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> >
> > [...snip...]
> >
> > Recently in the perl "blead" code, one of the perl hackers has added a
> > trie-based regexp matcher (with Aho-Corasick optimisations) to efficiently
> > match multiple regular expressions in parallel, to the perl core regexp
> > matching code.  That's pretty much what you're describing,
> 
> Yes, I think so too. I didn't know the name of such a beast: Aho-Corasick. It 
> should definitely work. How could something named Aho-Corasick not work? :)
> 
> Thank you for naming it.


here's more info: http://en.wikipedia.org/wiki/Aho-Corasick .
it's a nice algorithm ;)
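
For a rough idea of how it works, here is a minimal sketch in Python. It is
not the perl core code, it only handles fixed strings rather than full
regexps, and the names build_automaton and find_all are made up for
illustration. The patterns go into a trie, each node gets a "failure" link
to the longest proper suffix that is also in the trie, and the text is then
scanned once for all patterns together.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and per-node outputs."""
    goto = [{}]   # goto[node][char] -> child node index
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)

    # Phase 2: breadth-first walk to fill in the failure links.
    # Depth-1 nodes keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until some node accepts this character.
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

The point is that each character of the text is read only once, however
many patterns there are, instead of running one match per pattern.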

> > and I'm looking
> > into rewriting bits of SpamAssassin to take advantage of that (in the
> > "jm_re2c_hacks" branch).
> > 
> > Hopefully it will run faster than the current regexp matching system,
> > which is actually quite fast as it stands!  (The perl regular expression
> > matching engine is _very_ efficient.)
> > 
> > There's also an re2c-based version, which already outperforms basic
> > SpamAssassin by 15-20%, btw.
> > 
> > They almost definitely will not reduce memory usage, though. ;)
> 
> Mmmmh, I had the impression that all those strings being created, cloned, 
> used, merged and the like in spite of being fed to the regexes would be one 
> of the reasons for the high memory usage. So, I'm wrong about this...
> 
> What's the memory-hungry piece of code, then?

The perl interpreter -- as far as I can see, the compiled code itself is
quite memory-hungry.

--j.
