Eric A. Hall wrote:
> Thinking about the GPL Java announcement some, and trying to imagine the
> kinds of opportunities this allows for, it occurs to me that SpamAssassin
> might be a natural fit for Java.
>
> I'm just thinking out loud here, not advocating anything...
>
> Would it run better? Would it be faster, have smaller memory footprint,
> better reclamation, better hooks for plugins etc? OTOH, would it be harder
> to build, given the dependence of SA on perl modules?
>   
About three dozen other folks have asked about porting SA
to C/C++/Java/Python/<insert any other language here>.

In general, SA would suffer severely from a conversion to Java, or any
other language.

It all fundamentally boils down to two things:

1) perl has a substantial base of text parsing and utility libraries
that no other language can match. Java does have native regex support,
so it has a leg up over the others, but it still lacks many of the
libraries that SA is so heavily entrenched in. Do you know of any
equivalent to IP::Country::Fast, for *ANY* other language? Admittedly
that one is not used by everyone, but the MIME parsers, base64 decoders,
HTML parser, Net::DNS, etc would be tough to find good matches for
without having to write/maintain your own. This kind of text
manipulation is what perl is actually very good at, and has lots of
support libraries for.

2) Most importantly, consider that all of the existing developers who
maintain the code are perl developers, and not all of them are Java
developers. Poof, there goes at least some, if not all, of your
development team down the tubes. This is by far the most significant
hurdle. Who would we lose here, and can we afford to lose the
spam-fighting expertise these people have?

That said, I'm a C/C++/assembly developer myself, and my own personal
reaction is "why would you want to convert from one lumbering hulk of a
language with an expensive interpreter to another lumbering hulk of a
language with an expensive VM?" And yes, I know Java is "JIT compiled",
not interpreted, but AFAIK this is not as different from how perl works
as you might think. Perl code isn't strictly interpreted from scratch
every time you pass through the same code. Perl is really compiled and
optimized at load time into an internal form (an op tree, roughly
analogous to bytecode), then interpreted from that. This makes perl's
startup much slower, but runtime isn't as slow as a purely interpreted
language. As for size, perl interpreters and Java VMs are both large.

And yes, you can compile Java natively to machine code, but I doubt your
gains here will be significant.

My bets are on SA spending 99% of its time in regex evaluation or
network lookups. Regex execution is VERY well optimized in both
languages even without native compilation, so that won't be helped much,
if at all. Network lookups are basically spending their time waiting;
you can't wait any faster in machine code than in a semi-interpreted
application.
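
To make the "you can't wait any faster" point concrete, here is a rough
sketch in Python (picked only as a neutral language; this is not SA code).
The rule patterns, the 50 ms delay, and the function names are all made up
for illustration, and time.sleep() merely stands in for a DNS or blocklist
round trip.

```python
import re
import time

# Hypothetical stand-ins for content rules; NOT real SpamAssassin rules.
RULES = [
    re.compile(p, re.IGNORECASE)
    for p in (r"v[i1]agra", r"free\s+money", r"click\s+here")
]


def run_regex_rules(message):
    """Count how many of the precompiled rules match the message."""
    return sum(1 for rule in RULES if rule.search(message))


def fake_dns_lookup(delay_seconds=0.05):
    """Simulate a network lookup; the delay value is an assumption."""
    time.sleep(delay_seconds)


def profile(message, lookups=3):
    """Return (rule hits, seconds spent in regexes, seconds spent waiting)."""
    t0 = time.perf_counter()
    hits = run_regex_rules(message)
    t1 = time.perf_counter()
    for _ in range(lookups):
        fake_dns_lookup()
    t2 = time.perf_counter()
    return hits, t1 - t0, t2 - t1


if __name__ == "__main__":
    hits, regex_s, wait_s = profile("Click here for FREE money")
    print(f"hits={hits} regex={regex_s:.6f}s waiting={wait_s:.6f}s")
```

The waiting column is set by the remote end and the network, so it would
look the same no matter which language issued the lookups.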

I also expect a lot of the memory usage is the annotation tables and
such for regexes. It would be interesting to compare the size of spamd
without any rules loaded against one with a stock ruleset. The
difference between the two can't really be improved by any means other
than using a slower regex interpreter that doesn't use tables as
extensively.
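
The same with-and-without-rules comparison can be tried in miniature. The
sketch below uses Python's tracemalloc to see how much tracked memory grows
when a batch of regexes is compiled. The patterns are invented, and Python's
regex engine is not perl's, so treat the numbers as an illustration of the
measurement only, not as an estimate for spamd.

```python
import re
import tracemalloc


def compiled_rules_memory(patterns):
    """Return (number of compiled rules, bytes of tracked memory they added)."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    # Keep the compiled objects alive so their memory is still allocated
    # when the second reading is taken.
    compiled = [re.compile(p) for p in patterns]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(compiled), after - before


if __name__ == "__main__":
    # Hypothetical "ruleset": 200 distinct made-up patterns.
    patterns = [rf"word{i}\s+(foo|bar){{1,3}}" for i in range(200)]
    count, grew = compiled_rules_memory(patterns)
    print(f"{count} rules added about {grew} bytes of tracked memory")
```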