Eric A. Hall wrote:
> Thinking about the GPL Java announcement some, and trying to imagine the
> kinds of opportunities this allows for, it occurs to me that SpamAssassin
> might be a natural fit for Java.
>
> I'm just thinking out loud here, not advocating anything...
>
> Would it run better? Would it be faster, have smaller memory footprint,
> better reclamation, better hooks for plugins etc? OTOH, would it be harder
> to build, given the dependence of SA on perl modules?
>

There have been about three dozen other folks who have asked about porting SA to C/C++/Java/Python/<Insert any other language here>.
In general, SA would suffer severely from a conversion to Java, or to any other language. It all fundamentally boils down to two things:

1) Perl has a substantial base of text-parsing and utility libraries that no other language can match. Java does have native regex support, which gives it a leg up over the others, but it still lacks many of the libraries that SA is so heavily entrenched in. Do you know of any equivalent to IP::Country::Fast for *ANY* other language? Admittedly, not everyone uses that one, but the MIME parsers, base64 decoders, HTML parser, Net::DNS, etc. would be tough to find good matches for without writing and maintaining your own. This kind of text manipulation is what perl is genuinely good at, and it has lots of support libraries for it.

2) Most importantly, consider that all of the existing devs who maintain the code are perl developers, and not all of them are Java developers. Poof, there goes at least some, if not all, of your development team down the tubes. This is by far the most significant hurdle. Who would we lose, and can we afford to lose the spam-fighting expertise these people have?

That said, I'm a C/C++/assembly developer myself, and my own personal reaction is "why would you want to convert from one lumbering hulk of a language with an expensive interpreter to another lumbering hulk of a language with an expensive VM?" And yes, I know Java is JIT-compiled rather than interpreted, but AFAIK this is not as different from how perl works as you might think. Perl code isn't interpreted from scratch every time execution passes through the same code. Perl compiles and optimizes the source at load time into an internal op tree (effectively a form of bytecode), then interprets that. This makes perl's startup much slower, but its runtime is faster than that of a purely interpreted language. As for size, perl interpreters and Java VMs are both large. And yes, you can compile Java natively to machine code, but I doubt the gains there would be significant.
My bet is that SA spends 99% of its time in regex evaluation or network lookups. Regex execution is VERY well optimized in both languages even without native compilation, so that won't be helped much, if at all. Network lookups spend basically all their time waiting, and you can't wait any faster in machine code than in a semi-interpreted application. I also expect that a lot of the memory usage comes from the annotation tables and such for the regexes. It would be interesting to compare the size of a spamd with no rules loaded against one with the stock ruleset. The difference between the two can't really be reduced by any means other than using a slower regex interpreter that doesn't rely on tables as extensively.
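The comparison suggested above can be tried in miniature. This is a Python analogue, not spamd: it uses the standard `tracemalloc` module to measure how much memory stays allocated after compiling and holding a batch of regexes versus none. The sample patterns and the `make_ruleset`/`compiled_size` helpers are hypothetical stand-ins; a real test would load the stock SpamAssassin ruleset into spamd and compare process sizes.

```python
import re
import tracemalloc

# Hypothetical stand-ins for SpamAssassin body rules; real rules are far
# more numerous and complex, so treat any numbers as illustrative only.
SAMPLE_RULES = [
    r"\bviagra\b",
    r"(?i)click\s+here\s+to\s+(?:unsubscribe|remove)",
    r"\$\d{1,3}(?:,\d{3})+(?:\.\d{2})?",
    r"(?i)100%\s+(?:free|guaranteed)",
]


def make_ruleset(n):
    """Build n distinct patterns by appending an optional index group to a sample rule."""
    return [f"{SAMPLE_RULES[i % len(SAMPLE_RULES)]}(?:rule{i})?" for i in range(n)]


def compiled_size(patterns):
    """Return the bytes still allocated after compiling and holding `patterns`."""
    re.purge()  # clear the re module's internal pattern cache first
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    compiled = [re.compile(p) for p in patterns]  # keep the compiled objects alive
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del compiled
    return current - baseline


if __name__ == "__main__":
    empty = compiled_size([])
    loaded = compiled_size(make_ruleset(2000))
    print(f"no rules: {empty} bytes, 2000 rules: {loaded} bytes")
```

The absolute numbers say nothing about spamd itself; the point is only that the retained memory grows with the number of compiled patterns, independent of how fast the surrounding code runs.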