Daniel Quinlan wrote:
> Matt Sergeant <[EMAIL PROTECTED]> writes:
>
>> If Craig would work on the email parser I posted to the dev list instead
>> of MIME-tools, it decodes all character sets (even embedded ones in
>> headers) to UTF-8, making detecting alternate character set stuff
>> infinitely easier.
>
> Which is the better long-term solution for SA in terms of features,
> performance (especially the ability to not decode headers we don't
> want to decode), and maintainability?
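
To make the question concrete, here is a minimal Python sketch of decoding embedded character sets in headers to Unicode while leaving selected headers untouched. It uses the standard library's email.header module, not the parser posted to the dev list, and the SKIP_DECODE set and header_to_unicode function are hypothetical names chosen only for illustration.

```python
from email.header import decode_header, make_header

# Hypothetical skip list: headers we choose to leave in their raw form.
SKIP_DECODE = {"received", "message-id"}


def header_to_unicode(name: str, raw_value: str) -> str:
    """Decode RFC 2047 encoded words in a header value to a Unicode string.

    Headers named in SKIP_DECODE are returned unchanged, which avoids the
    decoding cost for headers nobody needs decoded.
    """
    if name.lower() in SKIP_DECODE:
        return raw_value
    # decode_header splits the value into (data, charset) pairs;
    # make_header reassembles them, and str() yields Unicode text.
    return str(make_header(decode_header(raw_value)))


if __name__ == "__main__":
    print(header_to_unicode("Subject", "=?iso-8859-1?q?p=F6stal?="))
    print(header_to_unicode("Received", "=?iso-8859-1?q?p=F6stal?="))
```

Once every decoded header is Unicode, a rule that looks for an alternate character set only has to inspect one representation. An unknown charset name will raise an exception here, so a real parser would need a fallback.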

What headers might you want to not decode? I would guess that if you've got
very specific requirements, your own custom mail parser is almost always
going to be easier to extend.

As far as performance is concerned, it's fast enough for me, but then I have
pretty low performance requirements for what I'm using it for. I just timed
it over a large-ish spam and non-spam archive: it parsed and processed 1112
files in 83 seconds on a single thread (about 13 per second), including all
the larger files (the largest being about 1.6 MB, which IIRC took about 2 or
3 seconds to parse). Server spec was a PIII 550 with 512 MB of RAM.

Matt.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk