Craig,

| MM> This is one of the three reasons why amavisd-new (a MTA-to-virus scanner
| MM> interface, which can call SA as well) prefers to call Razor by itself
| MM> and not letting SA do it:
|
| Note that doing it this way instead of inside SA means that razor is being
| used as a black/white decision criterion, rather than as
| just-another-scored-rule in SA. SA assigns razor a score of 3.0 by
| default, which means that by itself a razor positive is not enough to flag
| a message as spam. Now probably, the score should be higher (4? 4.5?),
| but part of the strength of SA is that it doesn't tend to flag things based
| on a single criterion.
|
| I do like the concept of only calling Razor for messages bigger than a
| certain size. I think maybe I'll add that to SA. Well, I'll file a
| bugzilla ticket for now, maybe fix it later.

When amavisd-new calls Razor by itself, it does the same thing with the
Razor score as SA does, i.e. it adds 3 to the SA score (get_hits) if Razor
returns positive, and checks the sum against get_required_hits().
I didn't explain that in detail, and you misinterpreted it, sorry.

Here is how it works in amavisd-new (when SA support is enabled).
(Note that amavisd-new is normally positioned at the mail gateway,
and not as, before, or instead of a final user local delivery agent.)

- compute a SHA1 Razor hash of the message body by directly calling
  Razor::String::hash, giving it a file handle. This is done early
  and directly (not later by Razor during check) for the following
  reasons:

  * the hash is always added (even for large mail or skipped spam
    checks) to the message as an X-Razor-id: header line, which makes
    it easier for users to report spam to Razor servers later. This way
    we avoid users reporting an incorrect SHA1 hash because the message
    was mangled later by some broken (or too smart for its own good)
    UA or filter or HTML-izer;

  * the hash may be used for caching: if we receive the same message
    content in short succession (e.g. mailing lists, some forms of
    spam), we can avoid virus-checking and spam-scanning it over and
    over again (this is certainly true for viruses, and perhaps not
    strictly true for spam because header lines may be different, but
    it is still good enough and useful for all practical purposes);

  * depending on the configuration, spam-scanning can be skipped for
    certain groups of users, but they can still benefit from having
    the Razor hash already computed for them and available in the header;

  * with the message already stored in a file, and the file open,
    computing the Razor SHA1 hash is relatively cheap, even for very
    long mail.

- call virus checkers

- if a virus is found, we skip spam checks

- call Razor::Client::check, but limiting Razor lookup time to just
  a few seconds (e.g. 7).
  The Razor server normally returns a result within a second, and if
  we have not received a response after a few seconds, it is very
  likely we will never get it, so for a busy mail gateway it is
  imperative to move on, and not waste 30 seconds or more waiting.
  The Razor check is also skipped for very short mail (zero or one
  lines of body), as there are too many false positives there.
  Assigning a smaller score than the usual Razor score in such a case,
  and still calling Razor, would probably be even better.

- call Mail::SpamAssassin::NoMailAudit->new, but only for messages
  that are not too long (I chose a 64k limit after analyzing the size
  distribution of my spam collection - less than 1% of spam was
  over 64k);

- now we evaluate the result:

  * normally we have obtained both the Razor result _and_ the SA
    result; we combine them by adding scores and testing against the
    SA threshold, just as if SA had called Razor by itself;

  * if the Razor result is missing (server timeout, short message),
    we believe SA;

  * if the SA result is missing (e.g. message too long) and we do have
    a Razor result, we believe whatever Razor tells us. This is
    certainly much better than nothing, but it is good to have
    a whitelist of the most important mailing lists we are receiving,
    as this is Razor's weak spot (false positives for mailing lists).
    This brings us to the need for having a whitelist for SMTP sender
    envelopes available, which is a topic of my other mail (I still owe
    a reply to it - tomorrow), and I won't discuss it here.
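
The evaluation rules above can be sketched roughly as follows. This is an
illustrative Python sketch only, not amavisd-new's actual (Perl) code. The
function and constant names are hypothetical, the ">=" threshold comparison
and the treatment of exactly 64k as "not too long" are assumptions, and the
"neither result available" case is not covered by the description, so
treating it as non-spam is also an assumption.

```python
from typing import Optional

RAZOR_SCORE = 3.0        # score added for a Razor positive (SA default per the text)
SA_MAX_SIZE = 64 * 1024  # messages above this size skip SA (assumed inclusive limit)
MIN_BODY_LINES = 2       # bodies of zero or one line skip the Razor check


def should_call_razor(body_lines: int) -> bool:
    """Razor is skipped for very short mail (zero or one body lines)."""
    return body_lines >= MIN_BODY_LINES


def should_call_sa(message_size: int) -> bool:
    """SA is only called for messages that are not too long."""
    return message_size <= SA_MAX_SIZE


def is_spam(sa_hits: Optional[float],
            required_hits: float,
            razor_positive: Optional[bool]) -> bool:
    """Combine the two results; None means skipped or timed out."""
    if sa_hits is not None and razor_positive is not None:
        # Both available: add the Razor score and test against the threshold.
        total = sa_hits + (RAZOR_SCORE if razor_positive else 0.0)
        return total >= required_hits
    if sa_hits is not None:
        # Razor result missing (timeout, short message): believe SA.
        return sa_hits >= required_hits
    if razor_positive is not None:
        # SA result missing (message too long): believe Razor.
        return razor_positive
    # Neither result available: assumed here to mean "not spam".
    return False
```

For example, with a threshold of 5.0, is_spam(1.0, 5.0, True) is False (a
Razor positive alone does not reach the threshold), while
is_spam(None, 5.0, True) is True, because Razor is the only result available.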

In summary:

- we limit the time required to spam-check a message to a reasonable
  value, and still keep some level of protection for very long messages;

- we always have an X-Razor-id: line inserted in the mail header,
  which makes it easier for users to report spam.

Now my wish list for SA, hopefully eliminating the need for my kludge:

- have a mechanism for passing the Razor SHA1 hash, either from the
  application to SA, or the other way around, obtaining it from SA;

- have a separately configurable (short) time limit in SA for the
  Razor check;

- provide some mechanism to cut off expensive tests for long messages,
  but still perform some basic tests (e.g. on headers, Razor, RBL, etc);

- (another topic, but here for completeness) provide a mechanism (API)
  for an application to pass SMTP envelope information (if available)
  to SA;

- (optional, to allow hacks like mine) provide an API for obtaining
  individual scores from the SA configuration (to go along with
  get_hits() and get_required_hits()), so that an application using SA
  does not have to parse SA config files or hard-wire numbers.

Regards
   Mark

--
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Mark Martinec (system manager)       tel +386 1 4773-575 !!
!! J. Stefan Institute, Jamova 39       fax +386 1 2519-385 !!
!! SI-1000 Ljubljana, Slovenia          [EMAIL PROTECTED]   !!
!!!!!!!!!!!!!!!!!!!!!!!!!! http://www.ijs.si/people/mark/ !!!!

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk