Re: [SAtalk] Razor checks from SA (was: Spam Tracking)

Mark Martinec Thu, 23 May 2002 12:24:51 -0700

Craig,

| MM> This is one of the three reasons why amavisd-new (a MTA-to-virus scanner
| MM> interface, which can call SA as well) prefers to call Razor by itself
| MM> and not letting SA do it:
| 
| Note that doing it this way instead of inside SA means that razor is being
| used as a black/white decision criterion, rather than as
| just-another-scored-rule in SA.  SA assigns razor a score of 3.0 by
| default, which means that by itself a razor positive is not enough to flag
| a message as spam.  Now probably, the score should be higher (4?  4.5?),
| but part of the strength of SA is that it doesn't tend to flag things based
| on a single criterion. 
| 
| I do like the concept of only calling Razor for messages bigger than a
| certain size.  I think maybe I'll add that to SA.  Well, I'll file a
| bugzilla ticket for now, maybe fix it later.


When amavisd-new calls Razor by itself, it does the same thing with the
Razor score as SA does, i.e. it adds 3 to the SA score (get_hits) if Razor
returns positive, and checks the sum against get_required_hits().
I didn't explain that in detail, and you misinterpreted it, sorry.

Here is how it works in amavisd-new when SA support is enabled.
(Note that amavisd-new is normally positioned at the mail gateway,
and not as, before, or instead of a final user's local delivery agent.)

- compute a SHA1 Razor hash of the message body by directly calling
  Razor::String::hash, passing it a file handle. This is done early
  and directly (not later by Razor during check) for the following reasons:

  * the hash is always added (even for large mail or skipped spam checks)
    to the message as an X-Razor-id: header line, making it easier for users
    to report spam to Razor servers later - this way we avoid users reporting
    an incorrect SHA1 hash because the message was mangled later by
    some broken (or too smart for its own good) UA or filter or HTML-izer;

  * the hash may be used for caching: if we receive the same message
    content in short succession (e.g. mailing lists, some forms of spam),
    we can avoid virus-checking and spam-scanning it over and over again;
    (this is certainly true for viruses, and perhaps not strictly true
    for spam because header lines may be different, but is still
    good enough and useful for all practical purposes);

  * depending on the configuration, spam-scanning can be skipped for certain
    groups of users, but they can still benefit from having the Razor hash
    already computed for them and available in the header;

  * with the message already stored in a file, and the file open,
    computing the Razor SHA1 hash is relatively cheap, even for very long mail.


- call virus checkers - if a virus is found, we skip spam checks

- call Razor::Client::check, but limit the Razor lookup time to
  just a few seconds (e.g. 7). A Razor server normally returns a result
  within a second, and if we have not received a response after a few
  seconds, it is very likely we will never get it, so for a busy mail
  gateway it is imperative to move on and not waste 30 seconds or more
  waiting. The Razor check is also skipped for very short mail (zero or one
  lines of body), as there are too many false positives there. Still
  calling Razor in such cases, but assigning a smaller score than the usual
  Razor score, would probably be even better.

- call Mail::SpamAssassin::NoMailAudit->new, but only for messages
  that are not too long (I chose a 64k limit after analyzing the size
  distribution of my spam collection - less than 1% of spam was over 64k);

- now we evaluate the result:

  * normally we obtain both the Razor result _and_ the SA result;
    we combine them by adding the scores and testing against the SA
    threshold, just as if SA had called Razor itself;

  * if Razor result is missing (server timeout, short message),
    we believe SA;

  * if the SA result is missing (e.g. message too long) and we do have a
    Razor result, we believe whatever Razor tells us. This is certainly much
    better than nothing, but it is good to have a whitelist of the most
    important mailing lists we are receiving, as this is Razor's weak
    spot (false positives for mailing lists).  This brings us to the need
    for a whitelist of SMTP envelope senders, which is
    a topic of my other mail (I still owe a reply to it - tomorrow),
    and I won't discuss it here.
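
The evaluation step above can be sketched roughly as follows. This is an
illustration only, in Python rather than the Perl that amavisd-new is
written in. The function name, its arguments and the constant RAZOR_SCORE
are hypothetical; only the value 3 and the comparison against the SA
threshold come from the description above. Using ">=" for the threshold
test and treating "neither result available" as not spam are assumptions.

```python
from typing import Optional

RAZOR_SCORE = 3.0  # assumption: the default Razor score of 3 mentioned above


def is_spam(sa_hits: Optional[float],
            required_hits: float,
            razor_positive: Optional[bool]) -> bool:
    """Combine SA and Razor results; None means that result is missing."""
    if sa_hits is not None and razor_positive is not None:
        # Both available: add the Razor score, test against the SA threshold.
        total = sa_hits + (RAZOR_SCORE if razor_positive else 0.0)
        return total >= required_hits
    if sa_hits is not None:
        # Razor result missing (server timeout, short message): believe SA.
        return sa_hits >= required_hits
    if razor_positive is not None:
        # SA result missing (e.g. message too long): believe Razor.
        return razor_positive
    # Neither result available (case not covered above): assume not spam.
    return False
```

Here sa_hits and required_hits stand for the values an application would
obtain from get_hits() and get_required_hits().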

In summary:
- we limit the time required to spam-check a message to a reasonable value,
  and still keep some level of protection for very long messages;
- we always have an X-Razor-id: line inserted in the mail header,
  making it easier for users to report spam.
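
The time limit in the first point could be sketched like this. Again this
is Python for illustration only, not how amavisd-new (Perl) does it. The
helper check_with_deadline and the lookup callable are hypothetical
stand-ins for the Razor check; the 7 second default is the example value
from the description. Note that in this sketch a stuck lookup is only
abandoned, not killed, so its worker thread lingers until the call returns.

```python
import concurrent.futures
from typing import Callable, Optional


def check_with_deadline(lookup: Callable[[], bool],
                        timeout_s: float = 7.0) -> Optional[bool]:
    """Run lookup() in a worker thread; return None if it exceeds timeout_s."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(lookup)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # No answer in time: report the result as missing and move on.
        return None
    finally:
        # Do not block here waiting for a slow lookup to finish.
        pool.shutdown(wait=False)
```

A None return corresponds to the "Razor result is missing" case, in which
the SA result alone decides.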


Now my wish list for SA, hopefully eliminating the need for my kludge:

- have a mechanism for passing the Razor SHA1 hash, either from the
  application to SA, or the other way around, obtaining it from SA;

- have a separately configurable (short) time limit in SA for Razor check;

- provide some mechanism to cut off expensive tests for long messages,
  but still perform some basic tests (e.g. on headers, Razor, RBL, etc.);

- (another topic, but here for completeness) provide a mechanism (API)
  for an application to pass SMTP envelope information (if available) to SA;

- (optional, to allow hacks like mine) provide an API for obtaining
  individual scores from the SA configuration (to go along with get_hits()
  and get_required_hits()), so that an application using SA does not
  have to parse SA config files or hard-wire numbers.

Regards
   Mark

-- 
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!  Mark Martinec (system manager)     tel  +386 1 4773-575 !!
  !!  J. Stefan Institute, Jamova 39     fax  +386 1 2519-385 !!
  !!  SI-1000 Ljubljana, Slovenia        [EMAIL PROTECTED] !!
  !!!!!!!!!!!!!!!!!!!!!!!!!! http://www.ijs.si/people/mark/ !!!!

