Alex wrote:
> It turned out to be a bit of local config,

Care to share the specifics?  I can't think of any SA configuration that
might trigger this, TBH.

> but mostly not expecting it
> to take so long to check() a single message. I'm sorry for the
> trouble; perhaps I was impatient due to not understanding the SA perl
> API.

*nod*  Actually, until checking more closely due to this thread, I
wasn't aware that this was doing a full SA ruleset run on the message.
I could arguably fiddle things to use a (mostly) empty configuration to
speed things up, but at a bare minimum you would still need to have the
trust path available as local configuration.

> Do you think there's anything that can be done to further automate the
> process of inspecting the IP addresses it produces? If I understand
> your docs correctly, it's an entirely manual process with inserting
> them into your web cgi and determining whether it should be added to
> your DNSBL, correct?

Well, all you get back from my script is a list of IP addresses and hit
counts (or IPs formatted for dropping into tinydns).  There's no further
direct information about whether that IP might be a legitimate mail
system that just passed on a spam, or a compromised Exchange server, or
a previously unidentified white-hat ESP who got scammed long enough for
some spammer to send a blast of mail.
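Since the script's output is just IPs with hit counts, a little post-processing can at least surface the most frequently reported addresses for manual review. A minimal sketch (the `count ip` input format here is an assumption, not necessarily the script's actual output layout):

```python
from collections import Counter

def top_reported(lines, n=10):
    """Aggregate hypothetical 'count ip' lines and return the n most
    frequently reported IPs, highest total first.  The input format
    is illustrative; adapt it to the script's real output."""
    totals = Counter()
    for line in lines:
        count, ip = line.split()
        totals[ip] += int(count)
    return totals.most_common(n)
```

Piping the result into a review queue keeps the human decision step but orders it by likely payoff.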

Internal whitelist notwithstanding, I regularly find IPs that probably
shouldn't be listed on a blacklist (even one that's just scored in SA),
and I prefer to keep the final decision in human hands.  If I were
seeing 10x the volume I might add more automation to try to detect some
of these cases, and live with causing FPs on the leftovers.

For some types of nominally legitimate relay systems, I've found it
useful to be informed about the IP, so I can go back and look up the
message in order to report it as spam to the relay provider.

> I'm finding that by the time I can collect the FNs and run the script
> on them, a few spot-checks show they're already listed in zen.

I don't bother checking IPs against other lists;  one of the key uses I
put the data to is in a "thin" SA instance that only checks DNSBLs and a
handful of very accurate stock rules, all scored a little higher than
stock.  (Fewer than 30 rules in total.)  This skims off 80%+ of the junk
(that would otherwise score 30+ against the full ruleset) at a fraction
of the processing cost - and multiple DNSBL hits on the same bit of data
are one of the key indicators here.
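For anyone who does want to spot-check IPs against other DNSBLs, the lookup convention is standard: reverse the IPv4 octets, prepend them to the list's zone, and resolve; any A record (conventionally 127.0.0.x) means the IP is listed. A hedged stdlib-only sketch (the zone name is just an example; this is generic DNSBL mechanics, not part of my script):

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name for a DNSBL lookup: IPv4 octets reversed,
    then the list's zone appended (e.g. 1.2.0.192.zen.spamhaus.org)."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(ip, zone):
    """True if the DNSBL returns any address for the query name;
    NXDOMAIN (raised as an OSError subclass) means not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```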

> I've taken a handful of IPs that it's produced and added them to a
> postfix client restriction, which has worked quite well, until I can
> find the time to implement the local DNSBL.
> 
> I'd like to be able to enhance the script to run it against a
> whitelist of IPs

Long on the to-do list, but not (yet) important enough (to me) to
implement, is to (ab)use the database to flag netblocks and/or IPs as
"white" instead of "black".
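In the meantime, the whitelist filtering Alex describes is easy to bolt on outside the script. A minimal sketch using Python's stdlib `ipaddress` module; the function name and CIDR-list input are illustrative assumptions, not anything in the existing tooling:

```python
import ipaddress

def filter_whitelisted(ips, white_blocks):
    """Drop any IP that falls inside a whitelisted netblock.
    'white_blocks' is a list of CIDR strings (hypothetical format);
    everything else passes through unchanged."""
    nets = [ipaddress.ip_network(block) for block in white_blocks]
    return [ip for ip in ips
            if not any(ipaddress.ip_address(ip) in net for net in nets)]
```

Run the script's output through this before doing the manual inspection pass.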

> or some other way to make this process less manually
> intensive.

TBH there's not much more I would automate out of the workflow I use.  A
human decision-maker is required in the loop to double-check the
sorting;  I've lost count of the number of times I've missed a
spammy-looking legitimate message that I recovered for proper handling
while checking the extracted IPs (and less frequently, URIs).

> For now, I'm just picking a few that have been reported
> most frequently and checking them manually in my mail logs, and adding
> them if necessary.

Out of the box, the intent is to use it for scoring in SA;  if you get a
cluster of IPs reported in a leaf allocation, they will have one flag
set for the IP itself, and may have up to 3 others set entirely
automatically for the direct registrar allocation, one intermediate
reallocation, and the final leaf assignment.

The key is to feed in enough data that the IP-count-for-netblock-size
thresholds start getting passed, so that new sending IPs at least pick
up some penalty from the SA scoring.
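The threshold idea can be sketched roughly as follows: group reported IPs by their enclosing netblock and flag any block where the distinct-IP count crosses a limit. The fixed /24 prefix and threshold of 3 here are illustrative stand-ins for the allocation-aware thresholds the real database applies:

```python
import ipaddress
from collections import defaultdict

def flagged_netblocks(reported_ips, prefix=24, threshold=3):
    """Return {netblock: distinct-IP count} for every block whose
    count of distinct reported IPs reaches the threshold.  Prefix
    length and threshold are hypothetical simplifications."""
    per_block = defaultdict(set)
    for ip in reported_ips:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        per_block[net].add(ip)
    return {net: len(ips) for net, ips in per_block.items()
            if len(ips) >= threshold}
```

A flagged block is what lets a never-before-seen IP in a spammy allocation start with a penalty score.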

-kgd
