Re: Rewritten URIBL plugin

Jared Johnson Mon, 26 Jul 2010 00:15:18 -0700

>> - Introduces support for URIBL services that may not have worked right,
>> at
>> least out of the box, before.  Defines the subtle differences between
>> various known URIBL services in order to maximize compatibility.
>
> Is it worth pulling some of this config out of the code and putting it
> into some sort of config file?


Probably would be worth it, and not that difficult to do.  Any new config
scheme should probably be both backward compatible and non-spaghetti.  The
current scheme looks like:

multi.uribl.com 2,4 deny

the new plugin does add one additional feature, you can now do:

multi.uribl.com 2 deny
multi.uribl.com 4 log

but $zones defines a bunch of other things - labels for the zone in
general and specific masks, check_ip, check_host,
host_wildcards_supported, lookup_a_record, lookup_ns_record.  Ignoring the
label issue, all the other directives are related only to the zone;
perhaps we could add a new scheme on top of the mask definition scheme
(which we would continue to read in the old way), getting even further
away from the one-line-per-zone style:

multi.uribl.com action deny
multi.uribl.com check_ip 1
multi.uribl.com check_host 1
multi.uribl.com lookup_a_record 0
multi.uribl.com lookup_ns_record 0
multi.uribl.com host_wildcards_support 0

If nobody hates this scheme it does seem like something I could add pretty
easily send an updated copy

>> - Uses Net::DNS::Async to simplify the code, and also to ensure the
>> afore-mentioned A and NS lookups will prompt new URIBL lookups in an
>> efficient and simple manner via callbacks
>
> Does the code still work with the async qpsmtpd cores?

Well... I didn't change the existing plugin a whole lot; I did change from
data_handler() registered via register(), to plain old hook_data_post();
this may need changed considering that the comments above each sub (which
I also removed) suggested that hook_data_post() and register() were 'not
used' for async.  These things could easily be changed if needed.  However
we don't run async anywhere, not even in test environments, so I'm not
very qualified to know if they need changed, or give any guarantees.  And
I'm confused by that bit.  From what I can tell, the async/uribl plugin
ignores plugins/uribl entirely and uses
Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS.

Now I noticed the async rhsbl plugin used Q::P::A::D and that this used
ParaDNS before and looked at it briefly to see if I could use it for my
purposes, but at the top of the docs it's mentioned that only A lookups
are supported, so I skipped it and went to Net::DNS::Async.  I don't
really know the strongpoints of N::D::A vs. ParaDNS but it did note that
the latter more expressly claims to be interested in performance and
scalability.  So in conclusion:

(1) I think I'll look into using ParaDNS and maybe even
Q::P::Async::DNSBLBase if appropriate, even though my plugin is not in
async; it might still have advantages, and the interesting logic is in the
callbacks and not the module use anyway
(2) I don't know if it matters whether this particular plugin works with
async, I guess it depends on whether the old one really ever did.  At
least Net::DNS::Async should not make it any more blocking than the old
plugin's collect_results().  At any rate somebody who plays in async
territory should probably comment before any action is taken on the new
plugin :)

>> Attached also is tld_lists.pl, a companion file that needs to be dropped
>> in lib/Qpsmtpd/ which provides the list of first, second, and third
>> level
>> TLDs that we care about.  It's derived from our URIBL datafeed as well
>> as
>
> Do the owners of that data care about it being used this way?  You may
> be violating any agreement with them.  Would they be ok if this was
> released as an independent CPAN module?    Either way, can we
> structure this as an API instead of just an include file?

(1) I suspect that anyone who runs a URIBL service would be thrilled with
this information being available for this particular use, because it
minimizes unnecessary queries to their services.  Consider the old method
that sent 2 or 3 queries instead of one because it didn't know which one
was right.  The fact that SURBL and URIBL both publish the two and three
level lists and encourage their use is evidence of this :)

(2) First-level TLDs are public information, of course, and with the
two-level and three-level lists being public, the interesting business is
that the first-level list and the SURBL second and third level lists are
pruned based on URIBL's data.  So it comes down to whether anyone can be
considered to have a claim on the privilege of knowing how to *exclude*
results.  I guess maybe they could if it came down to it...

(3) I'm pretty sure I could get URIBL's explicit permission, but it would
take a long time.  The founder and sole owner of URIBL was also the
original author of DoubleCheck, so I feel we would have a bit of an in :) 
However even so he always takes forever to get back to us in
correspondence.  For instance we're still waiting around to hear from him
since I noticed, while working on the script that generated tld_lists.pl,
that our datafeed is no longer updating.  Maybe he 'fixed the glitch'...
:P

> It would be cool if it was configurable (i.e. use MIME::Parser or not.)

That does sound handy, maybe I'll work on that soon.  I think if one does
go back to the 'old' method, it would be good to skip any lines that
contain no whitespace, or some similar check that passes the
3-million-line test, so that we no longer pointlessly scan reams of
base64-encoded data etc. for URI's

-Jared

Re: Rewritten URIBL plugin

Reply via email to