Re: Rewritten URIBL plugin

Robert Spier Mon, 26 Jul 2010 15:51:04 -0700


Jared Johnson wrote:
> 
> >> - Introduces support for URIBL services that may not have worked right,
> >> at
> >> least out of the box, before.  Defines the subtle differences between
> >> various known URIBL services in order to maximize compatibility.
> >
> > Is it worth pulling some of this config out of the code and putting it
> > into some sort of config file?
> 
> Probably would be worth it, and not that difficult to do.  Any new config
> scheme should probably be both backward compatible and non-spaghetti.  The
> current scheme looks like:
> 
> multi.uribl.com 2,4 deny
> 
> the new plugin does add one additional feature, you can now do:
> 
> multi.uribl.com 2 deny
> multi.uribl.com 4 log
> 
> but $zones defines a bunch of other things - labels for the zone in
> general and specific masks, check_ip, check_host,
> host_wildcards_supported, lookup_a_record, lookup_ns_record.  Ignoring the
> label issue, all the other directives are related only to the zone;
> perhaps we could add a new scheme on top of the mask definition scheme
> (which we would continue to read in the old way), getting even further
> away from the one-line-per-zone style:
> 
> multi.uribl.com action deny
> multi.uribl.com check_ip 1
> multi.uribl.com check_host 1
> multi.uribl.com lookup_a_record 0
> multi.uribl.com lookup_ns_record 0
> multi.uribl.com host_wildcards_support 0
> 
> If nobody hates this scheme it does seem like something I could add pretty
> easily send an updated copy


I don't hate it, FWIW.  :)

Sticking with one line per zone is NOT a requirement.  (Heck, I'd even
be willing to go so far as saying we could go to a even more
complicated config scheme.  One of qpsmtpd's unwritten goals is to NOT
pull in half of CPAN to run, but we could expand a little.)

> 
> >> - Uses Net::DNS::Async to simplify the code, and also to ensure the
> >> afore-mentioned A and NS lookups will prompt new URIBL lookups in an
> >> efficient and simple manner via callbacks
> >
> > Does the code still work with the async qpsmtpd cores?
> 
> Well... I didn't change the existing plugin a whole lot; I did change from
> data_handler() registered via register(), to plain old hook_data_post();
> this may need changed considering that the comments above each sub (which
> I also removed) suggested that hook_data_post() and register() were 'not
> used' for async.  These things could easily be changed if needed.  However
> we don't run async anywhere, not even in test environments, so I'm not
> very qualified to know if they need changed, or give any guarantees.  And
> I'm confused by that bit.  From what I can tell, the async/uribl plugin
> ignores plugins/uribl entirely and uses
> Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS.

In that case, it's possible (and likely) that it may never have
worked with async.  When time is short, it's easy to just ask questions :)

> 
> Now I noticed the async rhsbl plugin used Q::P::A::D and that this used
> ParaDNS before and looked at it briefly to see if I could use it for my
> purposes, but at the top of the docs it's mentioned that only A lookups
> are supported, so I skipped it and went to Net::DNS::Async.  I don't
> really know the strongpoints of N::D::A vs. ParaDNS but it did note that
> the latter more expressly claims to be interested in performance and
> scalability.  So in conclusion:
> 
> (1) I think I'll look into using ParaDNS and maybe even
> Q::P::Async::DNSBLBase if appropriate, even though my plugin is not in
> async; it might still have advantages, and the interesting logic is in the
> callbacks and not the module use anyway
> (2) I don't know if it matters whether this particular plugin works with
> async, I guess it depends on whether the old one really ever did.  At
> least Net::DNS::Async should not make it any more blocking than the old
> plugin's collect_results().  At any rate somebody who plays in async
> territory should probably comment before any action is taken on the new
> plugin :)
> 
> >> Attached also is tld_lists.pl, a companion file that needs to be dropped
> >> in lib/Qpsmtpd/ which provides the list of first, second, and third
> >> level
> >> TLDs that we care about.  It's derived from our URIBL datafeed as well
> >> as
> >
> > Do the owners of that data care about it being used this way?  You may
> > be violating any agreement with them.  Would they be ok if this was
> > released as an independent CPAN module?    Either way, can we
> > structure this as an API instead of just an include file?
> 
> (1) I suspect that anyone who runs a URIBL service would be thrilled with
> this information being available for this particular use, because it
> minimizes unnecessary queries to their services.  Consider the old method
> that sent 2 or 3 queries instead of one because it didn't know which one
> was right.  The fact that SURBL and URIBL both publish the two and three
> level lists and encourage their use is evidence of this :)
> 
> (2) First-level TLDs are public information, of course, and with the
> two-level and three-level lists being public, the interesting business is
> that the first-level list and the SURBL second and third level lists are
> pruned based on URIBL's data.  So it comes down to whether anyone can be
> considered to have a claim on the privilege of knowing how to *exclude*
> results.  I guess maybe they could if it came down to it...
> 
> (3) I'm pretty sure I could get URIBL's explicit permission, but it would
> take a long time.  The founder and sole owner of URIBL was also the
> original author of DoubleCheck, so I feel we would have a bit of an in :) 
> However even so he always takes forever to get back to us in
> correspondence.  For instance we're still waiting around to hear from him
> since I noticed, while working on the script that generated tld_lists.pl,
> that our datafeed is no longer updating.  Maybe he 'fixed the glitch'...
> :P
> 
> > It would be cool if it was configurable (i.e. use MIME::Parser or not.)
> 
> That does sound handy, maybe I'll work on that soon.  I think if one does
> go back to the 'old' method, it would be good to skip any lines that
> contain no whitespace, or some similar check that passes the
> 3-million-line test, so that we no longer pointlessly scan reams of
> base64-encoded data etc. for URI's

Thanks Jared.  This is an exciting change.

-R

Re: Rewritten URIBL plugin

Reply via email to