>> - Introduces support for URIBL services that may not have worked right, >> at >> least out of the box, before. Defines the subtle differences between >> various known URIBL services in order to maximize compatibility. > > Is it worth pulling some of this config out of the code and putting it > into some sort of config file?
Probably would be worth it, and not that difficult to do. Any new config scheme should probably be both backward compatible and non-spaghetti. The current scheme looks like: multi.uribl.com 2,4 deny the new plugin does add one additional feature, you can now do: multi.uribl.com 2 deny multi.uribl.com 4 log but $zones defines a bunch of other things - labels for the zone in general and specific masks, check_ip, check_host, host_wildcards_supported, lookup_a_record, lookup_ns_record. Ignoring the label issue, all the other directives are related only to the zone; perhaps we could add a new scheme on top of the mask definition scheme (which we would continue to read in the old way), getting even further away from the one-line-per-zone style: multi.uribl.com action deny multi.uribl.com check_ip 1 multi.uribl.com check_host 1 multi.uribl.com lookup_a_record 0 multi.uribl.com lookup_ns_record 0 multi.uribl.com host_wildcards_support 0 If nobody hates this scheme it does seem like something I could add pretty easily send an updated copy >> - Uses Net::DNS::Async to simplify the code, and also to ensure the >> afore-mentioned A and NS lookups will prompt new URIBL lookups in an >> efficient and simple manner via callbacks > > Does the code still work with the async qpsmtpd cores? Well... I didn't change the existing plugin a whole lot; I did change from data_handler() registered via register(), to plain old hook_data_post(); this may need changed considering that the comments above each sub (which I also removed) suggested that hook_data_post() and register() were 'not used' for async. These things could easily be changed if needed. However we don't run async anywhere, not even in test environments, so I'm not very qualified to know if they need changed, or give any guarantees. And I'm confused by that bit. From what I can tell, the async/uribl plugin ignores plugins/uribl entirely and uses Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS. Now I noticed the async rhsbl plugin used Q::P::A::D and that this used ParaDNS before and looked at it briefly to see if I could use it for my purposes, but at the top of the docs it's mentioned that only A lookups are supported, so I skipped it and went to Net::DNS::Async. I don't really know the strongpoints of N::D::A vs. ParaDNS but it did note that the latter more expressly claims to be interested in performance and scalability. So in conclusion: (1) I think I'll look into using ParaDNS and maybe even Q::P::Async::DNSBLBase if appropriate, even though my plugin is not in async; it might still have advantages, and the interesting logic is in the callbacks and not the module use anyway (2) I don't know if it matters whether this particular plugin works with async, I guess it depends on whether the old one really ever did. At least Net::DNS::Async should not make it any more blocking than the old plugin's collect_results(). At any rate somebody who plays in async territory should probably comment before any action is taken on the new plugin :) >> Attached also is tld_lists.pl, a companion file that needs to be dropped >> in lib/Qpsmtpd/ which provides the list of first, second, and third >> level >> TLDs that we care about. It's derived from our URIBL datafeed as well >> as > > Do the owners of that data care about it being used this way? You may > be violating any agreement with them. Would they be ok if this was > released as an independent CPAN module? Either way, can we > structure this as an API instead of just an include file? (1) I suspect that anyone who runs a URIBL service would be thrilled with this information being available for this particular use, because it minimizes unnecessary queries to their services. Consider the old method that sent 2 or 3 queries instead of one because it didn't know which one was right. The fact that SURBL and URIBL both publish the two and three level lists and encourage their use is evidence of this :) (2) First-level TLDs are public information, of course, and with the two-level and three-level lists being public, the interesting business is that the first-level list and the SURBL second and third level lists are pruned based on URIBL's data. So it comes down to whether anyone can be considered to have a claim on the privilege of knowing how to *exclude* results. I guess maybe they could if it came down to it... (3) I'm pretty sure I could get URIBL's explicit permission, but it would take a long time. The founder and sole owner of URIBL was also the original author of DoubleCheck, so I feel we would have a bit of an in :) However even so he always takes forever to get back to us in correspondence. For instance we're still waiting around to hear from him since I noticed, while working on the script that generated tld_lists.pl, that our datafeed is no longer updating. Maybe he 'fixed the glitch'... :P > It would be cool if it was configurable (i.e. use MIME::Parser or not.) That does sound handy, maybe I'll work on that soon. I think if one does go back to the 'old' method, it would be good to skip any lines that contain no whitespace, or some similar check that passes the 3-million-line test, so that we no longer pointlessly scan reams of base64-encoded data etc. for URI's -Jared