Jared Johnson wrote: > > >> - Introduces support for URIBL services that may not have worked right, > >> at > >> least out of the box, before. Defines the subtle differences between > >> various known URIBL services in order to maximize compatibility. > > > > Is it worth pulling some of this config out of the code and putting it > > into some sort of config file? > > Probably would be worth it, and not that difficult to do. Any new config > scheme should probably be both backward compatible and non-spaghetti. The > current scheme looks like: > > multi.uribl.com 2,4 deny > > the new plugin does add one additional feature, you can now do: > > multi.uribl.com 2 deny > multi.uribl.com 4 log > > but $zones defines a bunch of other things - labels for the zone in > general and specific masks, check_ip, check_host, > host_wildcards_supported, lookup_a_record, lookup_ns_record. Ignoring the > label issue, all the other directives are related only to the zone; > perhaps we could add a new scheme on top of the mask definition scheme > (which we would continue to read in the old way), getting even further > away from the one-line-per-zone style: > > multi.uribl.com action deny > multi.uribl.com check_ip 1 > multi.uribl.com check_host 1 > multi.uribl.com lookup_a_record 0 > multi.uribl.com lookup_ns_record 0 > multi.uribl.com host_wildcards_support 0 > > If nobody hates this scheme it does seem like something I could add pretty > easily send an updated copy
I don't hate it, FWIW. :) Sticking with one line per zone is NOT a requirement. (Heck, I'd even be willing to go so far as saying we could go to a even more complicated config scheme. One of qpsmtpd's unwritten goals is to NOT pull in half of CPAN to run, but we could expand a little.) > > >> - Uses Net::DNS::Async to simplify the code, and also to ensure the > >> afore-mentioned A and NS lookups will prompt new URIBL lookups in an > >> efficient and simple manner via callbacks > > > > Does the code still work with the async qpsmtpd cores? > > Well... I didn't change the existing plugin a whole lot; I did change from > data_handler() registered via register(), to plain old hook_data_post(); > this may need changed considering that the comments above each sub (which > I also removed) suggested that hook_data_post() and register() were 'not > used' for async. These things could easily be changed if needed. However > we don't run async anywhere, not even in test environments, so I'm not > very qualified to know if they need changed, or give any guarantees. And > I'm confused by that bit. From what I can tell, the async/uribl plugin > ignores plugins/uribl entirely and uses > Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS. In that case, it's possible (and likely) that it may never have worked with async. When time is short, it's easy to just ask questions :) > > Now I noticed the async rhsbl plugin used Q::P::A::D and that this used > ParaDNS before and looked at it briefly to see if I could use it for my > purposes, but at the top of the docs it's mentioned that only A lookups > are supported, so I skipped it and went to Net::DNS::Async. I don't > really know the strongpoints of N::D::A vs. ParaDNS but it did note that > the latter more expressly claims to be interested in performance and > scalability. So in conclusion: > > (1) I think I'll look into using ParaDNS and maybe even > Q::P::Async::DNSBLBase if appropriate, even though my plugin is not in > async; it might still have advantages, and the interesting logic is in the > callbacks and not the module use anyway > (2) I don't know if it matters whether this particular plugin works with > async, I guess it depends on whether the old one really ever did. At > least Net::DNS::Async should not make it any more blocking than the old > plugin's collect_results(). At any rate somebody who plays in async > territory should probably comment before any action is taken on the new > plugin :) > > >> Attached also is tld_lists.pl, a companion file that needs to be dropped > >> in lib/Qpsmtpd/ which provides the list of first, second, and third > >> level > >> TLDs that we care about. It's derived from our URIBL datafeed as well > >> as > > > > Do the owners of that data care about it being used this way? You may > > be violating any agreement with them. Would they be ok if this was > > released as an independent CPAN module? Either way, can we > > structure this as an API instead of just an include file? > > (1) I suspect that anyone who runs a URIBL service would be thrilled with > this information being available for this particular use, because it > minimizes unnecessary queries to their services. Consider the old method > that sent 2 or 3 queries instead of one because it didn't know which one > was right. The fact that SURBL and URIBL both publish the two and three > level lists and encourage their use is evidence of this :) > > (2) First-level TLDs are public information, of course, and with the > two-level and three-level lists being public, the interesting business is > that the first-level list and the SURBL second and third level lists are > pruned based on URIBL's data. So it comes down to whether anyone can be > considered to have a claim on the privilege of knowing how to *exclude* > results. I guess maybe they could if it came down to it... > > (3) I'm pretty sure I could get URIBL's explicit permission, but it would > take a long time. The founder and sole owner of URIBL was also the > original author of DoubleCheck, so I feel we would have a bit of an in :) > However even so he always takes forever to get back to us in > correspondence. For instance we're still waiting around to hear from him > since I noticed, while working on the script that generated tld_lists.pl, > that our datafeed is no longer updating. Maybe he 'fixed the glitch'... > :P > > > It would be cool if it was configurable (i.e. use MIME::Parser or not.) > > That does sound handy, maybe I'll work on that soon. I think if one does > go back to the 'old' method, it would be good to skip any lines that > contain no whitespace, or some similar check that passes the > 3-million-line test, so that we no longer pointlessly scan reams of > base64-encoded data etc. for URI's Thanks Jared. This is an exciting change. -R