mouss put forth on 3/7/2011 5:45 PM:
> On 07/03/2011 15:13, Stan Hoeppner wrote:
>> Ok, so if I'm doing what I've heard called a "fully qualified regular
>> expression", WRT FQrDNS matching, should I use the anchors or not?
>> postmap -q says these all work (the actuals with action and text that is).

>> /^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$/
>
> .dynamic.chello.sk REJECT blah blah
>
>> /^(\d{1,3}\.){4}dsl\.dyn\.forthnet\.gr$/
>
> .dyn.forthnet.gr REJECT blah blah
>
>> /^(\d{1,3}-){4}adsl-dyn\.4u\.com\.gh$/
>
> /dyn\.4u.com\.gh$/ REJECT blah
> assuming you get real mail from there. otherwise
> .4u.com.gh REJECT blah

Yes, these can all be done with a hash/cdb. But these are being added to
my fqrdns.pcre file. As the name implies, the goal is to exactly match
fully qualified reverse DNS strings. At least, that's part of the goal.
The other part is the exact opposite: _not_ matching them. I'll explain
that a little later.

>> /^[\d\w]{8}\.[\w]{2}-[\d]-[\d\w]{2}\.dynamic\.ziggo\.nl$/
>
> ahem? I fail to see what you're trying to match here. \d is a \w, so
> [\d\w] is the same as \w. do you mean \W (capital letter)? anyway:

I tried \d alone in those places and postmap -q wouldn't match it. I
scoured my regex cheat sheet and it said \d is for digits, and \w is for
alphas. I added \d\w and it worked. I was trying to match this oddball
FQrDNS:

541ABE2E.cm-5-3c.dynamic.ziggo.nl

> well, that's what regular expressions are about by default:
> /foo/ means contains foo
> /^foo/ means starts with foo
> /foo$/ means ends with foo

Got it. You (or Noel) already explained this, and it really helps
understanding.

> so
> /^bart.*homer.*marge$/ means: starts with "bart", ends with "marge" and
> somewhere between these contains "homer".

Also good to understand.

Ok, to explain the "not matching" goal. The PCRE file is almost 1700
expressions, and growing. In a couple of years it could be double that
size. Over a longer period of time it could hit 5000 expressions. For
users of this file, it is usually the first table checked against a
connecting smtp client.
That client's rDNS will match 1 of 1700 expressions, or none. Thus, we
want the fastest processing of the "does not match" case, as this is the
common case. A match is "rare" from a mathematical and cycles-consumed
standpoint.

Modern processors are extremely fast. But if our expressions aren't
speed optimized for the "does not match" case, we're slowing our system
down. For most systems this is irrelevant. But for an extremely high
volume MX gateway system, receiving say, 3000 connects/second,
consisting of 2700 spam bots and snowshoe servers, with 300 legit mails
to be relayed to downstream mailbox servers, a few extra milliseconds of
table processing time per connection adds up quickly.

Assuming this host is running the full gamut of anti-spam checks, policy
daemons, content filters, etc., we need to keep each as lean as possible.
If this example MX gateway sees spikes of 5000 connections/second due to
a large botnet targeting multiple users, any extra delay this PCRE table
imposes may contribute to bogging the system down, and cause unwanted
delays.

So, the question is, which form of expression processes the "does not
match" case faster? The fully qualified expression, or the simple
expression? Noel mentioned that the fully qualified expressions will
tend to process faster. Is this true? Is it true for both the "matches"
and "does not match" cases?

Thanks again for continuing my regex education guys. :) This new
knowledge and understanding is already paying dividends, mostly in time
savings, and I'm knocking expressions out more easily without having to
reference help docs. :)

-- 
Stan
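
P.S. For anyone who wants to put rough numbers on the "does not match"
question, here is a minimal sketch. It assumes Python's re module as a
stand-in for the PCRE library Postfix uses. The two engines are not the
same, so the timings only hint at the trend and are no substitute for
testing against a real pcre table with postmap -q. The non-matching
hostname mail.example.com and the helper name matches() are made up for
illustration. It also shows why \d alone failed on the ziggo hostname:
the first label contains letters, and \w already covers digits.

```python
import re
import timeit

# Fully qualified (anchored at both ends) vs. simple (end-anchored only).
anchored = re.compile(r"^(\d{1,3}-){3}\d{1,3}\.dynamic\.chello\.sk$")
simple = re.compile(r"\.dynamic\.chello\.sk$")

hit = "1-2-3-4.dynamic.chello.sk"
miss = "mail.example.com"  # hypothetical legit client, should never match


def matches(pattern, host):
    """Return True if the pattern is found anywhere in host.

    search() scans the whole string, so only the ^ and $ written in the
    expression itself restrict where a match may start or end.
    """
    return pattern.search(host) is not None


print("hit :", matches(anchored, hit), matches(simple, hit))
print("miss:", matches(anchored, miss), matches(simple, miss))

# The oddball hostname from the thread: its first label holds hex letters.
ziggo = "541ABE2E.cm-5-3c.dynamic.ziggo.nl"
digits_only = bool(re.search(r"^\d{8}\.", ziggo))
word_chars = bool(re.search(r"^\w{8}\.", ziggo))
both_classes = bool(re.search(r"^[\d\w]{8}\.", ziggo))
print("ziggo:", digits_only, word_chars, both_classes)

# Rough timing of the common "does not match" case for each form.
for name, pat in (("anchored", anchored), ("simple", simple)):
    secs = timeit.timeit(lambda: pat.search(miss), number=20000)
    print(f"{name:8s} {secs:.4f} s for 20000 non-matching lookups")
```

The absolute numbers depend on the machine and the regex engine, so the
only thing worth comparing is the ratio between the two forms on the
same host.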