On Fri, 2014-10-24 at 17:27 -0700, John Hardin wrote:
> On Sat, 25 Oct 2014, Martin Gregorie wrote:
>
> > ..... Does \b match end of string? That
> > never occurred to me. I've always used $ to do that and it certainly
> > works as part of a URI rule.
>
> No, \b matches the transition from a word-character (\w, [0-9a-z_]) to a
> non-word character (anything else, plus beginning and end of line).
>
> Think "\b = Boundary".
>
OK, thanks. I knew it matches word boundaries inside a string but hadn't
got it through my head that it also matches the start/end of words at
the end of strings.
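
The boundary behaviour is easy to check outside a rule. Below is a minimal Python sketch, assuming Python's re module treats \b and $ the same way Perl does for these simple cases. The check() helper and the last sample string are made up for illustration, and the sketch says nothing about how the uri rule engine itself feeds strings to the regex.

```python
import re

# The two candidate patterns, case-insensitive like the /i flag on the rule.
BOUNDARY = re.compile(r"\.link\b", re.IGNORECASE)  # ".link" followed by a word boundary
ANCHORED = re.compile(r"\.link$", re.IGNORECASE)   # ".link" at the end of the string


def check(uri):
    """Return (boundary_hit, anchored_hit) for a single string."""
    return bool(BOUNDARY.search(uri)), bool(ANCHORED.search(uri))


# Illustrative sample strings; the last one is a hypothetical near-miss.
SAMPLES = [
    "http://www.example.link",
    "http://www.example.link/path/to/file.txt",
    "http://www.linkedin.com/home/user/data.link",
    "http://www.example.linked",
]

for uri in SAMPLES:
    boundary_hit, anchored_hit = check(uri)
    print(f"{uri:45} \\b={boundary_hit!s:5} $={anchored_hit}")
```

On a plain string, \b succeeds wherever ".link" is followed by a non-word character or by the end of the string, while $ succeeds only when ".link" is the last thing in the string.
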

Less obviously, in a uri rule it doesn't seem to matter whether you
write the regex as /\.link\b/ or /\.link$/ - both give identical matches.
Both match the following URIs just as you'd expect:

http://www.linkedin.com/home/user/data.link
http://www.example.link

but, more surprisingly, both also match this:

http://www.example.link/path/to/file.txt

With grep, by contrast, "grep -P '\.link\b'" matches that last URI but
"grep -P '\.link$'" does not. I presume that this means that the uri
rule tests against two strings, one being just the domain name and the
other being the whole URI, and declares a rule hit if either string
matches.

Meanwhile, I've revised my subrule again and now it reads:

uri __MG_LTD1 /\.link\b/i

So, thanks to all who pointed to this improvement.

Martin