On Fri, 2014-10-24 at 17:27 -0700, John Hardin wrote:
> On Sat, 25 Oct 2014, Martin Gregorie wrote:
> 
> > ..... Does \b match end of string? That
> > never occurred to me. I've always used $ to do that and it certainly
> > works as part of a URI rule.
> 
> No, \b matches the transition between a word character (\w, i.e.
> [0-9A-Za-z_]) and a non-word character (anything else), in either
> direction; the beginning and end of the string count as non-word.
> 
> Think "\b = Boundary".
> 
OK, thanks. I knew it matches word boundaries inside a string but hadn't
realised that it also matches at the very start or end of the string
when there's a word character there.
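
A quick way to see where \b matches is to list its zero-width match
positions. This is a small Python sketch (Python's re module treats \b
the same way Perl does for plain ASCII text like this); the helper name
word_boundaries is made up for illustration:

```python
import re

def word_boundaries(text):
    """Return every index where \\b matches in text (zero-width matches)."""
    return [m.start() for m in re.finditer(r"\b", text)]

# Index 0 and index 9 are the start and end of the string: they count as
# boundaries because a word character sits on the other side.
print(word_boundaries("data.link"))   # [0, 4, 5, 9]
# Here the string ends in ".", a non-word character, so there is no
# boundary at the very end (index 5).
print(word_boundaries("link."))       # [0, 4]
```

So the end of the string only counts as a boundary when the last
character is a word character.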

Less obviously, in a uri rule it doesn't seem to matter whether you write
the pattern as /\.link\b/ or /\.link$/: both give identical matches. Both
match the following URIs just as you'd expect:
   http://www.linkedin.com/home/user/data.link
   http://www.example.link

but, more surprisingly, both also match this:
   http://www.example.link/path/to/file.txt

...yet, outside SpamAssassin,
   "grep -P '\.link\b'" matches it, while
   "grep -P '\.link$'"  does not.

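The grep results can be reproduced with a short Python sketch that, like
grep, tests each URI as a single string. It uses the same patterns,
case-sensitive as in the grep commands; the names compare, WITH_BOUNDARY
and WITH_DOLLAR are illustrative only:

```python
import re

URIS = [
    "http://www.linkedin.com/home/user/data.link",
    "http://www.example.link",
    "http://www.example.link/path/to/file.txt",
]

# Same patterns as the grep -P commands above (case-sensitive, like grep).
WITH_BOUNDARY = re.compile(r"\.link\b")
WITH_DOLLAR = re.compile(r"\.link$")

def compare(uri):
    """Return (matches \\.link\\b, matches \\.link$) for a single string."""
    return (WITH_BOUNDARY.search(uri) is not None,
            WITH_DOLLAR.search(uri) is not None)

for uri in URIS:
    print(compare(uri), uri)
```

Only the last URI differs: \b is satisfied by the "/" after ".link",
while $ needs ".link" to sit at the very end of the string.
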
I presume this means that a uri rule tests each URI against two strings,
one being just the domain name and the other the whole URI, and declares
a rule hit if either string matches.
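
That presumption can be modelled in a few lines. This is purely a
hypothetical sketch, not SpamAssassin's actual URI handling: the helper
names and the idea of deriving a scheme://host string with urllib.parse
are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

def candidate_strings(uri):
    """Hypothetical: the whole URI plus a scheme://host form of it."""
    parts = urlsplit(uri)
    candidates = [uri]
    if parts.scheme and parts.netloc:
        # netloc is the host (plus any port or userinfo) without the path
        host_only = f"{parts.scheme}://{parts.netloc}"
        if host_only != uri:
            candidates.append(host_only)
    return candidates

def rule_hits(pattern, uri):
    """Hypothetical rule test: a hit if any candidate string matches."""
    regex = re.compile(pattern)
    return any(regex.search(s) for s in candidate_strings(uri))

print(rule_hits(r"\.link$", "http://www.example.link/path/to/file.txt"))
```

Under this model /\.link$/ hits the path URI through its host-only
string, which would explain why the two patterns behave identically in a
uri rule but not under grep.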

Meanwhile, I've revised my subrule again and now it reads:

   uri      __MG_LTD1   /\.link\b/i
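
As a sanity check, the same pattern can be tried in Python. This only
exercises the regex itself against plain strings; it does not reproduce
how SpamAssassin collects or canonicalizes URIs, and the helper name
mg_ltd1_hits is made up:

```python
import re

# Python translation of the pattern in:  uri  __MG_LTD1  /\.link\b/i
# (re.IGNORECASE plays the role of Perl's /i modifier)
MG_LTD1 = re.compile(r"\.link\b", re.IGNORECASE)

def mg_ltd1_hits(uri):
    """Hypothetical helper: True if the pattern finds a match in uri."""
    return MG_LTD1.search(uri) is not None

print(mg_ltd1_hits("HTTP://WWW.EXAMPLE.LINK/"))   # True
print(mg_ltd1_hits("http://www.linkedin.com/"))   # False
```

The linkedin.com case is why the \b is there: ".link" followed by "e" is
not a word boundary, so that domain is not caught.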

So, thanks to all who pointed out this improvement.


Martin




