Michael Hutchinson wrote:
> > body NICE_GIRL_01       /Hello! I am (?:bored|tired) (?:today|this
> > (?:afternoon|evening)|tonight)\./
> 
> Forgive my ignorance, but what does the question mark and colon do at
> the start of the brackets? I have (bored|tired) in my own rules, so how
> does (?:bored|tired) affect the outcome?

Using (?: groups the alternatives without capturing them, so no
backreference is created.  It should be slightly faster when the
captured text is not needed.

  (?:bored|tired)

Is the same as:

  (bored|tired)

But without creating a \1 or $1 reference to it.
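
To see the difference for yourself, here is a minimal sketch.  It is
written in Python rather than Perl, on the assumption that its `re`
module treats (?:...) the same way for this simple case; the sample
text is made up for illustration.

```python
import re

# Hypothetical sample text, loosely based on the rule being discussed.
text = "Hello! I am bored today."

capturing = re.search(r"I am (bored|tired)", text)
non_capturing = re.search(r"I am (?:bored|tired)", text)

# Both patterns match exactly the same span of text.
same_match = capturing.group(0) == non_capturing.group(0)

# Only the capturing form records a group (what Perl exposes as $1 or \1).
captured_groups = capturing.groups()
non_captured_groups = non_capturing.groups()

print(same_match, captured_groups, non_captured_groups)
```

So for a rule that only needs to hit or miss, the two forms behave the
same; the non-capturing one just skips the bookkeeping.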

SpamAssassin is written in Perl and uses Perl's own regular expressions
(the syntax that the PCRE library, Perl Compatible Regular Expressions,
imitates).  Those are not quite the same as standard Extended Regular
Expressions.  For a full description see the 'perlre' man page.

  man perlre

       "(?:pattern)"
       "(?imsx-imsx:pattern)"
                 This is for clustering, not capturing; it groups
                 subexpressions like "()", but doesn’t make
                 backreferences as "()" does.  So

                     @fields = split(/\b(?:a|b|c)\b/)

                 is like

                     @fields = split(/\b(a|b|c)\b/)

                 but doesn’t spit out extra fields.  It’s also cheaper
                 not to capture characters if you don’t need to.

                 Any letters between "?" and ":" act as flags
                 modifiers as with "(?imsx-imsx)".  For example,

                     /(?s-i:more.*than).*million/i

                 is equivalent to the more verbose

                     /(?:(?s-i)more.*than).*million/i
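
The split and flag examples from the man page can be tried out in much
the same way.  This is only a sketch in Python (3.6 or later is assumed
for the scoped (?s-i:...) form), and the input strings are invented for
illustration; Perl's own split has a few extra rules that are not shown
here.

```python
import re

line = "x a y b z"

# A capturing group makes split() return the separators as extra fields.
with_capture = re.split(r"\b(a|b|c)\b", line)

# The non-capturing group splits at the same places without the extras.
without_capture = re.split(r"\b(?:a|b|c)\b", line)

# Scoped flags: inside the group, "s" is on (dot matches newline) and
# "i" is off (case-sensitive), while the rest of the pattern keeps the
# case-insensitive flag given to re.compile().
pattern = re.compile(r"(?s-i:more.*than).*million", re.IGNORECASE)

lower_inside = pattern.search("more\nthan a MILLION") is not None
upper_inside = pattern.search("MORE than a million") is not None

print(with_capture)
print(without_capture)
print(lower_inside, upper_inside)
```

The first list contains the matched "a" and "b" as fields of their own,
the second does not.  The last line shows that "MILLION" is accepted in
any case while "more" and "than" must be lower case.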

HTH,
Bob
