Mark Martinec wrote:
> Theo Van Dinter writes:
>   
>> body rules aren't run on lines, they're run on paragraphs,
>> so that text is in the middle of a string.
>>     
>
> Matt Kettler writes:
>   
>> Use rawbody for this. Body rules have CR/LF stripped out.
>>     
>
> Giving whole paragraphs to regexp is fine, but why are newlines
> stripped out in 'body' rules? 
To normalize whitespace. That way rules don't have to care about
whitespace; they can just be written normally.

Otherwise the rule
/Hello I'm a spammer/i

would fail to match:
Hello I'm
a spammer.

SA also collapses excess spaces in the text that normal body rules see,
so spammers can't obfuscate text by simply inserting piles of spaces.

It would be a real pain to have to rewrite the above rule as:

/Hello\s+I'm\s+a\s+spammer/i

And also much slower if you have to do that for a few hundred rules.
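
To make the effect concrete, here is a minimal sketch in Python (not SA's
actual Perl code). The helper name normalize_body is made up for this
example, and it only approximates the pre-processing by collapsing every
run of whitespace, newlines included, into a single space.

```python
import re


def normalize_body(text: str) -> str:
    """Hypothetical stand-in for body pre-processing.

    Collapses every run of whitespace (spaces, tabs, newlines)
    into a single space. This is an approximation for illustration only.
    """
    return re.sub(r"\s+", " ", text).strip()


# The wrapped text from the example above.
raw = "Hello I'm\na spammer."

# The plainly written rule, case-insensitive like /.../i.
rule = re.compile(r"Hello I'm a spammer", re.IGNORECASE)

# Against the raw text the newline gets in the way.
print(bool(rule.search(raw)))                  # False

# Against the normalized text the plain rule matches.
print(bool(rule.search(normalize_body(raw))))  # True
```

The same plain pattern works once the whitespace is flattened, which is
the point of doing the normalization once instead of in every rule.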


>  Perl regexp modifiers m (and s)
> would be handy:
>
> body L_TEST  /^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/m
>
> but as it stands now the m modifier is of no use in 'body' rules
> (unlike in 'rawbody').
True. If you care about whitespace formatting and EOLs, use rawbody.

If you want to match text in a straightforward way, use body and let
SA's pre-processing of the text deal with simplifying whitespace.
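
The difference for anchored patterns can be sketched the same way. This
is a Python illustration under assumptions, not SA code: re.MULTILINE
plays the role of Perl's m modifier, the sample text is invented, and the
flattening step only approximates what body rules see.

```python
import re

# Invented sample paragraph; the middle line is what L_TEST is after.
raw = "Some intro text\nA B C D\nmore text"

# Assumed approximation of body-style flattening: newlines become spaces.
flattened = re.sub(r"\s+", " ", raw)

# Same pattern as L_TEST; MULTILINE lets ^ and $ match at line boundaries.
pattern = re.compile(r"^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$", re.MULTILINE)

# rawbody-like input keeps the line breaks, so the anchors have lines to match.
print(bool(pattern.search(raw)))        # True

# body-like input has no line breaks left, so ^ and $ only see the string ends.
print(bool(pattern.search(flattened)))  # False
```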


