In get_decoded_stripped_body_text_array(), there is:

  # join all consecutive whitespace into a single space
  $text =~ s/\s+/ /sg;

  # reinsert para breaks
  $text =~ s/<p>/\n\n/gis;

The first regexp, in addition to compacting normal whitespace, also turns
all newlines into spaces; I guess this is why double newlines (paragraph
breaks) were encoded as "<p>".  However, this turns each paragraph into one
long line without any line breaks.  Hmmm...  I guess this was intended, so
that when the regexps from the rules files are run, a newline won't mess the
regexp up.  But it means that any eval test which expects the original
newlines within a paragraph to be preserved can get messed up.  For instance,
check_for_yelling() will miss a line of yelling if it's in the same paragraph
as a non-yelling line, because the newlines get taken out of paragraphs.
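
To make the effect concrete, here is a minimal sketch that applies the two
substitutions to a made-up message.  The sample text and is_all_caps_line()
are assumptions for illustration only; in particular, is_all_caps_line() is
a stand-in for a per-line test and is not the real check_for_yelling().

```perl
use strict;
use warnings;

# Hypothetical input: two paragraphs, with the paragraph break already
# encoded as "<p>" (an assumption about the earlier processing step).
my $text = "BUY NOW AND SAVE BIG MONEY\nthis is a quiet line<p>second paragraph\nhere";

# The two substitutions quoted above.
$text =~ s/\s+/ /sg;
$text =~ s/<p>/\n\n/gis;

my @paragraphs = split /\n\n/, $text;
print scalar(@paragraphs), " paragraphs\n";
print "[$_]\n" for @paragraphs;

# Stand-in for a per-line test (NOT the real check_for_yelling()):
# a line "yells" if it has uppercase letters and no lowercase ones.
sub is_all_caps_line {
    my ($line) = @_;
    return ($line =~ /[A-Z]/ && $line !~ /[a-z]/) ? 1 : 0;
}

# Count yelling lines in the first paragraph after the rewrite.
my $yelling_lines = grep { is_all_caps_line($_) } split /\n/, $paragraphs[0];
print "yelling lines found: $yelling_lines\n";
```

In this sketch the all-caps line and the quiet line end up joined into one
line, so a test that looks at one line at a time no longer sees the all-caps
line on its own.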

Wouldn't it be better for Conf.pm (or something) to add the "s" modifier to
the regexps, so that a paragraph gets treated like a single line while its
newlines stay in place?  Hmm...  Then regexps could match across paragraphs,
rather than just within a single paragraph.
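
A small sketch of what the "s" modifier changes, assuming the newlines are
left in the text.  The sample text and the patterns are invented for
illustration and are not taken from any real rules file.

```perl
use strict;
use warnings;

# Hypothetical text with newlines left intact inside and between paragraphs.
my $text = "get your free\nmoney today\n\nunrelated second paragraph";

# Hypothetical rule pattern, tried without and with the "s" modifier.
my $without_s = ($text =~ /free.money/)  ? 1 : 0;   # "." does not match "\n"
my $with_s    = ($text =~ /free.money/s) ? 1 : 0;   # "." matches "\n"

# With /s, "." also crosses the blank line between paragraphs.
my $crosses_para = ($text =~ /today.{1,3}unrelated/s) ? 1 : 0;

print "without /s: $without_s\n";
print "with /s: $with_s\n";
print "crosses paragraph break with /s: $crosses_para\n";
```

Note that "s" only changes what "." matches.  A rule written with a literal
space, such as /free money/, would still not match "free\nmoney" in this
setup, so adding the modifier alone would not cover every existing rule.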

