On Wed, Mar 22, 2017 at 11:17:33AM -0700, Joe Perches wrote: > On Wed, 2017-03-22 at 08:25 -0700, Darren Hart wrote: > > On Tue, Mar 21, 2017 at 11:31:08AM -0700, Joe Perches wrote: > > > On Tue, 2017-03-21 at 09:30 -0700, John 'Warthog9' Hawley (VMware) wrote: > > > > Spamassassin sticks a long (~79 character) long string after a > > > > line that has a single space in it. The line with space causes > > > > checkpatch to erroniously think that it's in the content body, as > > > > opposed to headers and thus flag a mail header as an unwrapped long > > > > comment line. > > > > > > If the spammassassin header is like > > > > > > email-header-n: foo > > > email-header-m: bar > > > > > > X-Spam-Report: bar > > > > The specific content of the X-Spam-Report that triggers this for me, > > from this patch for example, is: > > > > === 8< === > > X-Spam-Report: SpamAssassin version 3.4.1 on casper.infradead.org summary: > > Content analysis details: (-1.9 points, 5.0 required) > > > > pts rule name description > > ---- ---------------------- > > -------------------------------------------------- > > -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% > > [score: 0.0000] > > X-TUID: alGBIuPZmqOj > > > > === >8 === > > > > The long ---- ----... line is over 75 characters and triggers the test > > for long commit_log lines. > > > > > > > > Does that form follow rfc 5322? > > > > By my reading, this is governed by the long header fields defined by > > 2.2.3, with whitespace folding defined as "a CRLF may be inserted before > > any WSP." > > > > > > > > If it does then any email header could have that > > > form and the header wrapping test should be > > > > Yes, agreed. > > > > So the logic we want is: > > > > If we are in headers and we detect a CRLF and the next line starts with a > > WSP, > > then we are still in headers (and therefor not in the commit log). The CRLF > > information does not appear to be available as it is replaced with just \n. > > > > > updated from > > > > > > if ($in_header_lines && $realfile =~ /^$/ && > > > !($rawline =~ /^\s+\S/ || > > > $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) { > > > $in_header_lines = 0; > > > $in_commit_log = 1; > > > $has_commit_log = 1; > > > } > > > > > > to something like > > > > > > if ($in_header_lines && $realfile =~ /^$/ && > > > !($rawline =~ /^ (?:\s*\S|$)/ || > > > > Hrm... lines that start with maybe a space followed by a : ... Why did you > > introduce that part of the check? > > The regex doesn't care about colons. > It's a perl non-capturing group. > https://perldoc.perl.org/perlretut.html#Non-capturing-groupings
Aha, thank you for the pointer. > > > Looking at this more closely, I was also not clear why the original test > > looked > > for several spaces followed by non-space. What case is this for? > > Not several spaces, one or more spaces then a non-space. > The only change here is allowing an initial space followed > by either: > > 1: optional spaces, then non-space. > 2: EOL > > I supposed you could argue that case 2 should > also allow optional spaces before EOL and the > test should be > > if ($in_header_lines && $realfile =~ /^$/ && > !($rawline =~ /^\s+(?:\S|$)/ || I still haven't figured out why we test for this specific set of patterns. Why is a line that starts with a space and ends with a newline considered still in_header_lines. Or more specifically, why aren't we just testing for an empty line (RFC 5322 Section 2.1, defining the separation of headers and the body). -- Darren Hart VMware Open Source Technology Center