On Wed, 23 Nov 2016, Rich Wales wrote:

/The RE at that line looks pretty firmly anchored... Can you gzip up a
sample that fails for you and send it to me?/

Sure.  See the attachment.

OK, I can repro on trunk:

Nov 23 19:17:00.141 [18349] dbg: message: HTML::Parser utf8_mode on (assumed UTF-8 octets)
Nov 23 19:17:00.187 [18349] warn: Complex regular subexpression recursion limit (32766) exceeded at lib/Mail/SpamAssassin/HTML.pm line 745.
Nov 23 19:17:00.193 [18349] dbg: message: spaces (octets) in HTML: 952 out of 3954

It's that very long block of QP-encoded blanks right at the end. If you edit out all those =20s after the </td>, it stops emitting that warning.

That would be a workaround for you to make sa-learn shut up about your corpus until the problem is fixed. Blanks don't affect Bayes (at least, not until we implement multi-word tokens) so it shouldn't affect what gets learned.
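
If hand-editing a whole corpus is impractical, the same workaround could be scripted. What follows is only a rough sketch in Python, not anything SpamAssassin ships: the function name collapse_qp_blanks, the min_run threshold of 50, and the choice to treat quoted-printable soft line breaks ("=" at end of line) as part of a run are all assumptions made for illustration. It works on the raw message bytes and collapses each long run of =20 tokens down to a single =20.

```python
import re


def collapse_qp_blanks(raw: bytes, min_run: int = 50) -> bytes:
    """Collapse long runs of quoted-printable blanks ("=20") to one "=20".

    Assumption: a run is min_run or more consecutive "=20" tokens, each
    optionally followed by a QP soft line break ("=" then a newline).
    Shorter runs are left alone so ordinary text is not disturbed.
    """
    pattern = re.compile(rb"(?:=20(?:=\r?\n)?){%d,}" % min_run)
    return pattern.sub(b"=20", raw)


if __name__ == "__main__":
    # Tiny demonstration on an in-memory sample; a real run would read a
    # message file in binary mode and write the result to a *copy*.
    sample = b"<td>x</td>" + b"=20" * 200 + b"\n"
    print(collapse_qp_blanks(sample))
```

Because soft line breaks inside a run are removed along with the blanks, the text that follows the run is joined onto the same line, which may leave that line longer than the 76 characters quoted-printable normally allows. The sketch also does not check that the bytes it touches sit inside a quoted-printable MIME part, so it is best run on copies of the affected messages only.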

Please open a bug and attach that spample as a repro test case. I'm not too familiar with that bit of the code so I don't have a fast fix.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
 338 days since the first successful real return to launch site (SpaceX)
