Bill Cole <sausers-20150...@billmail.scconsult.com> writes:

> It would generally be a bad idea to increase the Postfix timeout, as
> that passes the problem back upstream as senders will generally time
> out at 300s as well.
>
> So, add '--timeout-child=295' to your spamd arguments if you want to
> make spamd timeout faster than Postfix reliably.

Thanks; I hadn't thought about the sender's own timeout.  Before getting
your mail, I had set my postfix milter timeout to 330s, and the actual
delay was then ~301s because the spamd timeout fired first.  The message
was delivered, and the remote system (also postfix) did not give up.  I
have since changed to --timeout-child=290 in spamd and restored postfix
to its default.

>>   need to figure out why there is a timeout
>
> That's the important part.

I am narrowing the circumstances and will follow up when I figure it out.

>> The first is surely manual reading, but I wonder why it isn't default.
>
> We don't try very hard to guess what users will want in the
> integration details between SA and the tools like MTAs that use
> it. 300s is the SMTP default timeout at end-of-data, which presumably
> is why it is spamd's default. I think it makes sense to reduce that
> for most circumstances, but I'm a bit hesitant to do so in the
> distribution because there could be people relying on the specific
> idiosyncratic behavior of spamd timing out after its caller has given
> up rather than before.

It strikes me that a timeout firing is basically a symptom of a bug, and
that each layer should be set up so it never becomes unresponsive to the
layer calling it.  While I see your point about not tuning for what
people might want, it seems that if a system is to meet the 300s SMTP
data timeout, spamd needs to take less than 300s, so going for 290 or
295 seems sensible.  I would guess, without any real basis, that far
more people are just sitting on latent trouble than really intend to
have a milter callout give up about a second before spamd does.
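
To make the arithmetic concrete, here is a minimal sketch of the rule I
have in mind: the innermost timeout should sit below the smallest
timeout of anything waiting on it, minus some margin.  The function
names and the 5s default margin are my own, and the 300s and 330s
figures are simply the ones from this thread.

```python
def max_safe_child_timeout(outer_timeouts, margin=5):
    """Largest inner timeout that still fires before every outer layer.

    outer_timeouts: timeouts (seconds) of every layer waiting on the
    inner one, e.g. the sending MTA and the local milter callout.
    margin: hypothetical safety margin in seconds.
    """
    if not outer_timeouts:
        raise ValueError("need at least one outer timeout")
    return min(outer_timeouts) - margin


def is_safe(child_timeout, outer_timeouts, margin=5):
    """True if child_timeout fires at least `margin` seconds early."""
    return child_timeout <= max_safe_child_timeout(outer_timeouts, margin)


if __name__ == "__main__":
    # Sender at 300s, local callout at 300s.
    print(max_safe_child_timeout([300, 300]))
    # Raising only the local callout to 330s does not raise the limit,
    # because the sender still gives up at 300s.
    print(max_safe_child_timeout([300, 330]))
    print(is_safe(290, [300, 300]))
    print(is_safe(300, [300, 330]))
```

Taking the minimum is the point: lengthening the local timeout does
nothing for the limit set by the remote sender.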

> The most common reason for SA to hit its internal timeout is the
> combination of a rule with a pattern that can generate a large number
> of backtracks while scanning (exponential or factorial order) and a
> message which causes such backtracking. Typically that's caused by a
> '*' or '+' in a pattern where a fixed range for the number of repeats
> should be used instead. A few years ago we tried to fix all cases of
> dangerous rules in the default ruleset, and I think we succeeded. I
> believe the KAM rules have also been audited for likely problems. If
> you have any unbounded wildcards in your local rules, tightening those
> rules up should be your first step. If you can't find and fix the
> problematic rule by eye, you can get clues about it by scanning a
> problematic message with the "-D all" option to get a detailed rundown
> of what SA does in scanning a message. That will show you what rules
> are checked successfully. You can find a problematic rule by comparing
> that debug output from a bad message to that of a message which
> doesn't hang SA.

Thanks, that regexp hint is a huge clue to me.
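
For anyone else following along, here is a small sketch of the effect
as I understand it.  It is in Python rather than Perl, and both
patterns are made up for illustration; neither comes from any real
ruleset.  The first nests unbounded repeats, the second uses a fixed
range, and the input is a run of "x" with no "y", so neither can match.

```python
import re
import time

# Hypothetical patterns for illustration only.
UNBOUNDED = re.compile(r"(?:x+)+y")   # nested unbounded repeats
BOUNDED = re.compile(r"x{1,40}y")     # fixed range for the repeat count


def timed_search(pattern, text):
    """Return (matched, seconds) for pattern.search(text)."""
    start = time.perf_counter()
    matched = pattern.search(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: the unbounded pattern slows down very quickly.
    for n in (10, 14, 18):
        text = "x" * n   # no trailing "y", so every attempt must fail
        _, slow = timed_search(UNBOUNDED, text)
        _, fast = timed_search(BOUNDED, text)
        print(f"n={n}: unbounded {slow:.4f}s, bounded {fast:.6f}s")
```

With a backtracking engine, the time for the unbounded pattern should
grow roughly exponentially with n, since the engine tries every way of
splitting the run of x's before giving up, while the bounded one stays
small.  Actual timings will depend on the machine and regex engine.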
