Victor Duchovni <[EMAIL PROTECTED]> wrote:
> You can skip waiting for future occurrences; the behaviour you describe
> (especially on fallback relays where dead destinations are to be expected)
> fits the known issue like a glove (and we are not at the OJ trial :-).

Regardless, I do sometimes see qmgr die from a watchdog timeout, without
the deadlock, when it is deferring many thousands of messages to the
same destination.

As a temporary workaround, I tried doubling daemon_timeout.

However, I'm puzzled: daemon_timeout defaults to 18000s, but during
these incidents the watchdog timer seems to kill qmgr after about half
an hour, which is 1800 seconds.  Is daemon_timeout actually expressed
in tenths of a second?  Or is daemon_timeout not really the timer that
controls how long the watchdog gives qmgr in these cases?
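To make the arithmetic behind the question explicit, here is a small Python sketch that converts a timeout value to minutes under the two readings being compared: plain seconds versus tenths of a second. The "tenths of a second" interpretation is only the hypothesis raised above, not documented Postfix behaviour, and the function name is made up for illustration.

```python
def timeout_minutes(value: int, units_per_second: int) -> float:
    """Convert a timeout to minutes, given how many units make up one second.

    units_per_second=1 treats the value as seconds; units_per_second=10
    treats it as tenths of a second (the hypothetical reading in question).
    """
    seconds = value / units_per_second
    return seconds / 60


# Default daemon_timeout of 18000, read both ways:
print(timeout_minutes(18000, 1))   # 300.0 minutes (5 hours) if seconds
print(timeout_minutes(18000, 10))  # 30.0 minutes if tenths of a second

# After doubling to 36000 as a workaround:
print(timeout_minutes(36000, 10))  # 60.0 minutes if tenths of a second
```

Only the tenths-of-a-second reading matches the observed half hour, which is why the observation is puzzling.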

> You may also consider tuning the feedback controls on the fallback relay,
> so that problematic destinations are throttled less aggressively.  This
> is appropriate when most of the deliveries fail but the site is not
> dead: more than 0%, but less than 50%, of the deliveries succeed.
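Why 50% is the break-even point can be seen with a toy model. This is a hypothetical sketch, not Postfix's scheduler code: it assumes each success raises the per-destination concurrency by pos_ratio/concurrency and each failure lowers it by neg_ratio/concurrency, with an assumed floor of 1 and an assumed ceiling of 20. With equal positive and negative feedback, the expected drift per delivery is proportional to (2p - 1), so concurrency sinks whenever fewer than half the deliveries succeed; softening the negative feedback moves that threshold down.

```python
def step(concurrency: float, succeeded: bool, pos: float, neg: float,
         floor: float = 1.0, ceiling: float = 20.0) -> float:
    """Apply one delivery outcome: raise concurrency on success, lower it on failure.

    floor and ceiling are illustrative assumptions, not Postfix settings.
    """
    if succeeded:
        concurrency += pos
    else:
        concurrency -= neg
    return max(floor, min(ceiling, concurrency))


def expected_drift(success_rate: float, concurrency: float,
                   pos_ratio: float = 1.0, neg_ratio: float = 1.0) -> float:
    """Expected change per delivery when feedback is scaled as ratio/concurrency."""
    pos = pos_ratio / concurrency
    neg = neg_ratio / concurrency
    return success_rate * pos - (1.0 - success_rate) * neg


def simulate(outcomes, start: float, neg_ratio: float = 1.0) -> float:
    """Run a sequence of True/False delivery outcomes through the toy model."""
    c = start
    for ok in outcomes:
        c = step(c, ok, pos=1.0 / c, neg=neg_ratio / c)
    return c


# One success in three (about 33%), equal feedback: throttled down to the floor.
print(simulate([True, False, False] * 300, start=10.0))
# Same success rate, negative feedback softened to a quarter: concurrency grows.
print(simulate([True, False, False] * 300, start=10.0, neg_ratio=0.25))
```

The hypothetical neg_ratio knob stands in for "throttling problematic destinations less aggressively"; the real parameter names and defaults should be checked in the Postfix documentation for the installed version.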

Thank you.  And yes, for the domains involved, some deliveries
definitely succeed, but fewer than 50% (at the times when this problem
shows up).

It is not possible to tune the feedback controls on a version earlier
than 2.5, correct?
  -- Cos