* Jon August <jonaug...@gmail.com>:
> I've been running Postfix/MySQL/Courier for months with no problems.
>  Suddenly in the last day or so, mail has been taking around 3 hours to
> process.  I don't have a clue where to start looking.  When I do a qshape, I
> see this:

Taking a look at the output of "qshape" is indeed a good way to start
a bottleneck analysis. An excellent readme can be found at:

http://www.postfix.org/QSHAPE_README.html

> 
>                                     T  5 10  20  40  80 160 320 640 1280 1280+
>                            TOTAL 2094 47 53 180 160 300 585 769   0    0    0
>                                a  422 13  5  36  35  54 119 160   0    0    0
>                                b  199  5  6  18  20  29  58  63   0    0    0
>                                c  196  4  2  14  12  31  65  68   0    0    0
>                                d  125  1  3  11  16  15  38  41   0    0    0
>                                e  125  7  3   2   5  20  39  49   0    0    0
>                                f   87  2  6   6   7  12  26  28   0    0    0
>                                g   74  2  2   4   6   7  24  29   0    0    0
>                                h   58  0  1   2   7   9  20  19   0    0    0
>                                i   51  0  0   4   4   8  13  22   0    0    0
>                                j   47  0  1   1   0   6  12  27   0    0    0
>                                k   34  0  3   1   2   9  10   9   0    0    0
>                                l   32  0  2   5   0   4  12   9   0    0    0
>                                m   29  1  0   2   2   9   3  12   0    0    0
>                                n   29  0  0   4   0   6   7  12   0    0    0
>                                o   28  0  1   3   5   6   6   7   0    0    0
>                                p   26  1  1   4   2   1   8   9   0    0    0
>                                q   24  1  1   3   3   3   5   8   0    0    0
>                                r   22  1  1   1   1   4   6   8   0    0    0
>                                s   21  0  0   4   0   2   7   8   0    0    0

Since we don't know how that output changes over time, we can only
assume that "something" changed between 160 and 320 minutes ago. As
mentioned in the documentation, qshape run without arguments shows a
union of the "incoming" and "active" queues. If you want to look at
the contents of specific queues, issue commands like:

qshape deferred
qshape incoming
qshape active

Adding a "-s" before the queue name will show you the distribution by
(alleged) sender domain instead of by recipient domain.
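
To take some of the eyeballing out of reading the age columns, here is
a minimal sketch (my own, not from the README) that prints the oldest
non-empty age bucket of the TOTAL row. The function name oldest_bucket
is made up, and it assumes the header and TOTAL lines look like the
ones in your output:

```shell
# Print the header label of the oldest non-empty age bucket in the
# TOTAL row. Reads qshape output on stdin.
# Assumption: the header row starts with "T"; data rows have the domain
# in field 1 and the total in field 2, so row field i lines up with
# header field i-1.
oldest_bucket() {
    awk '
        $1 == "T"     { for (i = 1; i <= NF; i++) label[i] = $i }
        $1 == "TOTAL" { for (i = NF; i > 2; i--)
                            if ($i > 0) { print label[i - 1]; exit } }
    '
}
```

Used as "qshape deferred | oldest_bucket". For the table you posted it
prints 320, i.e. the oldest queued mail is between 160 and 320 minutes
old. Recording that value from cron every few minutes would also give
you a rough baseline.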

> But, I don't have a baseline.  I don't know what it should look like.  But,
> it seems backlogged.  Not sure how to fix that.  Any suggestions are greatly
> appreciated.

Start by identifying the log entries for mails to or from the
backlogged destinations. Quoting from QSHAPE_README (you need to
replace example.com with the destination/sender domains from
the qshape output. Furthermore, your mail system might log to a
different file than "/var/log/maillog", and there's always the
possibility that the log file got rotated in between):

,----[ QSHAPE_README ]
| # Find deliveries to example.com
| #
| $ tail -10000 /var/log/maillog |
|         egrep -i ': to=<.*@example\.com>,' |
|         less
|
| # Find messages from example.com
| #
| $ tail -10000 /var/log/maillog |
|         egrep -i ': from=<.*@example\.com>,' |
|         less
|
| # Find all messages for a specific queue id.
| #
| $ tail -10000 /var/log/maillog | egrep ': 2B2173FF68: '
|
| # helpful messages from qmgr
| $ egrep 'qmgr.*(panic|fatal|error|warning):' /var/log/maillog
`----
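
The first and third README steps can be chained. The following is only
a sketch under assumptions: the function name queue_ids_for is
hypothetical, and it assumes the usual Postfix log layout in which the
queue ID sits directly in front of "to=<...>". It reads log lines on
stdin and lists the distinct queue IDs of deliveries to one domain:

```shell
# List the distinct queue IDs of deliveries to a recipient domain.
# Usage: tail -10000 /var/log/maillog | queue_ids_for example.com
# Assumption: log lines look like "...]: QUEUEID: to=<user@domain>, ...".
queue_ids_for() {
    # Escape the dots so the domain is matched literally.
    domain_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -Ei ": to=<[^>]*@${domain_re}>," |
        sed -n 's/.*: \([0-9A-Za-z]\{6,\}\): to=<.*/\1/p' |
        sort -u
}
```

Each ID it prints can then be fed to the "Find all messages for a
specific queue id" command above to see the full history of that
message.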

This additional information should help you quickly identify the
problem. If you need help interpreting the output, you might want to
follow up on your original post - some very experienced postmasters are
reading this list, and given enough debugging information, they might
be able to help you diagnose your problem.


Stefan
