On Wed, May 20, 2009 at 02:44:57PM +1000, MacShane, Tracy wrote:

> Just to be sure I'm not barking up the wrong tree, would I expect to see
> a log entry for the EOM in the verbose log from the sending server if it
> existed? Here're some snipped logs:
> 
> May 20 10:27:25 smtp3 postfix/smtpd[17136]: >
> dfw-mailout1.example.com[199.xxx.xxx.xx]: 421 4.4.2
> smtp3.ourdomain.example.net Error: timeout exceeded

What I see for the same host is just overly aggressive connection caching:
they hold idle connections open for 120s after ".":

    2009-05-19T02:10:01-0400 amnesiac postfix/smtpd[23273]: timeout after
        END-OF-MESSAGE from dfw-mailout1.example.com[199.xx.xxx.198]
    2009-05-19T11:32:40-0400 hqmtaext01 postfix/smtpd[3035]: timeout after
        END-OF-MESSAGE from dfw-mailout1.example.com[199.xx.xxx.198]
    2009-05-19T15:51:23-0400 hqmtaext01 postfix/smtpd[10892]: timeout after
        END-OF-MESSAGE from dfw-mailout1.example.com[199.xx.xxx.198]

Otherwise, these and all other deliveries complete normally:

    2009-05-19T02:08:30-0400 amnesiac postfix/smtpd[23273]: connect from
        dfw-mailout1.example.com[199.xx.xxx.198]
    2009-05-19T02:08:31-0400 amnesiac postfix/smtpd[23273]:
        Anonymous TLS connection established from
        dfw-mailout1.example.com[199.xx.xxx.198]:
        TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)
    2009-05-19T02:08:31-0400 amnesiac postfix/smtpd[23273]: 5FC8F168D24C:
        client=dfw-mailout1.example.com[199.xx.xxx.198]
    2009-05-19T02:08:31-0400 amnesiac postfix/cleanup[23221]: 5FC8F168D24C:
        message-id=<uni...@dom.ain>
    2009-05-19T02:08:31-0400 amnesiac postfix/qmgr[11294]: 5FC8F168D24C:
        from=<sen...@example.com>, size=3701, nrcpt=1 (queue active)
    2009-05-19T02:08:31-0400 amnesiac postfix/smtp[24647]: 5FC8F168D24C:
        to=<mail...@mailhost.example.net>,
        orig_to=<first.l...@example.org>,
        relay=127.0.0.1[127.0.0.1]:27, delay=0.35, delays=0.16/0/0.01/0.18,
        dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9CBBB168D246)
    2009-05-19T02:08:31-0400 amnesiac postfix/qmgr[11294]: 5FC8F168D24C:
        removed
    2009-05-19T02:10:01-0400 amnesiac postfix/smtpd[23273]: timeout after
        END-OF-MESSAGE from dfw-mailout1.example.com[199.xx.xxx.198]
    2009-05-19T02:10:01-0400 amnesiac postfix/smtpd[23273]: disconnect from
        dfw-mailout1.example.com[199.xx.xxx.198]
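
If you want to check your own logs for the same pattern, here is a minimal
Python sketch. It assumes one log entry per line with an ISO-style timestamp
first, as in my excerpts above (which are wrapped only for mail); your syslog
format may differ, in which case the regular expression and the timestamp
format need adjusting. The function name idle_before_timeout is mine, not
anything from Postfix. It measures the time from the last "client=" line of
an smtpd process to its "timeout after END-OF-MESSAGE" line. Since "client="
is logged before the message data arrives, the result is only an upper bound
on how long the connection sat idle after ".".

```python
import re
from datetime import datetime

# Assumed layout: "<timestamp> <host> postfix/smtpd[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) postfix/smtpd\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def parse_ts(text):
    # Timestamps such as 2009-05-19T02:08:31-0400
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")


def idle_before_timeout(lines):
    """Return (host, pid, seconds) for each timeout after END-OF-MESSAGE.

    seconds runs from the last "client=" line logged by the same smtpd
    process to the timeout line. Timeouts with no earlier "client=" line
    are skipped.
    """
    last_client = {}
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        key = (m["host"], m["pid"])
        ts = parse_ts(m["ts"])
        msg = m["msg"]
        if "client=" in msg:
            last_client[key] = ts
        elif msg.startswith("timeout after END-OF-MESSAGE"):
            start = last_client.pop(key, None)
            if start is not None:
                seconds = (ts - start).total_seconds()
                results.append((key[0], int(key[1]), seconds))
    return results


# Two entries from the excerpt above, joined back into single lines.
SAMPLE = [
    "2009-05-19T02:08:31-0400 amnesiac postfix/smtpd[23273]: 5FC8F168D24C: "
    "client=dfw-mailout1.example.com[199.xx.xxx.198]",
    "2009-05-19T02:10:01-0400 amnesiac postfix/smtpd[23273]: timeout after "
    "END-OF-MESSAGE from dfw-mailout1.example.com[199.xx.xxx.198]",
]

for host, pid, seconds in idle_before_timeout(SAMPLE):
    print(host, pid, seconds)
```

To run it over a real log, pass an open file instead of SAMPLE, e.g.
idle_before_timeout(open("/var/log/maillog")), where the path is whatever
your syslog configuration uses.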

> It seems pretty clear to me that we didn't receive an EOM (especially
> since the timeout-exceeded caused the disconnection), but since I'm
> going to be telling them it's a problem at their end, I'd like to be
> sure I'm not telling them a pile of rubbish. 

The evidence that the problem is on their end is not yet in hand. Either
side could have path MTU issues, window-scaling issues, ... with some
insufficiently robust or slightly misconfigured firewall.

> I'm also going to try some tcpdump logging to see what I can find - any
> recommendations for what I should be looking for?

Retransmission, unusual TCP options, what fraction of the message data
was sent if any, ... Capture full binary packets with "-s 0 -w /some/file".
If the contents are sensitive, you'll have to do some analysis solo, and
ask for similar packet captures on their side for comparison.

The largest message I found from a quick log search is:

    2009-05-18T05:58:14-0400 amnesiac postfix/qmgr[11294]: F368D18C8031:
        from=<sen...@example.com>, size=66536, nrcpt=1 (queue active)

this is not quite big enough to rule out systemic issues with large
messages on the path from them to us, but suggests that message size is
most likely not an issue on this path. The sending MTA appears to be
Sendmail:

    Received: from dfw-mailout1.example.com
        (dfw-mailout1.example.com [199.xx.xxx.198])
        (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
        (No client certificate requested)
        by amnesiac.example.net (Postfix) with ESMTPS id F368D18C8031
        for <recipi...@example.org>; Mon, 18 May 2009 05:58:13 -0400 (EDT)
    Received: from dmoutc00.directory.example.com
        (dmoutc00.directory.example.com [147.xx.xxx.116])
        by dfw-mailout1.example.com (Switch-3.3.3mp/Switch-3.3.2mp)
        with ESMTP id n4I9wA2Z012902
        for <recipi...@example.org>; Mon, 18 May 2009 09:58:10 GMT

It is unlikely that Sendmail is sending incorrectly, so a firewall issue,
specific to the firewalls between you and them and to the negotiated TCP
parameters, seems most likely.


-- 
        Viktor.

