Hello!

Thank you for the exhausting explanation of the problem.

Here you got the requested information about the system:
root@outgoingmail-2:~# uname -a
Linux outgoingmail-2 3.16.0-29-generic #39-Ubuntu SMP Mon Dec 15 22:27:29 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@outgoingmail-2:~# postconf mail_version
mail_version = 2.11.1

I've installed postfix from the Ubuntu utopic package.

More insformation about the system is that it is a linux container. But I don't think that has any effect to the software.

It's no problem for me to let you get access to the system to debug if you would prefer that. If you are familiar with linux-containers I could also make a clone and you could get it. I could also get the source code from the package, compile and add the debugging code that you mentioned in the mail.

Mail me what method that you want to use.

/Regards Mats

Quoting Wietse Venema <wie...@porcupine.org>:

Thanks for the "smtp -v/relay -v" logging.  Your logging confirms
that there is a bogus error talking to your bounce daemon.

Although Postfix detects the bogus error, unfortunately it produces
no informative logging for this particular error.

Questions:

- What is the output from "uname -a"? Postfix has bug workarounds
for MacOSX and Solaris, but maybe they improved their software and
we need to add a new workaround.

- What is the output from "postconf mail_version"? maybe you are
running a Postfix version that does not have all the known OS bug
workarounds.

- Did you build Postfix from source code, or is this a binary
distribution? If you built from source I will ask you to modify the
source for better logging. If you installed a binary distribution,
I can set up a VM with the same OS and Postfix distribution and do
the experiments myself.

The remainder of this email is annotated logging that shows problems
talking to the bounce daemon, in both the Postfix SMTP client and
in the queue manager. The logging also illustrates that is a bogus
error because the bounce service requests actually succeed.

First, the Postfix SMTP client receives a 550 reply to an RCPT TO command,
and attempts to report the delivery error to the bounce service (this
appends the error report to a file /var/spool/postfix/48E68BF5F).

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: < mail.telia.com[62.20.233.128]:25: 550 RCPT TO:<nonexistingm...@telia.com> User unknown 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connect to subsystem private/bounce 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr nrequest = 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr flags = 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr queue_id = 48E68BF5F 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr original_recipient = nonexistingm...@telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr recipient = nonexistingm...@telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr offset = 751 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr dsn_orig_rcpt = rfc822;nonexistingm...@telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr notify_flags = 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status = 5.0.0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr diag_type = smtp 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr diag_text = 550 RCPT TO:<nonexistingm...@telia.com> User unknown 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr mta_type = dns 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr mta_mname = mail.telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr action = failed 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr reason = host mail.telia.com[62.20.233.128] said: 550 RCPT TO:<nonexistingm...@telia.com> User unknown (in reply to RCPT TO command) 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connect to subsystem private/defer

After "send attr reason" the bounce daemon is supposed to reply
with a status (success or failure). But the logging shows that the
Postfix SMTP client does not attempt to receive that response.
Instead it has decided that the bounce request could not be sent,
and makes a connection to the defer service to report that error.

As we will see later the bounce request did actually succeed.

Here is the function that sends the bounce request and receives the
status reply.

int     mail_command_client(const char *class, const char *name,...)
{
    va_list ap;
    VSTREAM *stream;
    int     status;

    /*
     * Talk a little protocol with the specified service.
     */
    if ((stream = mail_connect(class, name, BLOCKING)) == 0)
        return (-1);
    va_start(ap, name);
    status = attr_vprint(stream, ATTR_FLAG_NONE, ap);
    va_end(ap);
    if (status != 0
        || attr_scan(stream, ATTR_FLAG_STRICT,
                     RECV_ATTR_INT(MAIL_ATTR_STATUS, &status), 0) != 1)
        status = -1;
    (void) vstream_fclose(stream);
    return (status);
}

Apparently, attr_vprint (an alias for attr_vprint0) returns that
the request could not be sent, OR attr_scan (an alias for attr_scan0)
fails to receive the reply.

Here is the attr_vprint0 code fragment:

int     attr_vprint0(VSTREAM *fp, int flags, va_list ap)
{
        ...
    /*
     * Iterate over all (type, name, value) triples, and produce output on
     * the fly.
     */
    while ((attr_type = va_arg(ap, int)) != ATTR_TYPE_END) {
        switch (attr_type) {
            ... code that logs "send attr xxx = yyy" and that uses
            vstream_fprintf() or vstream_fwrite() depending on the
            attribute type...
    }
    if ((flags & ATTR_FLAG_MORE) == 0)
        VSTREAM_PUTC('\0', fp);
    return (vstream_ferror(fp));
}

The vstream_ferror() at the end implements the error check on exit
from the attr_vprint0 function.

Now it is extremely unlikely that vstream_ferror() reports an error
on a newly-opened stream that has only a few bytes queued for output
and that has never been flushed.

So let's have a look at attr_scan0() and see if this function
can return without trying to read any input.

int     attr_vscan0(VSTREAM *fp, int flags, va_list ap)
{
    ...declarations...

    /*
     * EOF check.
     */
    if ((ch = VSTREAM_GETC(fp)) == VSTREAM_EOF)
        return (0);
    vstream_ungetc(fp, ch);

    ...after this comes code that logs "wanted attribute: <name of
    attribute>" and that attempts to receive the attribute, and
    reports "premature EOF" or other read errors...

So the more likely explanation is that VSTREAM_GETC() reports an
error. That error happens either as the queued output with bounce
request is auto-flushed, or it waits for the socket to become
readable.

This activity is logged with "-v -v -v", but that may actually slow
down the program enough to affect kernel bug race-condition errors
and change the result, just like program tracing does.

At this point I am pretty sure that we are looking at a kernel bug
with UNIX-domain sockets, but I don't want to debug kernel bugs
over email. I can debug a system that runs in a local VM.

Debugging should be straightforward: mail_command_client() is used
only for bounce daemon requests (including defer and trace requests)
so this function can be ripped apart, and the fault will be with
one of the following steps:

        write-wait for the socket to become writable

        write the queued bounce server request

        read-wait for the socket to become readable

The remainder of the log shows that the bounce request succeeded anyway.

First, the defer service request to report the bounce service error:

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connect to subsystem private/defer 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr nrequest = 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr flags = 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr queue_id = 48E68BF5F 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr original_recipient = nonexistingm...@telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr recipient = nonexistingm...@telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr offset = 751 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr dsn_orig_rcpt = rfc822;nonexistingm...@telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr notify_flags = 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status = 4.3.0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr diag_type = smtp 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr diag_text = 550 RCPT TO:<nonexistingm...@telia.com> User unknown 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr mta_type = dns 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr mta_mname = mail.telia.com 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr action = delayed 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr reason = bounce or trace service failure 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: 48E68BF5F: to=<nonexistingm...@telia.com>, relay=mail.telia.com[62.20.233.128]:25, delay=5.3, delays=0.16/0/0.06/5.1, dsn=4.3.0, status=deferred (bounce or trace service failure)

Then some code to update the fast flush log if applicable (it is not):

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: flush_add: site telia.com id 48E68BF5F 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: match_hostname: telia.com ~? irf.se 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: match_list_match: telia.com: no match 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: flush_add: site telia.com id 48E68BF5F status 4

The last part of the outbound SMTP session:

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: > mail.telia.com[62.20.233.128]:25: RSET 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp_stream_setup: maxtime=20 enable_deadline=0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: < mail.telia.com[62.20.233.128]:25: 250 RSET 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: > mail.telia.com[62.20.233.128]:25: QUIT 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: name_mask: resource 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: name_mask: software

The report from SMTP client to queue manager that delivery was deferred:

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: deliver_request_final: send: "" -1 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status = 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr diag_type = 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr diag_text = 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr mta_type = 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr mta_mname = 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr action = 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr reason = 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status = 4294967295 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: master_notify: status 1 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connection closed

And that is the end of the SMTP client's attempt to deliver the message with queue ID 48E68BF5F.

And now for the surprise: the bounce daemon did record the request
to bounce the message with queue ID 48E68BF5F. The queue manager
sees that this file exists and asks the bounce daemon to create a
non-delivery notification for the message with queue ID 87282BF62.

And although the queue manager reports an error talking to the bounce service, the bounce daemon creates a delivery status notification with queue ID 87282BF62.

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/cleanup[12867]: 87282BF62: message-id=<20150214081828.87282b...@outgoingmail-2.irf.se> 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/bounce[12876]: 48E68BF5F: sender non-delivery notification: 87282BF62 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/qmgr[12816]: 87282BF62: from=<>, size=2552, nrcpt=1 (queue active) 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/qmgr[12816]: 48E68BF5F: status=deferred (bounce failed)

The next fragment is the SMTP client delivering the delivery status
notification with queue ID 87282BF62.

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connection established 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: master_notify: status 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: deliver_request_initial: send initial status 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status = 0 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp socket: wanted attribute: flags 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute name: flags 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute value: 3 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp socket: wanted attribute: queue_name 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute name: queue_name 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute value: active 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp socket: wanted attribute: queue_id 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute name: queue_id 2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute value: 87282BF62

This is then delivered.




--
Mats Luspa
Phone: +46 (0)980 79 022
Institutet för rymdfysik               Fax: +46 (0)980 79 050
Swedish Institute of Space Physics      email: ma...@irf.se
Visiting/Delivery address: Rymdcampus 1, SE-981 92 Kiruna
Postal address: Box 812, SE-981 28 Kiruna
--
PGP Public Key: http://www.irf.se/~matsl/pgpmatsl.asc

Reply via email to