Thanks for the "smtp -v/relay -v" logging.  Your logging confirms
that there is a bogus error talking to your bounce daemon.

Although Postfix detects the bogus error, unfortunately it produces
no informative logging for this particular error.

Questions:

- What is the output from "uname -a"? Postfix has bug workarounds
for MacOSX and Solaris, but maybe they improved their software and
we need to add a new workaround. 

- What is the output from "postconf mail_version"? maybe you are
running a Postfix version that does not have all the known OS bug
workarounds.

- Did you build Postfix from source code, or is this a binary
distribution? If you built from source I will ask you to modify the
source for better logging. If you installed a binary distribution,
I can set up a VM with the same OS and Postfix distribution and do
the experiments myself.

The remainder of this email is annotated logging that shows problems
talking to the bounce daemon, in both the Postfix SMTP client and
in the queue manager. The logging also illustrates that is a bogus
error because the bounce service requests actually succeed.

First, the Postfix SMTP client receives a 550 reply to an RCPT TO command,
and attempts to report the delivery error to the bounce service (this
appends the error report to a file /var/spool/postfix/48E68BF5F).

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: < 
mail.telia.com[62.20.233.128]:25: 550 RCPT TO:<nonexistingm...@telia.com> User 
unknown
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connect to 
subsystem private/bounce
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
nrequest = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
queue_id = 48E68BF5F
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
original_recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr offset 
= 751
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
dsn_orig_rcpt = rfc822;nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
notify_flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status 
= 5.0.0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
diag_type = smtp
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
diag_text = 550 RCPT TO:<nonexistingm...@telia.com> User unknown
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
mta_type = dns
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
mta_mname = mail.telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr action 
= failed
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr reason 
= host mail.telia.com[62.20.233.128] said: 550 RCPT 
TO:<nonexistingm...@telia.com> User unknown (in reply to RCPT TO command)
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connect to 
subsystem private/defer

After "send attr reason" the bounce daemon is supposed to reply
with a status (success or failure). But the logging shows that the
Postfix SMTP client does not attempt to receive that response.
Instead it has decided that the bounce request could not be sent,
and makes a connection to the defer service to report that error.

As we will see later the bounce request did actually succeed.

Here is the function that sends the bounce request and receives the
status reply.

int     mail_command_client(const char *class, const char *name,...)
{
    va_list ap;
    VSTREAM *stream;
    int     status;

    /*
     * Talk a little protocol with the specified service.
     */
    if ((stream = mail_connect(class, name, BLOCKING)) == 0)
        return (-1);
    va_start(ap, name);
    status = attr_vprint(stream, ATTR_FLAG_NONE, ap);
    va_end(ap);
    if (status != 0
        || attr_scan(stream, ATTR_FLAG_STRICT,
                     RECV_ATTR_INT(MAIL_ATTR_STATUS, &status), 0) != 1)
        status = -1;
    (void) vstream_fclose(stream);
    return (status);
}

Apparently, attr_vprint (an alias for attr_vprint0) returns that
the request could not be sent, OR attr_scan (an alias for attr_scan0)
fails to receive the reply.

Here is the attr_vprint0 code fragment:

int     attr_vprint0(VSTREAM *fp, int flags, va_list ap)
{
        ...
    /*
     * Iterate over all (type, name, value) triples, and produce output on
     * the fly.
     */
    while ((attr_type = va_arg(ap, int)) != ATTR_TYPE_END) {
        switch (attr_type) {
            ... code that logs "send attr xxx = yyy" and that uses
            vstream_fprintf() or vstream_fwrite() depending on the
            attribute type...
    }
    if ((flags & ATTR_FLAG_MORE) == 0)
        VSTREAM_PUTC('\0', fp);
    return (vstream_ferror(fp));
}

The vstream_ferror() at the end implements the error check on exit
from the attr_vprint0 function.

Now it is extremely unlikely that vstream_ferror() reports an error
on a newly-opened stream that has only a few bytes queued for output
and that has never been flushed.

So let's have a look at attr_scan0() and see if this function
can return without trying to read any input.

int     attr_vscan0(VSTREAM *fp, int flags, va_list ap)
{
    ...declarations...

    /*
     * EOF check.
     */
    if ((ch = VSTREAM_GETC(fp)) == VSTREAM_EOF)
        return (0);
    vstream_ungetc(fp, ch);

    ...after this comes code that logs "wanted attribute: <name of
    attribute>" and that attempts to receive the attribute, and
    reports "premature EOF" or other read errors...

So the more likely explanation is that VSTREAM_GETC() reports an
error. That error happens either as the queued output with bounce
request is auto-flushed, or it waits for the socket to become
readable.  

This activity is logged with "-v -v -v", but that may actually slow
down the program enough to affect kernel bug race-condition errors
and change the result, just like program tracing does.

At this point I am pretty sure that we are looking at a kernel bug
with UNIX-domain sockets, but I don't want to debug kernel bugs
over email. I can debug a system that runs in a local VM.

Debugging should be straightforward: mail_command_client() is used
only for bounce daemon requests (including defer and trace requests)
so this function can be ripped apart, and the fault will be with
one of the following steps:

        write-wait for the socket to become writable

        write the queued bounce server request

        read-wait for the socket to become readable

The remainder of the log shows that the bounce request succeeded anyway.

First, the defer service request to report the bounce service error:

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connect to 
subsystem private/defer
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
nrequest = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
queue_id = 48E68BF5F
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
original_recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr offset 
= 751
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
dsn_orig_rcpt = rfc822;nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
notify_flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status 
= 4.3.0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
diag_type = smtp
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
diag_text = 550 RCPT TO:<nonexistingm...@telia.com> User unknown
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
mta_type = dns
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
mta_mname = mail.telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr action 
= delayed
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr reason 
= bounce or trace service failure
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: 48E68BF5F: 
to=<nonexistingm...@telia.com>, relay=mail.telia.com[62.20.233.128]:25, 
delay=5.3, delays=0.16/0/0.06/5.1, dsn=4.3.0, status=deferred (bounce or trace 
service failure)

Then some code to update the fast flush log if applicable (it is not):

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: flush_add: site 
telia.com id 48E68BF5F
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: match_hostname: 
telia.com ~? irf.se
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: match_list_match: 
telia.com: no match
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: flush_add: site 
telia.com id 48E68BF5F status 4

The last part of the outbound SMTP session:

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: > 
mail.telia.com[62.20.233.128]:25: RSET
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: 
smtp_stream_setup: maxtime=20 enable_deadline=0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: < 
mail.telia.com[62.20.233.128]:25: 250 RSET
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: > 
mail.telia.com[62.20.233.128]:25: QUIT
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: name_mask: 
resource
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: name_mask: 
software

The report from SMTP client to queue manager that delivery was deferred:

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: 
deliver_request_final: send: "" -1
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
diag_type =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
diag_text =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
mta_type =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr 
mta_mname =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr action =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr reason =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status 
= 4294967295
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: master_notify: 
status 1
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connection closed

And that is the end of the SMTP client's attempt to deliver the message with 
queue ID 48E68BF5F.

And now for the surprise: the bounce daemon did record the request
to bounce the message with queue ID 48E68BF5F. The queue manager
sees that this file exists and asks the bounce daemon to create a
non-delivery notification for the message with queue ID 87282BF62.

And although the queue manager reports an error talking to the bounce service,
the bounce daemon creates a delivery status notification with queue ID 
87282BF62.

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/cleanup[12867]: 87282BF62: 
message-id=<20150214081828.87282b...@outgoingmail-2.irf.se>
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/bounce[12876]: 48E68BF5F: 
sender non-delivery notification: 87282BF62
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/qmgr[12816]: 87282BF62: 
from=<>, size=2552, nrcpt=1 (queue active)
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/qmgr[12816]: 48E68BF5F: 
status=deferred (bounce failed)

The next fragment is the SMTP client delivering the delivery status
notification with queue ID 87282BF62.

2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: connection 
established
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: master_notify: 
status 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: 
deliver_request_initial: send initial status
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send attr status 
= 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp socket: 
wanted attribute: flags
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute 
name: flags
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute 
value: 3
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp socket: 
wanted attribute: queue_name
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute 
name: queue_name
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute 
value: active
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp socket: 
wanted attribute: queue_id
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute 
name: queue_id
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input attribute 
value: 87282BF62

This is then delivered.

Reply via email to