Hello!
Thank you for the exhausting explanation of the problem.
Here you got the requested information about the system:
root@outgoingmail-2:~# uname -a
Linux outgoingmail-2 3.16.0-29-generic #39-Ubuntu SMP Mon Dec 15
22:27:29 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@outgoingmail-2:~# postconf mail_version
mail_version = 2.11.1
I've installed postfix from the Ubuntu utopic package.
More insformation about the system is that it is a linux container.
But I don't think that has any effect to the software.
It's no problem for me to let you get access to the system to debug if
you would prefer that. If you are familiar with linux-containers I
could also make a clone and you could get it.
I could also get the source code from the package, compile and add the
debugging code that you mentioned in the mail.
Mail me what method that you want to use.
/Regards Mats
Quoting Wietse Venema <wie...@porcupine.org>:
Thanks for the "smtp -v/relay -v" logging. Your logging confirms
that there is a bogus error talking to your bounce daemon.
Although Postfix detects the bogus error, unfortunately it produces
no informative logging for this particular error.
Questions:
- What is the output from "uname -a"? Postfix has bug workarounds
for MacOSX and Solaris, but maybe they improved their software and
we need to add a new workaround.
- What is the output from "postconf mail_version"? maybe you are
running a Postfix version that does not have all the known OS bug
workarounds.
- Did you build Postfix from source code, or is this a binary
distribution? If you built from source I will ask you to modify the
source for better logging. If you installed a binary distribution,
I can set up a VM with the same OS and Postfix distribution and do
the experiments myself.
The remainder of this email is annotated logging that shows problems
talking to the bounce daemon, in both the Postfix SMTP client and
in the queue manager. The logging also illustrates that is a bogus
error because the bounce service requests actually succeed.
First, the Postfix SMTP client receives a 550 reply to an RCPT TO command,
and attempts to report the delivery error to the bounce service (this
appends the error report to a file /var/spool/postfix/48E68BF5F).
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: <
mail.telia.com[62.20.233.128]:25: 550 RCPT
TO:<nonexistingm...@telia.com> User unknown
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
connect to subsystem private/bounce
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr nrequest = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr queue_id = 48E68BF5F
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr original_recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr offset = 751
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr dsn_orig_rcpt = rfc822;nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr notify_flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr status = 5.0.0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr diag_type = smtp
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr diag_text = 550 RCPT TO:<nonexistingm...@telia.com> User unknown
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr mta_type = dns
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr mta_mname = mail.telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr action = failed
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr reason = host mail.telia.com[62.20.233.128] said: 550 RCPT
TO:<nonexistingm...@telia.com> User unknown (in reply to RCPT TO
command)
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
connect to subsystem private/defer
After "send attr reason" the bounce daemon is supposed to reply
with a status (success or failure). But the logging shows that the
Postfix SMTP client does not attempt to receive that response.
Instead it has decided that the bounce request could not be sent,
and makes a connection to the defer service to report that error.
As we will see later the bounce request did actually succeed.
Here is the function that sends the bounce request and receives the
status reply.
int mail_command_client(const char *class, const char *name,...)
{
va_list ap;
VSTREAM *stream;
int status;
/*
* Talk a little protocol with the specified service.
*/
if ((stream = mail_connect(class, name, BLOCKING)) == 0)
return (-1);
va_start(ap, name);
status = attr_vprint(stream, ATTR_FLAG_NONE, ap);
va_end(ap);
if (status != 0
|| attr_scan(stream, ATTR_FLAG_STRICT,
RECV_ATTR_INT(MAIL_ATTR_STATUS, &status), 0) != 1)
status = -1;
(void) vstream_fclose(stream);
return (status);
}
Apparently, attr_vprint (an alias for attr_vprint0) returns that
the request could not be sent, OR attr_scan (an alias for attr_scan0)
fails to receive the reply.
Here is the attr_vprint0 code fragment:
int attr_vprint0(VSTREAM *fp, int flags, va_list ap)
{
...
/*
* Iterate over all (type, name, value) triples, and produce output on
* the fly.
*/
while ((attr_type = va_arg(ap, int)) != ATTR_TYPE_END) {
switch (attr_type) {
... code that logs "send attr xxx = yyy" and that uses
vstream_fprintf() or vstream_fwrite() depending on the
attribute type...
}
if ((flags & ATTR_FLAG_MORE) == 0)
VSTREAM_PUTC('\0', fp);
return (vstream_ferror(fp));
}
The vstream_ferror() at the end implements the error check on exit
from the attr_vprint0 function.
Now it is extremely unlikely that vstream_ferror() reports an error
on a newly-opened stream that has only a few bytes queued for output
and that has never been flushed.
So let's have a look at attr_scan0() and see if this function
can return without trying to read any input.
int attr_vscan0(VSTREAM *fp, int flags, va_list ap)
{
...declarations...
/*
* EOF check.
*/
if ((ch = VSTREAM_GETC(fp)) == VSTREAM_EOF)
return (0);
vstream_ungetc(fp, ch);
...after this comes code that logs "wanted attribute: <name of
attribute>" and that attempts to receive the attribute, and
reports "premature EOF" or other read errors...
So the more likely explanation is that VSTREAM_GETC() reports an
error. That error happens either as the queued output with bounce
request is auto-flushed, or it waits for the socket to become
readable.
This activity is logged with "-v -v -v", but that may actually slow
down the program enough to affect kernel bug race-condition errors
and change the result, just like program tracing does.
At this point I am pretty sure that we are looking at a kernel bug
with UNIX-domain sockets, but I don't want to debug kernel bugs
over email. I can debug a system that runs in a local VM.
Debugging should be straightforward: mail_command_client() is used
only for bounce daemon requests (including defer and trace requests)
so this function can be ripped apart, and the fault will be with
one of the following steps:
write-wait for the socket to become writable
write the queued bounce server request
read-wait for the socket to become readable
The remainder of the log shows that the bounce request succeeded anyway.
First, the defer service request to report the bounce service error:
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
connect to subsystem private/defer
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr nrequest = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr queue_id = 48E68BF5F
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr original_recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr recipient = nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr offset = 751
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr dsn_orig_rcpt = rfc822;nonexistingm...@telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr notify_flags = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr status = 4.3.0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr diag_type = smtp
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr diag_text = 550 RCPT TO:<nonexistingm...@telia.com> User unknown
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr mta_type = dns
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr mta_mname = mail.telia.com
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr action = delayed
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr reason = bounce or trace service failure
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
48E68BF5F: to=<nonexistingm...@telia.com>,
relay=mail.telia.com[62.20.233.128]:25, delay=5.3,
delays=0.16/0/0.06/5.1, dsn=4.3.0, status=deferred (bounce or trace
service failure)
Then some code to update the fast flush log if applicable (it is not):
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
flush_add: site telia.com id 48E68BF5F
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
match_hostname: telia.com ~? irf.se
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
match_list_match: telia.com: no match
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
flush_add: site telia.com id 48E68BF5F status 4
The last part of the outbound SMTP session:
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: >
mail.telia.com[62.20.233.128]:25: RSET
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
smtp_stream_setup: maxtime=20 enable_deadline=0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: <
mail.telia.com[62.20.233.128]:25: 250 RSET
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: >
mail.telia.com[62.20.233.128]:25: QUIT
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
name_mask: resource
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
name_mask: software
The report from SMTP client to queue manager that delivery was deferred:
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
deliver_request_final: send: "" -1
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr status =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr diag_type =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr diag_text =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr mta_type =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr mta_mname =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr action =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr reason =
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr status = 4294967295
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
master_notify: status 1
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
connection closed
And that is the end of the SMTP client's attempt to deliver the
message with queue ID 48E68BF5F.
And now for the surprise: the bounce daemon did record the request
to bounce the message with queue ID 48E68BF5F. The queue manager
sees that this file exists and asks the bounce daemon to create a
non-delivery notification for the message with queue ID 87282BF62.
And although the queue manager reports an error talking to the
bounce service,
the bounce daemon creates a delivery status notification with queue
ID 87282BF62.
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/cleanup[12867]:
87282BF62: message-id=<20150214081828.87282b...@outgoingmail-2.irf.se>
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/bounce[12876]:
48E68BF5F: sender non-delivery notification: 87282BF62
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/qmgr[12816]:
87282BF62: from=<>, size=2552, nrcpt=1 (queue active)
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/qmgr[12816]:
48E68BF5F: status=deferred (bounce failed)
The next fragment is the SMTP client delivering the delivery status
notification with queue ID 87282BF62.
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
connection established
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
master_notify: status 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]:
deliver_request_initial: send initial status
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: send
attr status = 0
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp
socket: wanted attribute: flags
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input
attribute name: flags
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input
attribute value: 3
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp
socket: wanted attribute: queue_name
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input
attribute name: queue_name
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input
attribute value: active
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: smtp
socket: wanted attribute: queue_id
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input
attribute name: queue_id
2015-02-14T09:18:28+02:00 outgoingmail-2 postfix/smtp[12869]: input
attribute value: 87282BF62
This is then delivered.
--
Mats Luspa
Phone: +46 (0)980 79 022
Institutet för rymdfysik Fax: +46 (0)980 79 050
Swedish Institute of Space Physics email: ma...@irf.se
Visiting/Delivery address: Rymdcampus 1, SE-981 92 Kiruna
Postal address: Box 812, SE-981 28 Kiruna
--
PGP Public Key: http://www.irf.se/~matsl/pgpmatsl.asc