On 1/26/2010 9:06 AM, Erik Sonn wrote:
Dear everyone,

I'm working on an anti-spam proxy that uses Postfix as its MTA. Postfix is
2.6.2-RC1 on an Ubuntu 8.04 LTS base system.


Preconditions:
* Postfix shall only accept mail addressed to valid (existing)
   recipients. To accomplish this, I'm using a regexp: map in
   relay_recipient_maps (the file is called "usermaps").
* The usermaps file is generated automatically by an hourly cron job
   that fetches all valid email addresses via LDAP (Postfix itself does
   not use LDAP at all; this is done independently by a Perl script).
* The data gathered from LDAP is written to a temporary file until the
   job finishes, and is then "atomically" copied over the original
   usermaps file, before Postfix is told to reload.

Problem:
* At very irregular intervals, varying in time and frequency, Postfix
   rejects mail because the recipient address is apparently unknown,
   although that address (a different, unpredictable one each time) is
   correctly defined in the usermaps file. The log messages look like
   this:

2010-01-26T15:10:29+01:00 hostmail postfix/smtpd[22884]: NOQUEUE:
reject: RCPT from smtp.citrix.com[66.165.176.89]: 550 5.1.1
<alexxxx...@xxxxxxx.de>: Recipient address rejected: User unknown in
relay recipient table; from=<no.repl...@citrix.com>
to=<alexxxxx...@xxxxxxxx.de>  proto=ESMTP helo=<SMTP.CITRIX.COM>

* The hourly cron job runs 24 times a day; on 1 to 4 of those runs,
   Postfix logs the following message:

2010-01-26T08:57:25+01:00 hostmail postfix/smtpd[3398]: warning: regexp
map /etc/postfix/usermaps, line 2434: no closing regexp delimiter "/":
skipping this rule

The line number changes randomly each time, and I have made
considerable effort to make sure that the usermaps file is always
complete, syntactically correct and consistent. As you can see, the log
entry above is timed "08:57:25" (the cron job always begins fetching
addresses via LDAP at *:57).
Interestingly, my 'watch stat /etc/postfix/usermaps' shows this:

# Before the 08:57 cron-job touches usermaps
@Tue Jan 26 08:57:24 CET 2010
Access: 2010-01-26 07:57:24.000000000 +0100
Modify: 2010-01-26 07:57:22.000000000 +0100
Change: 2010-01-26 07:57:22.000000000 +0100

# After the 08:57 cron job rewrote usermaps, but before Postfix read it
@Tue Jan 26 08:57:26 CET 2010
Access: 2010-01-26 08:57:25.000000000 +0100
Modify: 2010-01-26 08:57:25.000000000 +0100
Change: 2010-01-26 08:57:25.000000000 +0100

# After Postfix read the new usermaps after reloading
@Tue Jan 26 08:57:36 CET 2010
Access: 2010-01-26 08:57:35.000000000 +0100
Modify: 2010-01-26 08:57:25.000000000 +0100
Change: 2010-01-26 08:57:25.000000000 +0100

If you look at these times, the file is *read* by Postfix at 08:57:35,
but the log line above shows the warning at 08:57:25. How can this be?
The 10-second delay comes from an intentional sleep() between writing
usermaps and reloading Postfix.

Moreover, when mail is rejected as described above, neither the *time*
of the rejects nor the rejected recipient addresses seem to correlate
with the regexp warnings. Everything seems to happen quite randomly.

What I've already checked:
* Generation of the usermaps file is OK and always succeeds. All
   addresses are fetched successfully, and the file is written complete
   and syntactically correct.
* I/O and buffering issues (e.g. reloading Postfix before the I/O
   buffer has been flushed) have been tested and shouldn't be the
   problem.
* The basic Postfix configuration works perfectly and has never caused
   any trouble. The usermaps issue seems to occur only when the usermaps
   file gets large (>1k lines; in this specific case, about 10k lines).
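Since the logged warning coincides with the *:57 run, one way to test the "file is always correct" assumption is to scan the file for the exact condition Postfix reports: a line that opens a pattern but never closes it, which is what a line cut off in the middle of a write looks like. The following is a minimal Python sketch, not Postfix's own parser: it only approximates the regexp_table(5) line syntax (comments, continuation lines, if/endif, "!" negation) and assumes the first character of the pattern is the delimiter. The function name find_unterminated is hypothetical.

```python
import sys
from typing import Iterable, List


def find_unterminated(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers whose pattern has no closing delimiter.

    Simplified approximation of regexp_table(5): blank lines, '#' comments,
    continuation lines (leading whitespace) and 'endif' are skipped; an
    optional 'if' keyword and '!' negation are stripped; the first remaining
    character is taken as the delimiter.
    """
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Blank lines and comments carry no pattern.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # A line starting with whitespace continues the previous logical line.
        if line[0].isspace():
            continue
        text = line
        if text.strip() == "endif":
            continue
        if text.startswith("if") and len(text) > 2 and text[2].isspace():
            text = text[2:].lstrip()
        if text.startswith("!"):
            text = text[1:]
        if not text:
            bad.append(lineno)
            continue
        delim = text[0]
        closed = False
        i = 1
        while i < len(text):
            if text[i] == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if text[i] == delim:
                closed = True
                break
            i += 1
        if not closed:
            bad.append(lineno)
    return bad


# Only runs when a path is given, e.g. the live usermaps or the temp file.
if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, encoding="utf-8", errors="replace") as fh:
        for n in find_unterminated(fh):
            print(f"{path}, line {n}: no closing regexp delimiter")
```

A clean result on the finished file does not rule out a reader opening it while it is still being written; in that case every line checks out afterwards, yet a process that read the file mid-copy saw a truncated last line.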

The installation runs on a virtualized Xen platform. Postfinger
output is attached. I should also mention that, for various reasons,
it's not *easily* possible for me to simply upgrade the Postfix version.


Thank you very much,
Erik

Postfix is reading a half-written file. A new smtpd process started while the file copy was in progress.

Running "postfix reload" on a busy system is a killer for performance. With 10k entries, you'll be much better off using the hash: or cdb: file type since these file types detect changes automatically with no need for a reload.[1]
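A minimal Python sketch of the hash: alternative, assuming main.cf uses relay_recipient_maps = hash:/etc/postfix/usermaps and that every address maps to "OK" (both are hypothetical choices, not taken from the thread). It writes a separate source file, runs postmap on it, and renames the resulting .db over the live one, so a newly started smtpd never opens a partly built table. The function names and the ".in" suffix are illustrative.

```python
import os
import subprocess
from typing import Iterable, List


def hash_source_lines(addresses: Iterable[str]) -> List[str]:
    """Format one 'address OK' line per recipient, lowercased and de-duplicated."""
    unique = sorted({a.strip().lower() for a in addresses if a.strip()})
    return [f"{addr} OK\n" for addr in unique]


def rebuild_hash_map(addresses: Iterable[str], map_path: str,
                     run=subprocess.run) -> None:
    """Build <map_path>.in and <map_path>.in.db, then rename the .db into place."""
    src = map_path + ".in"
    with open(src, "w", encoding="utf-8") as fh:
        fh.writelines(hash_source_lines(addresses))
    # postmap writes <src>.db next to the source; check=True raises on failure,
    # so a failed build never replaces the live table.
    run(["postmap", "hash:" + src], check=True)
    # rename() within one filesystem is atomic: readers see old or new table.
    os.replace(src + ".db", map_path + ".db")
```

The run parameter exists only so the postmap call can be swapped out when testing on a machine without Postfix.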

Here's an example of how to fix your problem. Although this uses hash:, the same basic idea (atomic move rather than copy) will work with regexp: files.
http://www.postfix.org/DATABASE_README.html#safe_db
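For the regexp: file itself, the fix amounts to replacing the copy with a rename. A minimal Python sketch, assuming the generated map can be handed over as a list of lines (atomic_write is a hypothetical name, and the Perl job would do the equivalent): the new contents go into a temporary file in the same directory as usermaps, are flushed to disk, and are then renamed over the original with os.replace(). Because rename() is atomic within one filesystem, a newly started smtpd opens either the complete old file or the complete new one.

```python
import os
import tempfile
from typing import Iterable


def atomic_write(path: str, lines: Iterable[str]) -> None:
    """Write lines to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, which rename() atomicity requires.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".usermaps.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.writelines(lines)
            fh.flush()
            os.fsync(fh.fileno())
        # mkstemp() creates the file with mode 0600; widen it like a normal map file.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        # Never leave a stray temp file behind on failure.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The existing sleep() and reload can then follow as before; the difference is that no process can ever open a half-copied usermaps.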

It's probably not a good idea to run -RC level software long term.

  -- Noel Jones

[1] half-baked workaround: include an empty hash: file along with your regexp file in your config, then rebuild the hash: file whenever the regexp file changes.
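The footnote's workaround might look like this in Python, assuming main.cf lists an empty trigger table alongside the regexp file, e.g. relay_recipient_maps = hash:/etc/postfix/usermaps-trigger, regexp:/etc/postfix/usermaps (the name usermaps-trigger and the function touch_trigger_map are hypothetical). Rebuilding the empty table gives it a fresh .db file, which running Postfix daemons should notice as a changed table; they then restart and reread the regexp file without a postfix reload.

```python
import os
import subprocess

# Assumed main.cf line (hypothetical file names):
#   relay_recipient_maps = hash:/etc/postfix/usermaps-trigger, regexp:/etc/postfix/usermaps


def touch_trigger_map(trigger_path: str, run=subprocess.run) -> None:
    """Rebuild the (empty) hash: trigger table so daemons see a changed table."""
    if not os.path.exists(trigger_path):
        # postmap needs the source file to exist, even if it is empty.
        open(trigger_path, "a").close()
    run(["postmap", "hash:" + trigger_path], check=True)
```

Call it only after the complete regexp file is in place (for example after renaming a finished temporary file over usermaps); otherwise the restarted daemons could still read a partial file.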
