Hello,

I'm stuck on a problem where Postfix sometimes hangs when I issue a
"postfix stop" command.  In my configuration, Postfix relays mail for two
domains.  The lists of email addresses are in two virtual alias files,
defined with this line in main.cf:
virtual_alias_maps = hash:$config_directory/usermanaged/virtual.domain1.com,hash:$config_directory/usermanaged/virtual.domain2.com


I have a script called update-postfix.pl that runs every half hour.  It
shuts down Postfix, runs postmap on the virtual alias files, and then restarts
Postfix.  Most of the time, the script runs without any problems.  In fact,
this configuration previously ran on a different server with no problems at
all.  Because the old server was too slow, I set up a new Postfix installation
on a new server, and it's under the new setup that I'm running into this
intermittent hanging problem.

Usually, when the update-postfix.pl script runs, it tells Postfix to shut down 
and we get a logged message that says "postfix/postfix-script: stopping the 
Postfix mail system".  Right after that, postfix responds with something like 
"postfix/master[11211]: terminating on signal 15"

However, sometimes (once every day or so), the script runs and we get the first 
message "postfix/postfix-script: stopping the Postfix mail system", but then 
postfix does not respond to it and keeps running for a while, until it sees the 
virtual.domain2.com.db was updated, at which point it logs 
"postfix/trivial-rewrite[16529]: table 
hash:/etc/postfix/usermanaged/virtual.domain2.com(0,lock|fold_fix) has changed 
-- restarting", and then after that, it appears to be hung.

My first thought was that maybe Postfix didn't have enough time to shut
itself down before the .db was updated by the postmap command.  So I put in a
"sleep 60" right after the postfix stop command, even though when the stop
command works, we see the "terminating on signal" response almost instantly.
Since the 60 seconds didn't help, I increased it to 2 minutes, but that also
didn't help.
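
For reference, the fixed sleep could be replaced by polling until the master
process is really gone, so the script can tell a clean stop from a hang.  The
following is only a minimal sketch: wait_for_exit is a hypothetical helper
name, and the timeout is arbitrary.  It has to run as root, because kill -0
would otherwise fail on a root-owned process and look like "exited".

```shell
#!/bin/sh
# wait_for_exit PID TIMEOUT_SECONDS   (hypothetical helper, not part of Postfix)
# Polls about once per second until the process is gone or the timeout expires.
# Returns 0 if the process has exited, 1 if it is still running at timeout.
wait_for_exit() {
    pid=$1
    remaining=$2
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

In the update script this might be called with the PID read from the master
PID file before "postfix stop" is issued (assumed here to be
/var/spool/postfix/pid/master.pid; the real location depends on
queue_directory), skipping the postmap step if it returns 1.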

I found a forum post (http://www.howtoforge.com/forums/showthread.php?t=15898) 
where someone had a somewhat similar problem.  Someone suggested running 
"newaliases".  So I tried deleting both virtual.domain2.com.db and 
virtual.domain1.com.db, then ran "newaliases", manually ran postmap for domain2 
and domain1, and restarted postfix.  But even after those steps, the problem 
kept happening.

The most recent time it hung, I tried issuing another "service postfix stop",
as well as a plain "postfix stop", but neither of those caused Postfix to
respond with "terminating on signal".  I checked the ps aux process list and
tried killing the Postfix processes I saw.  Then I tried restarting, and got
the "already running" error.  So I checked ps aux again and noticed a bunch of
processes owned by the postfix user that were marked <defunct>.  I tried
killing those processes but couldn't.

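A note on the defunct entries: these are zombie processes that have already
exited and are only waiting for their parent to collect their exit status, so
signals have no effect on them.  To see which parent is holding them,
something like the sketch below could be used.  The helper name list_defunct
is hypothetical, and the column positions assume the input comes from
"ps -eo pid,ppid,stat,user,comm".

```shell
#!/bin/sh
# list_defunct (hypothetical helper): reads `ps -eo pid,ppid,stat,user,comm`
# output on stdin, skips the header line, and prints "pid ppid user" for
# every process whose state column starts with Z (zombie).
list_defunct() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage:
#   ps -eo pid,ppid,stat,user,comm | list_defunct
```

The second column of the output is the parent PID, which is the process that
would have to reap them (or exit) before they disappear.
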
Does anyone have any ideas on what could be wrong?

Thanks very much in advance for any suggestions!

Scott


Below is a snippet from maillog showing what happens when it hangs.  You can
see that at 23:30:02, the update-postfix.pl script kicks in and tries to stop
Postfix.  It doesn't succeed, and one minute later, Postfix sees the .db was
updated and tries to restart.  Shortly after that, the update script tries to
start Postfix but fails because it's already running.  Then Postfix stays in a
hung state, not accepting any incoming connections.  Half an hour later, the
cron job runs again but fails to do anything because Postfix is hung.

Oct  9 23:30:02 myserver postfix/postfix-script: stopping the Postfix mail system
Oct  9 23:31:04 myserver postfix/trivial-rewrite[16529]: table hash:/etc/postfix/usermanaged/virtual.domain2.com(0,lock|fold_fix) has changed -- restarting
Oct  9 23:31:14 myserver postfix/postfix-script: fatal: the Postfix mail system is already running
Oct  9 23:32:44 myserver postfix/anvil[16528]: statistics: max connection rate 1/60s for (smtp:110.36.0.252) at Oct  9 23:29:03
Oct  9 23:32:44 myserver postfix/anvil[16528]: statistics: max connection count 1 for (smtp:110.36.0.252) at Oct  9 23:29:03
Oct  9 23:32:44 myserver postfix/anvil[16528]: statistics: max cache size 2 at Oct  9 23:29:23
Oct 10 00:00:02 myserver postfix/postfix-script: stopping the Postfix mail system
Oct 10 00:01:13 myserver postfix/postfix-script: fatal: the Postfix mail system is already running

update-postfix.pl:
-------------------------

#!/usr/bin/perl
$|=1;


my $dir= "/var";

my @fulldfinfo = `df $dir`;
#/dev/da0s1f    33851580 28087462 3055992    90%    /var

my ($dfdevice,$total,$used,$avail,$pct,$dfmount)=split /[\b\t ]+/,$fulldfinfo[1];

print "space available=$avail\nspace used=$used";
if ($avail > 0) {

open(SH, "|/bin/sh");
print SH <<"EOM";
umask 022
cd /etc/postfix
/sbin/service postfix stop
sleep 120
/usr/sbin/postmap -c /etc/postfix hash:/etc/postfix/access
/usr/sbin/postmap -c /etc/postfix hash:/etc/postfix/transport
/usr/sbin/postmap -c /etc/postfix hash:/etc/postfix/usermanaged/virtual.domain1.com
/usr/sbin/postmap -c /etc/postfix hash:/etc/postfix/usermanaged/virtual.domain2.com
/usr/sbin/postalias -c /etc/postfix hash:aliases
sleep 10
/sbin/service postfix start

#/usr/sbin/postfix -c /etc/postfix reload
EOM
#mv /etc/postfix/usermanaged/virtual.tmp.db /etc/postfix/usermanaged/virtual.db

} else {

print "Not enough space to generate new db files!";

}
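
The free-space check at the top of the script could also be done in the shell
with "df -P", which uses the POSIX output format and keeps each filesystem on
a single line even when the device name is long.  A minimal sketch, where
avail_kb is a hypothetical helper name and /var is the directory from the
script above:

```shell
#!/bin/sh
# avail_kb (hypothetical helper): reads `df -Pk DIR` output on stdin and
# prints the "Available" column (in KB) of the first filesystem line.
avail_kb() {
    awk 'NR == 2 { print $4 }'
}

# Usage:
#   avail=$(df -Pk /var | avail_kb)
#   if [ "$avail" -gt 0 ]; then echo "ok to rebuild db files"; fi
```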


      
