On 2008-04-12 14:57:52 +0000, R wrote:
> On Sat, 12 Apr 2008, Peter J. Holzer wrote:
> > On 2008-04-11 15:20:21 +0000, R wrote:
> > > On Fri, 11 Apr 2008, Charlie Brady wrote:
> > > > I suspect that the only reason you hit the limit was because some plugin
> > > > was not returning. I suspect that qpsmtpd wasn't actually hanging, it
> > > > just wasn't accepting more connections.
> > >
> > > Tracing a non-returning plugin could be very difficult :-(
> >
> > Shouldn't be that bad. If you run at log level LOGINFO, all calls to
> > plugins are logged (in vanilla 0.32, if you are using my RPMs, you need
> > LOGDEBUG). The "Too many connections: ..." message is also at priority
> > LOGINFO, so if you are running at a higher log level, that explains why
> > you don't see that message.
> 
> I set up from scratch. I do get that log message for too many connections
> from the same IP.

That's also LOGINFO.

> I'm set for LOGNOTICE, though all of my "comment" logging is LOGCRIT -
> I think all plugins used are highly customized for this particular
> site.

Probably. You wouldn't see the message above with the unmodified
plugin at that setting. (Yep, LOGINFO is way too low for such an
important message, but log levels in qpsmtpd are generally strange).
Anyway, if you modified your code you'll have to check for yourself if
the messages you expect to see have the correct level - we can't help
you there.



> Agreed. It seems unusual to me that it would happen with ALL 31
> connections, which would all stay hung for several hours before I
> discovered it and restarted forkserver.
> 
> It sounds more, to me, as if the 32nd connection triggered something that
> hung all of the connections, permanently.

They are different processes, so they cannot cause each other to hang.
I can only think of two scenarios which would cause all of them to hang:

1) The parent process hangs. In that case the child processes do not
   hang, but run to completion. But they won't be reaped by the parent
   so their corpses hang around forever as zombies (state Z).

2) One of the processes sends (or causes the OS to send) a SIGSTOP to
   the process group. That would really cause all processes to hang but
   you would see that they are all in state T.
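
A rough sketch for telling the two scenarios apart, assuming a Linux
procps-style ps where the first letter of the STAT column is the process
state. The function names are made up for illustration; they are not part
of qpsmtpd.

```shell
# Map a ps(1) STAT value to a short description, using its first letter.
describe_state() {
    case "$1" in
        Z*) echo "zombie" ;;                 # exited, not yet reaped by the parent
        T*) echo "stopped" ;;                # stopped by a signal such as SIGSTOP
        D*) echo "uninterruptible sleep" ;;  # usually waiting on I/O
        R*|S*) echo "runnable or sleeping" ;;
        *) echo "other" ;;
    esac
}

# Read "PID STAT ..." lines on stdin and print "PID STAT description".
label_states() {
    while read -r pid stat rest; do
        printf '%s %s %s\n' "$pid" "$stat" "$(describe_state "$stat")"
    done
}
```

Something like

    ps -eo pid=,stat=,args= | grep '[q]psmtpd' | label_states

should then show at a glance whether the children are all in Z, all in T,
or in neither state.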

Anyway, you can test that pretty easily. Just open 32 parallel
connections to the server yourself. Something like 

for i in $(seq 1 32)
do
    xterm -e telnet yourserver 25 &
done

should do nicely. Does the server really hang at that point? Or are you
able to speak SMTP in the open sessions and quit them? Does the server
accept a new connection for each one you quit?


> > As for the latter, I don't think that is happening. It is more probable
> > that number of hanging processes is slowly increasing. Unless you are
> > monitoring for long-running qpsmtpd processes you won't notice this at
> > first (if you have 10 hanging processes, you can still accept 20 mails
> > in parallel). Only when the number of hanging processes approaches your
> > limit, you will notice that less and less mail gets through until when
> > you hit the limit, no new connections are accepted at all (which is
> > probably the point where users start to complain).
> 
> As I indicated, I monitor connections a lot - just grep for the forkserver
> processes. They usually range from 5-20 at any one time, and they
> constantly change. I think it's too much of a coincidence to suggest that
> 31 connections gradually, or even quickly, got filled and hung at the same
> point in a plugin.

If you don't see the same processes hanging around for some time before
the hang, that shoots down the "gradually filling up" theory. It's still
possible that it fills up quickly. All you need is one client opening a
lot of connections at once and then keeping them open (either by hitting
a bug which hangs the process or simply by keeping qpsmtpd busy for long
enough for you to notice; something like sending "RSET" every few
minutes would suffice).
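
Both theories can be checked with a bit of shell. The following is only a
sketch: the helper names are invented, and the ps and netstat invocations
shown afterwards assume a reasonably recent Linux (procps with the
"etimes" column, net-tools netstat).

```shell
# Read "PID ELAPSED_SECONDS ..." lines on stdin and print the PIDs
# that have been running longer than the limit (in seconds) given as $1.
older_than() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Read remote "ip:port" endpoints on stdin (one per line) and print
# "count ip", busiest client first.
count_by_ip() {
    awk '{ sub(/:[0-9]+$/, "", $1); n[$1]++ }
         END { for (ip in n) print n[ip], ip }' | sort -rn
}
```

To list qpsmtpd processes older than an hour:

    ps -eo pid=,etimes=,args= | grep '[q]psmtpd' | older_than 3600

To see whether one client holds most of the connections to port 25:

    netstat -tn | awk '$4 ~ /:25$/ && $6 == "ESTABLISHED" { print $5 }' | count_by_ip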


> When I tried a manual connection, there was neither logging nor any
> kind of response from qpsmtpd to my telnet to port 25.

That's to be expected. qpsmtpd doesn't accept any new connections when
the limit is reached. All it does is wait for children to die and log
a message once per second (but you probably won't see that at your
settings).

        hp

-- 
   _  | Peter J. Holzer    | It took a genius to create [TeX],
|_|_) | Sysadmin WSR       | and it takes a genius to maintain it.
| |   | [EMAIL PROTECTED]         | That's not engineering, that's art.
__/   | http://www.hjp.at/ |    -- David Kastrup in comp.text.tex
