Hi! Sorry for the delay in replying: I was waiting for the problem to recur so I could double-check the logs and the states of the imap-login processes.

2009/3/13 Timo Sirainen <t...@iki.fi>

> On Mon, 2009-03-09 at 17:41 +0000, Mike Brudenell wrote:
> > We have grown to suspect it is to do with one of the imap-login processes
> > having a large number of files open. Killing the process seems to get rid of
> > the problem.
>
> You didn't mention if you actually saw "Too many open files" errors in
> log file. If you didn't, the problem isn't with ulimits.

No, there's no sign of the "Too many open files" error message in the logfiles.

> > Likewise the output of the pfiles command on process 12436 (which is the one
> > I believe to be problematic) indicates its limit still has some available --
> > I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set in
> > the startup script:
> >
> > Current rlimit: 533 file descriptors
>
> Yes, v1.1 drops the number of fds to the maximum number that it needs.
> Since you had login_max_connections=256, it doesn't need more than twice
> as much of them. The 12436 process probably was very close to the 256
> connections, and after reaching that it would have stopped accepting
> more.

Ah, I see. When I upgraded from 1.0.15 I had 1.1.11 telling me off for having the fd limit set too low at 2048 when I started Dovecot. Instead it told me to raise the limit to at least 10128, so I did. Hence I was a bit surprised to find the limit lowered down to 533 if it had told me it wanted the higher number.

> But there do seem to be bugs related to reaching login_max_connections.
> I'm not entirely sure what bugs exactly though. It's just better not to
> reach it. Perhaps you should change the settings to something like:
>
> login_processes_count = 2
> login_max_connections = 1024
>
> login_processes_count should be about the same as the number of
> CPUs/cores on the system (maybe +1).

We're running a pair of servers, each with 8 CPUs. So I'm guessing my login_processes_count = 10 should be OK? The servers are handling a LOT of client machines.
For example, I've just checked the two machines and as I write there are 1881 "imap" processes on one, and 1808 on the other.

I'm guessing that if I increase login_max_connections from its current 256 to 1024 this might delay the problem occurring? And perhaps if I were to restart Dovecot in the small hours of the night every few days?

Or is an alternative workaround to change login_process_per_connection from no to yes? ...If I were to do this, am I right in thinking that imap-login then plays no part in SSL-connected IMAP sessions? As it's imap-login that seems to be having the problem, anything I can do to reduce the number of connections it's handling would presumably help?

If it's any help in working out what the problem might be, I have the output from the Solaris "pfiles" command, which lists all the open files each process has. The output for a "rogue" imap-login process shows lots of these as being S_IFSOCK and connected to clients as expected. There are also lots which are AF_UNIX as well -- I'm guessing the proxying of SSL connections through imap-login to the imap process? I can send you (Timo) this file privately if you think it might help any.

Cheers,
Mike B-)
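
P.S. To sanity-check the fd arithmetic discussed above, here is a rough Python sketch. It only encodes what is stated in this thread: roughly two fds per login connection (Timo's "twice as much", which would fit the client socket plus the AF_UNIX socket seen in pfiles) and the observed rlimit of 533 with login_max_connections=256. The function name, the fixed-headroom idea and the extrapolation to 1024 are my own assumptions, not Dovecot's actual formula.

```python
# Rough sketch of the fd arithmetic from this thread.
# Assumption: each login connection costs about 2 fds, per Timo's
# "doesn't need more than twice as much of them".

def fds_for_connections(max_connections: int, fds_per_connection: int = 2) -> int:
    """Hypothetical helper: fds consumed by connections alone."""
    return max_connections * fds_per_connection

# Values taken from the thread.
observed_rlimit = 533          # "Current rlimit: 533 file descriptors"
login_max_connections = 256    # current setting

connection_fds = fds_for_connections(login_max_connections)
headroom = observed_rlimit - connection_fds  # fds left over for everything else

# Assumption: the leftover headroom stays the same when the setting changes.
# This is inferred from a single data point and may well be wrong.
estimated_rlimit_at_1024 = fds_for_connections(1024) + headroom

print(f"fds for {login_max_connections} connections: {connection_fds}")
print(f"headroom implied by observed rlimit: {headroom}")
print(f"estimated rlimit at 1024 connections: {estimated_rlimit_at_1024}")
```

If that guess holds, raising login_max_connections to 1024 would still sit comfortably under the 10128 limit set in the startup script, but I'd treat the numbers as a ballpark only.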