On 19 Apr 2016, at 12:55, Aki Tuomi <aki.tu...@dovecot.fi> wrote:
> 
> I am planning to add foreman component to dovecot core and I am hoping
> for some feedback:
> 
> Foreman - generic per-user worker handling component

First an explanation of what this was planned to be used for: Think about many 
short-lived JMAP (HTTP) connections, with each connection creating a new jmap 
process that opens the user's mailbox, processes the JMAP command, closes the 
mailbox and exits. Repeat for each command. Not very efficient, when the same 
jmap process could handle all of the user's JMAP requests. The same problem 
also exists with most webmails' IMAP connections, which are very short-lived.

One annoying problem with the foreman concept is that it requires an open UNIX 
socket for every worker process, which could mean >10k open UNIX sockets, and 
that all too often runs into file descriptor limits. We could of course just 
raise the limit high enough, and it would probably work OK. But I also hate 
adding more of these "master" processes, because they don't scale easily to 
multiple CPUs, so they might become bottlenecks at some point (and some of the 
existing master processes already have become bottlenecks).

I've been trying to figure out a nice solution to this problem for years, but 
never really came up with anything better. Then today I finally realized that 
the anvil process already contains all of the needed information. We don't need 
a new process containing duplicated data, just some expansion of anvil and 
master. Of course, anvil is still a kind of "master" process that knows about 
all users, but it's already there anyway. And there's a new idea for how to 
avoid a single process using a ton of sockets:

(Talking only about IMAP here for clarity, but the same applies to POP3, JMAP 
and others.)

 - Today anvil already keeps track of (user, protocol, imap-process-pid), which 
is where "doveadm who" gets the user list.
 - Today the imap-login process already does an anvil lookup to see if the user 
has too many open connections. This lookup could be changed to also return the 
imap-process-pid[] array.
 - We'll add a new feature to Dovecot master: the ability to specify service 
imap { unix_listener /var/run/dovecot/login/imap-%{pid} { .. } }, which would 
cause such a UNIX socket path to be created dynamically for each process. Only 
that one process listens on the socket; the master process itself wouldn't keep 
it open. When the process is destroyed, the socket gets deleted automatically.
 - When imap process starts serving an IMAP connection, it does fchmod(socket, 
0) for its imap-%{pid} listener. When it stops serving an active IMAP 
connection it does fchmod(socket, original-permissions).
 - imap-login process attempts to connect to each imap-%{pid} socket based on 
the imap-process-pid[] list returned by anvil. It ignores each EACCES failure, 
because those are already serving IMAP connections. If it succeeds in 
connecting, it sends the IMAP connection fd to it. If not, it connects to the 
default imap socket to create a new process.
 - The above method of trying to connect to every imap-process-pid[] is probably 
efficient enough, although it likely ends up doing a lot of unnecessary 
connect() syscalls to sockets that are already handling existing connections. 
If this needs to be optimized, we could also enhance anvil to keep track of a 
"does this process have an active connection" flag, so it would only return 
imap-process-pid[] for the processes without an active connection. There are of 
course some race conditions here in any case, but the worst that can happen is 
that a new imap process is created when an existing one could have served the 
connection, i.e. slightly worse performance in some rare situations.
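The imap-login side of the handoff could look roughly like this. This is a 
minimal sketch, not Dovecot code: the socket path format, the function name and 
the exact error handling are assumptions for illustration; the pid list would 
come from the anvil lookup described above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Try to hand the new connection to an existing idle imap process.
   Returns a connected fd, or -1 if a new imap process should be
   created via the default imap socket instead. */
static int try_existing_imap(const char *sock_dir,
			     const pid_t *pids, unsigned count)
{
	struct sockaddr_un sa;
	unsigned i;

	for (i = 0; i < count; i++) {
		int fd = socket(AF_UNIX, SOCK_STREAM, 0);
		if (fd < 0)
			return -1;
		memset(&sa, 0, sizeof(sa));
		sa.sun_family = AF_UNIX;
		snprintf(sa.sun_path, sizeof(sa.sun_path),
			 "%s/imap-%ld", sock_dir, (long)pids[i]);
		if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
			return fd; /* idle process found; send the IMAP fd to it */
		/* EACCES: process is busy serving a connection (it did
		   fchmod(socket, 0)). ENOENT/ECONNREFUSED: the process died
		   since the anvil lookup. Either way, try the next pid. */
		int saved_errno = errno;
		close(fd);
		if (saved_errno != EACCES && saved_errno != ENOENT &&
		    saved_errno != ECONNREFUSED)
			return -1;
	}
	return -1; /* nobody idle; fall back to the default imap socket */
}
```

The fallback to the default imap socket is what makes the races harmless: a 
stale or busy pid just costs one extra connect() attempt.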

These same per-process sockets might be useful for other purposes too. I've 
often wanted the ability to communicate with an existing process. The "ipc" 
process was an attempt to do something about that, but it's not very nice and 
has the same problem of potentially using a huge number of fds.

Then there's the question of how idle processes (= processes with no active 
IMAP connections) are managed:
 - service { idle_kill } already specifies when processes without clients are 
killed. We can use that here as well: when the IMAP connection has closed, the 
process stays alive for idle_kill seconds before it's destroyed.
 - If idle_kill times are set large enough on a busy system, we're usually at 
service { process_limit } constantly. So when no new processes can be created, 
we need the ability to kill an existing process instead. I think this is the 
master process's job: when a connection comes to "imap" and process_limit is 
reached, master picks the imap process with the longest idle time and kills it 
(*). Then it waits for it to die and creates a new process afterwards. There's 
a race condition here though: the process may not die, but instead notify 
master that it's serving a new client, in which case master needs to retry with 
the next process. Destroying a process may not always be fast either. To avoid 
unnecessarily large latencies from waiting for process destruction, I think 
master should always try to stay a bit below process_limit (= a new service 
setting).
 - (*) I'm not sure if longest idle time is the ideal algorithm. Some more 
heuristics would be useful, but they complicate the master process too much. 
The processes themselves could try to influence master's decisions with some 
status notifications. For example, if we've determined that u...@example.com 
constantly logs in every 5 minutes, and its process has been idle for 4 minutes 
59 seconds and is also the oldest idling process, we still don't want to kill 
it, because we know it's going to be recreated in 1 second anyway. This 
probably won't be in the first version though.
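The baseline victim selection in master could be as simple as the following 
sketch. The bookkeeping struct and function are made up for illustration; the 
only real idea from the text is "pick the longest-idle process, skipping busy 
ones":

```c
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical per-process bookkeeping in master. idle_since == 0 means
   the process is currently serving a client and must not be killed. */
struct imap_process {
	pid_t pid;
	time_t idle_since;
};

/* Return the index of the process that has been idle the longest, or -1
   if every process is busy (master then has to wait for one to exit).
   On a kill race (the process reports a new client instead of dying),
   master would call this again, now with that process marked busy. */
static int pick_idle_victim(const struct imap_process *procs, size_t count)
{
	int victim = -1;
	time_t oldest = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (procs[i].idle_since == 0)
			continue; /* busy, skip */
		/* smaller idle_since timestamp = idle for longer */
		if (victim == -1 || procs[i].idle_since < oldest) {
			oldest = procs[i].idle_since;
			victim = (int)i;
		}
	}
	return victim;
}
```

The smarter heuristics (e.g. "this user logs in every 5 minutes anyway") would 
only change the ordering here, not the retry structure around it.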
