On 22 Feb 2017, at 17.07, KT Walrus <ke...@my.walr.us> wrote:
> 
> I have seen proposals for a new client protocol called JMAP that seem to be 
> all about running a mail server at scale like an NGINX https web server can 
> scale. That got me thinking about whether there is anything fundamental about 
> IMAP that causes it to be difficult to scale. After looking into Dovecot’s 
> current IMAP implementation, I think it took an approach that 
> fundamentally has scaling issues (as in, one backend process per IMAP 
> session). I see a couple years ago, work was done to “migrate” idling IMAP 
> sessions to a single process that “remembers” the state of the IMAP session 
> and can restore it back to a backend process when the idling is done.
> 
> But, the only estimate that I have read about the “migrate idling” is that 
> you are likely to see only a 20% reduction of the number of concurrent 
> processes you need if you are running at 50,000 IMAP sessions per mail 
> server. 20% reduction is not nearly enough of a benefit for scale. I would 
> need to see at least an order of magnitude improvement to scale (and 
> hopefully, several orders of magnitude).

My long-term plans are something like this:

 * The imap-hibernate process can be used more aggressively: not just for 
IDLEing sessions, but for any session that isn't actively being used. 
And actually if the server is too busy, even active sessions could be 
hibernated. That would be somewhat similar to cooperative multitasking. When 
this is done, you can think of the current imap processes as the worker 
processes.

 * More state will be transferred to imap-hibernate process, so it can perform 
simpler commands without recreating the IMAP process. For example STATUS 
replies can be returned from cached state as long as it hasn't actually changed.

 * imap-hibernate currently tracks changed state via inotify (etc.). This 
mostly works, but it also sometimes wakes up unnecessarily. For example, just 
because one IMAP session performed a FETCH that added something to 
dovecot.index.cache, it doesn't mean that there are any real changes. We'll 
need some mail plugin that notifies imap-hibernate process when some real 
change has happened.

 * Hibernated sessions can even be moved away entirely from backends into IMAP 
proxies. The IMAP proxy can then reconnect to backend to re-establish the 
session. This allows even switching backends entirely, as long as the storage 
is shared. This requires that backends notify the proxy whenever something 
changes for the user, which is mostly a continuation of the previous item (just 
TCP notification instead of UNIX socket notification).

 * IMAP proxies can also provide limited functionality similar to that of 
imap-hibernate processes, possibly by running the same imap-hibernate processes.

 * And kind of a reverse of hibernation: imap processes can also preserve the 
user's imap session and opened folder indexes in memory even after the IMAP 
client has disconnected. If the same user connects back, the imap process can 
quickly be re-used with all the state already open. This is especially useful 
for clients that create many short-lived connections, such as webmails.

So after all these changes there would practically be something like 1000 imap 
processes constantly open and either doing work or waiting for a recently 
disconnected IMAP client to come back.
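
To make the "STATUS from cached state" idea in the list above a bit more 
concrete, here is a rough sketch in Python. It is not Dovecot code: the class 
names, the notify_changed() hook and the reply formatting are all made up for 
illustration. The only point is that a hibernated session can answer from its 
cache until something reports a real change, and only then needs a worker 
process.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxStatus:
    """Counters reported in an IMAP STATUS reply (RFC 3501 item names)."""
    messages: int
    unseen: int
    uidnext: int


@dataclass
class HibernatedSession:
    """Hypothetical state kept for one hibernated session (not a Dovecot type)."""
    user: str
    cache: dict = field(default_factory=dict)  # mailbox name -> MailboxStatus
    dirty: set = field(default_factory=set)    # mailboxes with real changes

    def notify_changed(self, mailbox: str) -> None:
        # Assumed hook: called only for real changes (new mail, flag change,
        # expunge), not for writes that merely touch a cache file.
        self.dirty.add(mailbox)

    def status(self, mailbox: str):
        """Return a STATUS reply from cache, or None if a worker is needed."""
        cached = self.cache.get(mailbox)
        if cached is None or mailbox in self.dirty:
            return None  # the session has to be handed back to an imap process
        # Simplified: a real server would quote mailbox names where required.
        return (f"* STATUS {mailbox} (MESSAGES {cached.messages} "
                f"UNSEEN {cached.unseen} UIDNEXT {cached.uidnext})")


session = HibernatedSession("alice", {"INBOX": MailboxStatus(12, 3, 101)})
print(session.status("INBOX"))   # answered from cache
session.notify_changed("INBOX")
print(session.status("INBOX"))   # None: cached state can no longer be trusted
```

The same structure would apply whether the notification arrives over a UNIX 
socket or over TCP; only the transport that calls the hook changes.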

As Christian already mentioned, the Dovecot proxies are supposed to be able to 
handle quite a lot of connections. I wouldn't be surprised if you can already 
do millions of connections with them. Most of our customers haven't tried 
scaling them very hard because they don't really want to create multiple IP 
addresses for servers, which is required to avoid running out of TCP ports (or 
I guess there could be multiple destination ports, but that also complicates 
things and Dovecot doesn't currently support that in an easy way either).
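
For a rough feel of the TCP port arithmetic, here is a small Python 
calculation. It assumes the limit in question is on the proxy-to-backend side, 
where each connection needs a distinct (source IP, source port, destination 
IP, destination port) tuple. The figure of 28232 usable source ports is the 
common Linux default ephemeral range (32768-60999) and is only an assumption 
here; the range is tunable and other systems differ.

```python
import math


def max_outgoing_connections(src_ips: int, dst_endpoints: int,
                             ports_per_ip: int) -> int:
    """Upper bound on concurrent connections given distinct 4-tuples."""
    return src_ips * dst_endpoints * ports_per_ip


def source_ips_needed(connections: int, dst_endpoints: int,
                      ports_per_ip: int) -> int:
    """Source IPs required to carry `connections` to `dst_endpoints` targets."""
    return math.ceil(connections / (dst_endpoints * ports_per_ip))


EPHEMERAL_PORTS = 60999 - 32768 + 1  # 28232, assumed Linux default range

# One source IP talking to one backend IP:port.
print(max_outgoing_connections(1, 1, EPHEMERAL_PORTS))

# Source IPs needed for a million proxied sessions to a single backend endpoint.
print(source_ips_needed(1_000_000, 1, EPHEMERAL_PORTS))

# The same load spread over four destination endpoints (IPs or ports).
print(source_ips_needed(1_000_000, 4, EPHEMERAL_PORTS))
```

Adding destination endpoints multiplies the available tuples in the same way 
as adding source addresses, which is why multiple destination ports would also 
work in principle.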

> Is there anything about the IMAP protocol that would prevent an 
> implementation from scaling to 10 Million users per server? Or, do we need to 
> push for a new protocol like JMAP that has been designed to scale better (by 
> being stateless with the server requests)?

I guess mainly the message sequence numbers in the IMAP protocol make this more 
difficult, but it's not an impossible problem to solve.
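
A small Python illustration of why sequence numbers tie state to a session 
(the SessionView class is purely hypothetical, not how Dovecot stores this): 
the mapping from sequence number to message depends on which expunges this 
particular session has already been told about, whereas UIDs stay fixed.

```python
class SessionView:
    """One IMAP session's view of a mailbox: sequence number -> UID."""

    def __init__(self, uids):
        # Index i holds the UID of the message with sequence number i + 1.
        self.uids = list(uids)

    def uid_of(self, seq: int) -> int:
        return self.uids[seq - 1]

    def expunge(self, seq: int) -> int:
        """Apply an untagged '* <seq> EXPUNGE'; later messages shift down."""
        return self.uids.pop(seq - 1)


view = SessionView([101, 102, 105, 110])
before = view.uid_of(3)    # UID 105
view.expunge(2)            # message with UID 102 is removed
after = view.uid_of(3)     # sequence number 3 now means UID 110
print(before, after)
```

Anything that restores a session (hibernation, or a proxy reconnecting to a 
different backend) has to reproduce exactly this per-session mapping, 
including expunges that have happened but have not yet been reported to the 
client.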
