Stan Hoeppner:
> This is making good progress.  Seeing the smtpd's memory footprint
> drop so dramatically is fantastic.  However, I'm still curious as
> to why proxymap doesn't appear to be honoring $max_idle or $max_use.
> Maybe my understanding of $max_use is not correct?  It's currently
> set to 100, the default.  Watching top while sending a test message
> through, I see proxymap launch but then exit within 5 seconds,
> while smtpd honors max_idle.  Is there some other setting I need
> to change to keep proxymap around longer?

Short answer (workaround for low-traffic sites): set ipc_idle=$max_idle
to approximate the expected behavior. This keeps the smtpd-to-proxymap
connection open for as long as smtpd runs, so proxymap won't terminate
before its clients do.

Better: apply the long-term solution, in the form of the patch below.
This undoes the max_idle override (a workaround that I introduced
with Postfix 2.3). I already introduced the better solution with
Postfix 2.4 while solving a different problem.

Long answer: in ancient times, all Postfix daemons except qmgr
implemented the well-known max_idle=100s and max_use=100, as well
as the lesser-known ipc_idle=100s (see "short answer" for the effect
of that parameter).

While this worked fine for single-client servers such as smtpd, it
was not so great for multi-client servers such as proxymap or
trivial-rewrite. This problem was known, and the idea was that it
would be solved over time.

Theoretically, smtpd could run for up to $max_idle * $max_use = 3
hours, while proxymap and trivial-rewrite could run for up to
$max_idle * $max_use * $max_use = 12 days on low-traffic systems
(one SMTP client every 100s, or a little under 900 SMTP clients a
day), and they would run forever on systems with a steady mail flow.

This was a problem. The point of max_use is to limit the impact of
bugs such as memory or file handle leaks, by retiring a process
after it has done a limited amount of work. I can test Postfix itself
with tools such as Purify and Valgrind, but I can't do those tests
with every version of everyone's system libraries.

If a proxymap or trivial-rewrite server can run for almost 12 days
even on systems with a minuscule load, then max_use isn't working
as intended.

The main cause is that the proxymap etc. clients reuse a connection
to improve efficiency. Therefore, the proxymap etc. server politely
waits until all its clients have disconnected before checking the
max_use counter.

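As a quick check of the lifetime figures above, here is a minimal
sketch in C. It only redoes the arithmetic from the text with the old
defaults (max_idle=100s, max_use=100). The function names and
print_lifetimes() are made up for this illustration and are not part
of Postfix.

```c
#include <stdio.h>

/* Old defaults quoted in the text: max_idle=100s, max_use=100. */
enum { MAX_IDLE_SECONDS = 100, MAX_USE = 100 };

/* Worst case for a single-client server such as smtpd, in seconds:
 * it serves max_use clients, each arriving just before max_idle expires. */
long single_client_lifetime(long max_idle, long max_use)
{
    return max_idle * max_use;
}

/* Worst case for a multi-client server such as proxymap, in seconds:
 * each of its max_use client connections can itself last as long as a
 * single-client server lives. */
long multi_client_lifetime(long max_idle, long max_use)
{
    return max_idle * max_use * max_use;
}

/* Number of SMTP clients per day when one arrives every max_idle seconds. */
long clients_per_day(long max_idle)
{
    return 86400L / max_idle;
}

/* Hypothetical helper: print the three figures in human-friendly units. */
void print_lifetimes(void)
{
    long smtpd = single_client_lifetime(MAX_IDLE_SECONDS, MAX_USE);
    long proxy = multi_client_lifetime(MAX_IDLE_SECONDS, MAX_USE);

    printf("smtpd:    %ld s = %.1f hours\n", smtpd, smtpd / 3600.0);
    printf("proxymap: %ld s = %.1f days\n", proxy, proxy / 86400.0);
    printf("clients:  %ld per day\n", clients_per_day(MAX_IDLE_SECONDS));
}
```

Calling print_lifetimes() from a main() function shows about 2.8
hours for smtpd, about 11.6 days for proxymap, and 864 clients per
day, which matches the rounded numbers in the text.
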
While this politeness thing can't be changed easily, it is relatively
easy to play with the proxymap etc. server's max_idle value, and
with the smtpd etc. ipc_idle value.

Postfix 2.3 reduced the proxymap etc. max_idle to a fixed 1s value
to make those processes go away sooner when idle. I think that this
was a mistake, because it makes processes terminate too soon, and
thereby worsens the low-traffic behavior. Instead, we should speed
up the proxymap etc. server's max_use counter.

Postfix 2.4 reduced ipc_idle to 5s. This was done for a different
purpose: to allow proxymap etc. clients to switch to the least-loaded
proxymap etc. server. But I think that this was also the right way
to deal with long-lived proxymap etc. processes, because it speeds
up the proxymap etc. max_use counter.

The patch below keeps the reduced ipc_idle from Postfix 2.4, and
removes the max_idle overrides from Postfix 2.3.

	Wietse

*** ./src/proxymap/proxymap.c-	Thu Jan 10 09:03:55 2008
--- ./src/proxymap/proxymap.c	Sun Jan 31 10:52:50 2010
***************
*** 594,605 ****
          myfree(saved_filter);
  
      /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
- 
-     /*
       * Never, ever, get killed by a master signal, as that could corrupt a
       * persistent database when we're in the middle of an update.
       */
--- 594,599 ----
*** ./src/trivial-rewrite/trivial-rewrite.c-	Wed Dec 9 18:39:51 2009
--- ./src/trivial-rewrite/trivial-rewrite.c	Sun Jan 31 10:53:01 2010
***************
*** 565,576 ****
      if (resolve_verify.transport_info)
          transport_post_init(resolve_verify.transport_info);
      check_table_stats(0, (char *) 0);
- 
-     /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
  }
  
  MAIL_VERSION_STAMP_DECLARE;
--- 565,570 ----