Marek Kozlowski wrote:
> Have you ever read your own code (or quick fixes) written >5 years
> ago if you'd forgotten to place comments? ;-)
I often say, "I miss my younger brain." Back then I could remember all of the details. These days I write notes to my future self. My future self who will read them and not remember having written them. These days if I find something like what you have found, and find it completely undocumented, then I know that I must remove it as part of a cleanup back to the mainstream. If the problem were to appear again it would then be in my recent cache memory and I would be able to deal with it. And if that required modifying the configuration then I would leave a comment with it about why it is there, and other details.

> No, such configurable limits are great. My question was different. I suppose
> that many many years ago, many versions ago I had some problem with this
> server and I tried to solve it or apply a quick fix by incrementing the
> limit. Unfortunately I don't remember the problem. I don't even know if it
> could reappear if I set it to the default. Can anyone guess the potential
> problem given the solution? ;-))

I think if you were increasing the fork_attempts value then, as Wietse noted, your system was overloaded, and the root cause of the problem was elsewhere in the system. But to answer your question specifically one needs to understand fork() and why it might fail. Every kernel is different. But similar. NetBSD is closest to the original and documents it the most simply.

https://man.netbsd.org/fork.2

    ERRORS
         fork() will fail and no child process will be created if:

         [EAGAIN]  The system-imposed limit on the total number of processes
                   under execution would be exceeded.  This limit is
                   configuration-dependent; or the limit RLIMIT_NPROC on the
                   total number of processes under execution by this user id
                   would be exceeded.

         [ENOMEM]  There is insufficient swap space for the new process.

If the kernel is out of process slots then fork() fails. The shell usually says "cannot fork" in that case. It means too many processes are running on the system at the same time.
Fork-bombs can create this situation. [[ When out of process slots it is sometimes possible to run exactly one command by replacing the current shell with it, using "exec" to overlay the new process on the same PID as the shell that invokes it. ]]

If increasing the number of fork attempts improved things then it was unlikely to be a fork-bomb, but probably something else running some thousands of processes "for a while" that drained out "after a bit". A retry of fork() then happens to work later, when a process slot is available for it.

If this is a web server, if it is running CGI or FPM or Apache or many other things that react to Internet activity by starting processes, and if that process creation is not limited to a reasonable maximum number of processes, then it is possible for Internet activity to force the system into process stress by causing it to start godzillians of processes. For example. There are many other possibilities too. And a high default maximum of 256 might be completely unsuitable if the actual maximum of something the system can handle before running out of memory is 12. Don't guess. Do the math and figure it out. Then test it to verify.

If there is insufficient memory then fork() fails. This might happen if all of the available system memory used for processes has been consumed. Again, if increasing the fork retry limit helped then it means that something transiently consumed all memory "for a while", drained out "after a bit", and subsequently allowed a later retry to succeed.

Note that different kernels handle these cases very differently. Linux notably uses memory overcommit, which means fork() won't fail due to being out of memory. If the system is out of memory the fork() succeeds, but later, when the memory is written to and the system does not have memory available, the OOM (Out-of-Memory) Killer is invoked to kill processes until enough memory becomes available again. It's a completely different memory model.
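The "do the math" step above can be as simple as dividing the memory you can spare by the measured footprint of one process. A trivial sketch (the megabyte figures used below are invented for illustration; measure your own per-process RSS with ps or top first):

```c
/* How many worker processes can this box actually afford?
 * Both inputs are measurements you must supply; nothing here is a
 * recommended value. */
long safe_worker_limit(long mem_for_workers_mb, long per_worker_rss_mb)
{
    if (per_worker_rss_mb <= 0)
        return 0;               /* no meaningful answer without data */
    return mem_for_workers_mb / per_worker_rss_mb;
}
```

With, say, 1024 MB to spare and a measured 80 MB per worker, the answer is 12: nowhere near a default cap of 256, which is exactly the trap described above.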
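One concrete way to keep runaway process creation below the system's capacity is the per-user RLIMIT_NPROC resource limit that the man page excerpt mentions. A sketch of lowering it from a program (the ceiling value passed in is an arbitrary example, not a recommendation):

```c
/* Sketch: lower the soft RLIMIT_NPROC so this process, its children,
 * and other processes of the same user cannot fork past a chosen
 * ceiling; once exceeded, fork() fails with EAGAIN instead of the
 * whole system running out of process slots. */
#include <sys/resource.h>

int cap_user_processes(rlim_t ceiling)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    if (ceiling > rl.rlim_max)  /* soft limit may not exceed the hard limit */
        ceiling = rl.rlim_max;
    rl.rlim_cur = ceiling;      /* lower only the soft (current) limit */
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

The same limit is what "ulimit -u" adjusts in the shell, and service managers offer equivalent knobs for daemons.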
And it is harder to be as robust on servers as with the traditional model, because the OOM killer might kill something necessary, which would then need to be restarted. Better to avoid it getting killed at all. This is a big discussion topic just by itself.

Personally I would never see increasing fork_attempts as a suitable solution for either of the main reasons fork() might fail. Instead I would want to understand what is causing the resource stress. If it is too many processes from some subsystem XYZ then limit those in XYZ to below the process slot limit of the system. If the problem is virtual memory then I would increase the amount of virtual memory available on the system. Or again limit the number of XYZ processes that are consuming memory. Or if it is a single process then limit the amount of memory that single process can consume.

If either of these is a problem on your system then it is better to get an understanding of the system resources. Then make sure there are enough resources available. Memory. CPU. Storage. Whatever. For a simple single system I like using Munin to monitor the trending state of the system. Munin is one of many system monitoring systems which allow looking at what happened on a system after the fact, and also at what is happening on it now. There are many popular monitoring tools. Try several. Find one you like.

If you are monitoring your system and nothing bad is happening then leave the defaults in place. Be happy. If your system is periodically spiking into problems then address those problems to prevent them, rather than working around them by increasing the number of retries of other things.

Note that while Postfix has retries on fork() failures, almost nothing else on the system does. Which means that if the system is in a state where fork() is failing then many other random things on the system will also be failing, as they are unprotected.

Bob