[ redirecting to -hackers ]

I wrote:
> Rémi Zara <remi_z...@mac.com> writes:
> On 20 Feb 2020, at 12:15, Thomas Munro <thomas.mu...@gmail.com> wrote:
>>> Remi, any chance you could run gmake installcheck under
>>> contrib/postgres_fdw on that host, to see if this is repeatable? Can
>>> you tell us about the relevant limits? Maybe ulimit -n (for the user
>>> that runs the build farm), and also sysctl -a | grep descriptors,
>>> sysctl -a | grep maxfiles?

> I have a working NetBSD 8/ppc installation, will try to reproduce there.

Yup, it reproduces fine here.  I see

$ ulimit -a
...
nofiles       (-n descriptors) 128

which squares with the sysctl values:

proc.curproc.rlimit.descriptors.soft = 128
proc.curproc.rlimit.descriptors.hard = 1772
kern.maxfiles = 1772

and also with set_max_safe_fds' results:

2020-02-20 14:29:38.610 EST [2218] DEBUG:  max_safe_fds = 115, usable_fds = 125, already_open = 3

It seems fairly obvious now that I look at it, but: the epoll and kqueue
variants of CreateWaitEventSet are both *fundamentally* unsafe, because
they assume that they can always get a FD when they want one, which is
not a property that we generally want backend code to have.  The only
reason we've not seen this before with epoll is a lack of testing
under lots-of-FDs stress.

The fact that they'll likely leak those FDs on subsequent failures is
another not-very-nice property.

I think we ought to redesign this so that those FDs are handed out by
fd.c, which can ReleaseLruFile() and retry if it gets EMFILE or ENFILE.
fd.c could also be responsible for the resource tracking needed to
prevent leakage.

			regards, tom lane
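
For illustration only, here is a minimal sketch of the release-and-retry
idea proposed above. It is not the fd.c code and not a proposed patch:
acquire_fd_with_retry, fd_opener, lru_releaser, fake_process, fake_open
and fake_release are all hypothetical names, and the "process" is a toy
counter model standing in for real descriptors. The releaser callback
plays the role that ReleaseLruFile() would play inside fd.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; none of these names exist in PostgreSQL. */
typedef int (*fd_opener) (void *arg);		/* fd >= 0, or -1 with errno set */
typedef bool (*lru_releaser) (void *arg);	/* closes one cached fd, false if none */

/*
 * Try to obtain a descriptor.  On EMFILE/ENFILE, ask the releaser to close
 * one cached descriptor and retry; give up once nothing more can be released.
 */
static int
acquire_fd_with_retry(fd_opener open_fn, lru_releaser release_fn, void *arg)
{
	assert(open_fn != NULL && release_fn != NULL);

	for (;;)
	{
		int			fd = open_fn(arg);
		int			save_errno;

		if (fd >= 0)
			return fd;
		if (errno != EMFILE && errno != ENFILE)
			return -1;			/* unrelated failure: report it unchanged */

		save_errno = errno;
		if (!release_fn(arg))
		{
			errno = save_errno;	/* nothing left to close: report EMFILE/ENFILE */
			return -1;
		}
	}
}

/* Toy model of a process with a descriptor limit, for demonstration only. */
typedef struct
{
	int			limit;			/* like "ulimit -n" */
	int			in_use;			/* descriptors currently open */
	int			cached;			/* of those, closable cached files */
} fake_process;

/* Pretend to open a descriptor; fails with EMFILE when at the limit. */
static int
fake_open(void *arg)
{
	fake_process *p = arg;

	if (p->in_use >= p->limit)
	{
		errno = EMFILE;
		return -1;
	}
	return p->in_use++;
}

/* Pretend to close one cached descriptor; false if there are none. */
static bool
fake_release(void *arg)
{
	fake_process *p = arg;

	if (p->cached == 0)
		return false;
	p->cached--;
	p->in_use--;
	return true;
}
```

In this toy model, a process already at its limit with some cached files
gets a descriptor after one release, while a process with nothing left to
release sees the original EMFILE. The sketch leaves out the resource
tracking that would be needed to prevent leaks.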