On Tue, Dec 11, 2018 at 05:46:10PM +0100, Havard Eidnes wrote:
>
> Hmm, I already have that, but I wonder, how big is "bigger"?  Well,
> looks like the answer is that BIND tries to probe for the biggest it
> can be allowed to set on startup, by starting with a large value and
> approximately halfing it successively if I read the code right.  BIND
> doesn't log what setting it is using, though...
>

I stumbled across this thread today after also investigating what socket
buffer size is actually chosen by BIND.

I noticed the code behaves a bit differently than what I thought from
first looking at it. On my Linux machine with net.core.rmem_max set to
the system default of "212992" I was expecting the "again" goto loop to
decrease the rcvbuf until setsockopt succeeded, but setsockopt actually
succeeds even if the requested size is larger than what is allowed by
that limit.

What happens is that the setsockopt succeeds, but the actual value set
is the maximum allowed by net.core.rmem_max (which ends up being
doubled). The doubling is described in socket(7):

===
SO_RCVBUF
       Sets or gets the maximum socket receive buffer in bytes.  The
       kernel doubles this value (to allow space for bookkeeping
       overhead) when it is set using setsockopt(2), and this doubled
       value is returned by getsockopt(2).  The default value is set
       by the /proc/sys/net/core/rmem_default file, and the maximum
       allowed value is set by the /proc/sys/net/core/rmem_max file.
       The minimum (doubled) value for this option is 256.
===

Here is a standalone version of set_rcvbuf() that I yanked out of the
BIND 9.11.6 codebase with some printfs sprinkled in for added
visibility:

===
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>

#define ISC_SOCKADDR_LEN_T unsigned int
#define ISC_PLATFORM_HAVEIPV6 1
#define TUNE_LARGE 1

/*%
 * The size to raise the receive buffer to (from BIND 8).
 */
#ifdef TUNE_LARGE
#ifdef sun
#define RCVBUFSIZE (1*1024*1024)
#else
#define RCVBUFSIZE (16*1024*1024)
#endif
#else
#define RCVBUFSIZE (32*1024)
#endif /* TUNE_LARGE */

static int rcvbuf = RCVBUFSIZE;

static void
set_rcvbuf(void) {
    int fd;
    int max = rcvbuf, min;
    ISC_SOCKADDR_LEN_T len;

    // Added stuff
    int final;
    ISC_SOCKADDR_LEN_T final_len;

    printf("requested SO_RCVBUF size (max): %d\n", max);

    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
#if defined(ISC_PLATFORM_HAVEIPV6)
    if (fd == -1) {
        switch (errno) {
        case EPROTONOSUPPORT:
        case EPFNOSUPPORT:
        case EAFNOSUPPORT:
        /*
         * Linux 2.2 (and maybe others) return EINVAL instead of
         * EAFNOSUPPORT.
         */
        case EINVAL:
            fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDP);
            break;
        }
    }
#endif
    if (fd == -1)
        return;

    len = sizeof(min);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&min, &len) == 0 &&
        min < rcvbuf) {
        printf("initial SO_RCVBUF size (min) %d is less than %d, attempting to increase it\n", min, rcvbuf);
 again:
        printf("attempting to set SO_RCVBUF to rcvbuf (%d)\n", rcvbuf);
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&rcvbuf,
                       sizeof(rcvbuf)) == -1) {
            printf("setsockopt failed\n");
            if (errno == ENOBUFS && rcvbuf > min) {
                printf("errno was ENOBUFS\n");
                printf("max: %d\n", max);
                max = rcvbuf - 1;
                printf("new max: %d\n", max);
                rcvbuf = (rcvbuf + min) / 2;
                /* Print the new rcvbuf here, not max. */
                printf("new rcvbuf: %d\n", rcvbuf);
                goto again;
            } else {
                //printf("errno was not ENOBUFS (was: %s)\n", strerror(errno));
                rcvbuf = min;
                printf("min rcvbuf: %d\n", rcvbuf);
                goto cleanup;
            }
        } else {
            printf("setsockopt succeeded\n");
            min = rcvbuf;
            printf("new min: %d\n", min);
        }
        if (min != max) {
            printf("min (%d) not equal to max (%d)\n", min, max);
            rcvbuf = max;
            goto again;
        }
    }

    /* Added: read back what the kernel actually ended up with. */
    final_len = sizeof(final);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&final, &final_len) == 0) {
        printf("final SO_RCVBUF size: %d\n", final);
    }

 cleanup:
    close(fd);
}

int
main(void) {
    set_rcvbuf();
    return 0;
}
===

And the result from running it on my machine:

===
$ sysctl net.core.rmem_max
net.core.rmem_max = 212992
$ gcc -Wall -pedantic -Wextra bind_rcvbuf.c -o bind_rcvbuf
$ ./bind_rcvbuf
requested SO_RCVBUF size (max): 16777216
initial SO_RCVBUF size (min) 212992 is less than 16777216, attempting to increase it
attempting to set SO_RCVBUF to rcvbuf (16777216)
setsockopt succeeded
new min: 16777216
final SO_RCVBUF size: 425984
===

So here the socket buffer ends up at 425984 (that is,
net.core.rmem_max*2), and after setting net.core.rmem_max to 16777216
(the requested value when using --with-tuning=large):

===
$ ./bind_rcvbuf
requested SO_RCVBUF size (max): 16777216
initial SO_RCVBUF size (min) 212992 is less than 16777216, attempting to increase it
attempting to set SO_RCVBUF to rcvbuf (16777216)
setsockopt succeeded
new min: 16777216
final SO_RCVBUF size: 33554432
===

As Håvard pointed out, BIND does not log what it ends up using, which,
given the above, can be pretty confusing. Would it make sense to add
some logging to set_rcvbuf()? From what I can tell it would only be run
once, since it is guarded by rcvbuf_once. For later versions of BIND I
guess the same holds true for set_sndbuf().

>
> However, it appears that BIND applies this same setting to each and
> every UDP socket BIND creates, ref. lib/isc/unix/socket.c's
> opensocket() function, which is probably not required.  I would have
> thought it would be sufficient to set it on those sockets which serve
> port 53, and not on those temporary sockets BIND creates to talk to
> other name servers in the process of doing recursion.  On a system
> which doesn't overcommit resources, this is responsible for needless
> waste.
>

I noticed this as well. Is there a reason the increased SO_RCVBUF is
used by all sockets, not just the ones listening for requests?
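
Coming back to the logging question above: a minimal sketch of what
"log what we actually got" could look like is to read the value back
with getsockopt() right after the setsockopt() call. The function names
effective_rcvbuf() and report_rcvbuf() are made up for illustration and
are not BIND functions, and the printf() stands in for whatever logging
call BIND would use there, which I have not looked into.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of BIND): ask for `requested` bytes of
 * receive buffer on a throwaway UDP socket and return the size the
 * kernel reports afterwards, or -1 on any error.
 */
int
effective_rcvbuf(int requested)
{
    int fd;
    int effective = -1;
    socklen_t len = sizeof(effective);

    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd == -1)
        return -1;

    /* The request may succeed even when the kernel clamps the value. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested,
                   sizeof(requested)) == -1 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) == -1)
        effective = -1;

    close(fd);
    return effective;
}

/*
 * Hypothetical reporting function: prints the requested size next to
 * the size read back from the kernel, so a clamped value is visible.
 */
void
report_rcvbuf(int requested)
{
    int effective = effective_rcvbuf(requested);

    if (effective == -1)
        printf("SO_RCVBUF: requested %d, could not determine result\n",
               requested);
    else
        printf("SO_RCVBUF: requested %d, kernel reports %d\n",
               requested, effective);
}
```

Calling report_rcvbuf(16 * 1024 * 1024) from a small main() should show
the same gap between requested and reported size as in the runs above.
On Linux the reported number is the doubled one from socket(7), so a
log message would probably want to say so.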

-- 
Patrik Lundin