Hi, had some long network stalls during initscripts in 2.4.0-test9. newaliases appeared to block until timeout in do_poll() on udp socket. I've written a demo program to show what's going on without depending on initscript and config (DNS/RPC/NIS) issues - attached polltest.c It sends something to an *unbound* udp port on localhost and polls for reply. In 2.2.16 it returned immediately with ECONNREFUSED while for 2.4.0-t9 it blocks until timout, despite the ICMP port unreachable packet returned in both cases - tcpdump -i lo gives: 14:05:13.591422 localhost.localdomain.32768 > localhost.localdomain.12345: udp 8 (DF) 14:05:13.591422 localhost.localdomain.32768 > localhost.localdomain.12345: udp 8 (DF) 14:05:13.591527 localhost.localdomain > localhost.localdomain: icmp: localhost.localdomain udp port 12345 unreachable (DF) [tos 0xc0] 14:05:13.591527 localhost.localdomain > localhost.localdomain: icmp: localhost.localdomain udp port 12345 unreachable (DF) [tos 0xc0] (i believe everything appears twice because tcpdump sees both ends of lo) So my impression is, the returned ICMP packet is disregarded somehow. I also include a diff of the straces for polltest on 2.2.16 vs. 2.4.0-t9 (trivial things like getpid, gettimeofday, fstat64 removed): --- polltrace-2.2.16 Thu Oct 5 13:57:13 2000 +++ polltrace-2.4.0-t9 Thu Oct 5 14:05:23 2000 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3 bind(3, {sin_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}}, 16) = 0 sendto(3, "foo bar\0", 8, 0, {sin_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("127.0.0.1")}}, 16) = 8 -poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 10000) = 1 -recvfrom(3, 0xbffffcbc, 100, 0, 0xbffffd24, 0xbffffcb8) = -1 ECONNREFUSED (Connection refused) +poll([{fd=3, events=POLLIN}], 1, 10000) = 0 -write(1, "recvfrom: errno=111 - Connection"..., 41) = 41 +write(1, "poll - timeout\n", 15) = 15 To reproduce the problem just start polltest on 2.2.16 (or similar ?) and recent 2.4 to compare the results. It accepts an optional command line argument which is the remote port to poll on. This should be an unbound port and defaults to 12345 if not given. Is this an intentional change? Is it required by SuS e.g.? Am I completely wrong when believing this might brake a number of network programs (traceroute e.g.)? Haven't tried neither non-loopback connection nor other tcp/udp/poll/select combinations. But sane semantics should be consistent IMHO. The testbox was UP in case that matters (lost events at scheduling points???) What am I missing? Regards Martin
#include <stdio.h> #include <unistd.h> #include <string.h> #include <errno.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/poll.h> inline int CheckErrOut( int retcode, const char *msg ) { if (retcode != -1) return 0; printf("%s: errno=%d - %s\n", msg, errno, strerror(errno)); exit(1); } int main(int argc, char *argv[]) { const char send_msg[] = "foo bar"; int ret = 0; int fd = -1; struct pollfd pfd; struct sockaddr_in from, to; int port = 0; if (argc > 1 && sscanf(argv[1], " %d", &port)==1) ; else port = 12345; fd = socket(PF_INET,SOCK_DGRAM,IPPROTO_UDP); CheckErrOut(fd,"socket"); memset(&from,0,sizeof(from)); from.sin_family = AF_INET; from.sin_addr.s_addr = INADDR_ANY; from.sin_port = 0; ret = bind(fd,&from,sizeof(from)); CheckErrOut(ret,"bind"); memset(&to,0,sizeof(to)); to.sin_family = AF_INET; to.sin_addr.s_addr = htonl(INADDR_LOOPBACK); to.sin_port = htons(port); ret = sendto(fd,send_msg,sizeof(send_msg),0,&to,sizeof(to)); CheckErrOut(ret,"sendto"); memset(&pfd,0,sizeof(pfd)); pfd.fd = fd; pfd.events = POLLIN; ret = poll(&pfd,1,10000); CheckErrOut(ret,"poll"); if (ret != 0) { char recv_buf[100]; int addrlen; ret = recvfrom(fd,recv_buf,sizeof(recv_buf),0,&to,&addrlen); CheckErrOut(ret,"recvfrom"); printf("received: %u bytes\n", ret); } else printf("poll - timeout\n"); close(fd); return 0; }