2019-06-21 21:17:03 UTC - Sinan Bir: @Sinan Bir has joined the channel
----
2019-06-21 23:33:18 UTC - Fredrick P Eisele: I have encountered what appears to be a race condition in pulsar. The pulsar cluster is running on OpenStack. The configuration consists of three zk virtual-machines and three pulsar+bookkeeper virtual-machines. I run a command such as the following:
./bin/pulsar-admin brokers list use
Sometimes I get what I expect:
pulsar01:8080
pulsar02:8080
pulsar03:8080
And sometimes I get:
null
Reason: javax.ws.rs.ProcessingException: Connection refused: localhost/127.0.0.1:8080
----
2019-06-21 23:37:14 UTC - Fredrick P Eisele: If I run with `strace -f -e trace=network ./bin/pulsar-admin brokers list use` I get the following:
Successful: excerpt strace:
Process conn:02 attached
[pid conn:02 connect(288, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid conn:02 getsockopt(288, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid conn:02 getsockname(288, {sa_family=AF_INET, sin_port=htons(59512), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid conn:02 getsockname(288, {sa_family=AF_INET, sin_port=htons(59512), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
The failure mode retries a few times before giving up.
[pid conn:02 socketpair(AF_UNIX, SOCK_STREAM, 0, [289, 290]) = 0
[pid conn:02 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
Failure: excerpt strace:
Process conn:02 attached
[pid conn:02 connect(288, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid conn:02 getsockopt(288, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
[pid conn:02 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x8} ---
----
2019-06-21 23:42:44 UTC - Fredrick P Eisele: The 111 error code is ECONNREFUSED per /usr/include/asm-generic/errno.h
----
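For anyone who wants to reproduce the check outside of pulsar-admin, here is a minimal Python sketch of the same pattern the trace shows: a TCP connect to 127.0.0.1:8080 whose result is reported as an errno value, then translated to its symbolic name. The helper names `describe_errno` and `probe` are made up for illustration and are not part of Pulsar. As far as I know, CPython's `connect_ex` with a timeout does a non-blocking connect and then reads `SO_ERROR`, which is roughly what the JVM is doing in the strace output, but treat that as an assumption.

```python
import errno
import os
import socket


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for a numeric errno value on this platform."""
    # errno.errorcode maps numbers to names such as 'ECONNREFUSED'
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


def probe(host: str, port: int, timeout: float = 2.0) -> int:
    """Try a TCP connect; return 0 on success or the errno value on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns the error code instead of raising an exception
        return sock.connect_ex((host, port))


if __name__ == "__main__":
    # Hypothetical check against the address from the error message above.
    code = probe("127.0.0.1", 8080)
    print("connected" if code == 0 else describe_errno(code))
```

Running this in a loop on the same host as the failing pulsar-admin call would show whether port 8080 on localhost is refusing connections intermittently, independent of the admin client.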