Thanks, Jaikiran. I tried to reproduce the issue by running the same performance test on a master node of the cluster, exemplary-birds.master, and I saw the same error again:

org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
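
Jaikiran's suggestion (quoted further down) was to count connections with "lsof -i". As a rough sketch, a listing like the one below can be tallied per remote host with a small stdlib-only Java helper. The class and method names are hypothetical (not part of Kafka or lsof), and the parsing assumes lsof's default column layout, where NAME looks like local:port->remote:port and the TCP state is the last token. Running "lsof -n -i" instead prints numeric addresses such as 10.100.98.102 rather than host names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of Kafka or lsof): tallies ESTABLISHED TCP
 * connections per remote host from raw `lsof -i` lines.
 * Assumes lsof's default layout, where NAME looks like "local:port->remote:port"
 * and the TCP state is the last token, e.g. "(ESTABLISHED)".
 */
public class LsofConnectionCounter {

    /**
     * @param lsofLines  raw lines from `lsof -i`
     * @param remotePort count only connections to this remote port (e.g. "9092"); null counts all
     * @return remote host -> number of ESTABLISHED connections, sorted by host name
     */
    public static Map<String, Integer> countEstablishedByRemoteHost(List<String> lsofLines, String remotePort) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip the header, LISTEN sockets, UDP entries and any non-ESTABLISHED state.
            if (cols.length < 2 || !cols[cols.length - 1].equals("(ESTABLISHED)")) {
                continue;
            }
            String name = cols[cols.length - 2]; // e.g. "exemplary-birds.master:41946->harmful-jar.master:9092"
            int arrow = name.indexOf("->");
            if (arrow < 0) {
                continue;
            }
            String remote = name.substring(arrow + 2);
            int colon = remote.lastIndexOf(':'); // last colon, so bracketed IPv6 hosts stay intact
            if (colon < 0) {
                continue;
            }
            String host = remote.substring(0, colon);
            String port = remote.substring(colon + 1);
            if (remotePort != null && !remotePort.equals(port)) {
                continue;
            }
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines copied from the lsof listing in this message.
        List<String> sample = List.of(
            "java 6991 root 150u IPv6 4361502 0t0 TCP *:9092 (LISTEN)",
            "java 6991 root 151u IPv6 4368439 0t0 TCP exemplary-birds.master:9092->harmful-jar.master:51131 (ESTABLISHED)",
            "java 6991 root 154u IPv6 4365924 0t0 TCP exemplary-birds.master:55248->voluminous-mass.master:9092 (ESTABLISHED)",
            "java 6991 root 157u IPv6 4365923 0t0 TCP exemplary-birds.master:41946->harmful-jar.master:9092 (ESTABLISHED)",
            "java 6991 root 179u IPv6 4292501 0t0 TCP exemplary-birds.master:41951->harmful-jar.master:9092 (ESTABLISHED)",
            "java 7218 root 50u IPv6 4350446 0t0 TCP exemplary-birds.master:41960->harmful-jar.master:9092 (ESTABLISHED)");
        // Connections from this host to brokers on port 9092, grouped by broker host.
        System.out.println(countEstablishedByRemoteHost(sample, "9092"));
        // prints {harmful-jar.master=3, voluminous-mass.master=1}
    }
}
```

To use it on a live capture, the output of "lsof -i > lsof.txt" could be read with java.nio.file.Files.readAllLines(java.nio.file.Path.of("lsof.txt")) and passed in; taking a few snapshots while the "Connection refused" warnings are appearing would show whether the count toward harmful-jar.master changes.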
At the same time, I ran "lsof -i"; here is the output:

java 6991 root 28u IPv6 4334833 0t0 TCP *:50320 (LISTEN)
java 6991 root 29u IPv6 4351835 0t0 TCP exemplary-birds.master:9092->exemplary-birds.master:50472 (ESTABLISHED)
java 6991 root 38u IPv6 4366588 0t0 TCP exemplary-birds.master:59536->complicated-laugh.master:2181 (ESTABLISHED)
java 6991 root 150u IPv6 4361502 0t0 TCP *:9092 (LISTEN)
java 6991 root 151u IPv6 4368439 0t0 TCP exemplary-birds.master:9092->harmful-jar.master:51131 (ESTABLISHED)
java 6991 root 154u IPv6 4365924 0t0 TCP exemplary-birds.master:55248->voluminous-mass.master:9092 (ESTABLISHED)
java 6991 root 155u IPv6 4366591 0t0 TCP exemplary-birds.master:55245->voluminous-mass.master:9092 (ESTABLISHED)
java 6991 root 157u IPv6 4365923 0t0 TCP exemplary-birds.master:41946->harmful-jar.master:9092 (ESTABLISHED)
java 6991 root 176u IPv6 4358833 0t0 TCP exemplary-birds.master:55251->voluminous-mass.master:9092 (ESTABLISHED)
java 6991 root 179u IPv6 4292501 0t0 TCP exemplary-birds.master:41951->harmful-jar.master:9092 (ESTABLISHED)
java 6991 root 180u IPv6 4338331 0t0 TCP exemplary-birds.master:9092->harmful-jar.master:51133 (ESTABLISHED)
java 6991 root 181u IPv6 4364530 0t0 TCP exemplary-birds.master:9092->voluminous-mass.master:42897 (ESTABLISHED)
java 6991 root 182u IPv6 4358834 0t0 TCP exemplary-birds.master:9092->harmful-jar.master:51134 (ESTABLISHED)
java 6991 root 183u IPv6 4354353 0t0 TCP exemplary-birds.master:9092->voluminous-mass.master:42898 (ESTABLISHED)
java 6991 root 190u IPv6 4351836 0t0 TCP exemplary-birds.master:9092->localhost:40786 (ESTABLISHED)
java 6991 root 201u IPv6 4364543 0t0 TCP exemplary-birds.master:9092->harmful-jar.master:51135 (ESTABLISHED)
java 6991 root 202u IPv6 4364544 0t0 TCP exemplary-birds.master:9092->voluminous-mass.master:42899 (ESTABLISHED)
java 7218 root 44u IPv6 4366240 0t0 TCP *:46256 (LISTEN)
java 7218 root 48u IPv6 4366602 0t0 TCP exemplary-birds.master:50472->exemplary-birds.master:9092 (ESTABLISHED)
java 7218 root 50u IPv6 4350446 0t0 TCP exemplary-birds.master:41960->harmful-jar.master:9092 (ESTABLISHED)
java 7218 root 51u IPv6 4350447 0t0 TCP localhost:40786->exemplary-birds.master:9092 (ESTABLISHED)
java 7218 root 52u IPv6 4350448 0t0 TCP exemplary-birds.master:55263->voluminous-mass.master:9092 (ESTABLISHED)
java 17582 root 44u IPv6 4326187 0t0 TCP *:46316 (LISTEN)
ntpd 18649 ntp 16u IPv4 656334 0t0 UDP *:ntp
ntpd 18649 ntp 17u IPv6 656335 0t0 UDP *:ntp
ntpd 18649 ntp 18u IPv4 656341 0t0 UDP localhost:ntp
ntpd 18649 ntp 19u IPv4 656342 0t0 UDP exemplary-birds.master:ntp
ntpd 18649 ntp 20u IPv6 656343 0t0 UDP localhost:ntp
ntpd 18649 ntp 21u IPv6 656344 0t0 UDP [fe80::7a2b:cbff:fe1f:2e77]:ntp
sshd 21995 root 3u IPv4 4277546 0t0 TCP exemplary-birds.master:ssh->10.100.68.15:60642 (ESTABLISHED)
sshd 22091 fitsum 3u IPv4 4277546 0t0 TCP exemplary-birds.master:ssh->10.100.68.15:60642 (ESTABLISHED)
java 22152 root 21u IPv6 213140 0t0 TCP *:52411 (LISTEN)
java 22152 root 26u IPv6 213145 0t0 TCP *:2181 (LISTEN)
java 22152 root 27u IPv6 211541 0t0 TCP exemplary-birds.master:3888 (LISTEN)
java 22152 root 28u IPv6 443527 0t0 TCP exemplary-birds.master:3888->complicated-laugh.master:43940 (ESTABLISHED)
java 22152 root 29u IPv6 23347 0t0 TCP exemplary-birds.master:43797->harmful-jar.master:2888 (ESTABLISHED)
java 22152 root 30u IPv6 204517 0t0 TCP exemplary-birds.master:3888->harmful-jar.master:50791 (ESTABLISHED)
java 22152 root 31u IPv6 4278513 0t0 TCP exemplary-birds.master:3888->voluminous-mass.master:50452 (ESTABLISHED)
java 22152 root 32u IPv6 4345845 0t0 TCP exemplary-birds.master:2181->harmful-jar.master:45048 (ESTABLISHED)
java 22152 root 33u IPv6 443552 0t0 TCP exemplary-birds.master:3888->beloved-judge.master:56370 (ESTABLISHED)
java 22152 root 35u IPv6 4364514 0t0 TCP exemplary-birds.master:2181->voluminous-mass.master:60600 (ESTABLISHED)
ssh 24632 sa 3u IPv4 4289852 0t0 TCP exemplary-birds.master:60510->harmful-jar.master:ssh (ESTABLISHED)
ssh 24645 sa 3u IPv4 4289867 0t0 TCP exemplary-birds.master:33295->voluminous-mass.master:ssh (ESTABLISHED)

I didn't see anything wrong with it, but it seems the connection was temporarily closed. Has anyone had a similar issue?

Thanks

On Wed, Jan 7, 2015 at 10:32 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
> There are different ways to find the connection count and each one depends
> on the operating system that's being used. "lsof -i" is one option, for
> example, on *nix systems.
>
> -Jaikiran
>
> On Thursday 08 January 2015 11:40 AM, Sa Li wrote:
>> Yes, it is a weird hostname ;). That is what our system guys named it.
>> How do I measure the number of connections open to 10.100.98.102?
>>
>> Thanks
>>
>> AL
>> On Jan 7, 2015 9:42 PM, "Jaikiran Pai" <jai.forums2...@gmail.com> wrote:
>>> On Thursday 08 January 2015 01:51 AM, Sa Li wrote:
>>>> I see this type of error again; it goes back to normal in a few seconds:
>>>>
>>>> [2015-01-07 20:19:49,744] WARN Error in I/O with harmful-jar.master/10.100.98.102
>>>
>>> That's a really weird hostname, "harmful-jar.master". Is that really
>>> your hostname? You mention that this happens during performance testing.
>>> Have you taken a note of how many connections are open to that
>>> 10.100.98.102 IP when this "Connection refused" exception happens?
>>>
>>> -Jaikiran
>>>
>>>> (org.apache.kafka.common.network.Selector)
>>>> java.net.ConnectException: Connection refused
>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>     at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
>>>>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
>>>>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
>>>>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> [2015-01-07 20:19:49,754] WARN Error in I/O with harmful-jar.master/10.100.98.102 (org.apache.kafka.common.network.Selector)
>>>> java.net.ConnectException: Connection refused
>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>     at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
>>>>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
>>>>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
>>>>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> [2015-01-07 20:19:49,764] WARN Error in I/O with harmful-jar.master/10.100.98.102 (org.apache.kafka.common.network.Selector)
>>>> java.net.ConnectException: Connection refused
>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>     at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
>>>>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
>>>>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
>>>>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> 160403 records sent, 32080.6 records/sec (91.78 MB/sec), 507.0 ms avg latency, 2418.0 max latency.
>>>> 109882 records sent, 21976.4 records/sec (62.87 MB/sec), 672.7 ms avg latency, 3529.0 max latency.
>>>> 100315 records sent, 19995.0 records/sec (57.21 MB/sec), 774.8 ms avg latency, 3858.0 max latency.
>>>>
>>>> On Wed, Jan 7, 2015 at 12:07 PM, Sa Li <sal...@gmail.com> wrote:
>>>>
>>>>> Hi, All
>>>>>
>>>>> I am running a performance test with:
>>>>>
>>>>> bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test-rep-three 500000000 100 -1 acks=1 bootstrap.servers=10.100.98.100:9092,10.100.98.101:9092,10.100.98.102:9092 buffer.memory=67108864 batch.size=8196
>>>>>
>>>>> where the topic test-rep-three is described as follows:
>>>>>
>>>>> bin/kafka-topics.sh --describe --zookeeper 10.100.98.101:2181 --topic test-rep-three
>>>>> Topic:test-rep-three PartitionCount:8 ReplicationFactor:3 Configs:
>>>>>     Topic: test-rep-three Partition: 0 Leader: 100 Replicas: 100,102,101 Isr: 102,101,100
>>>>>     Topic: test-rep-three Partition: 1 Leader: 101 Replicas: 101,100,102 Isr: 102,101,100
>>>>>     Topic: test-rep-three Partition: 2 Leader: 102 Replicas: 102,101,100 Isr: 101,102,100
>>>>>     Topic: test-rep-three Partition: 3 Leader: 100 Replicas: 100,101,102 Isr: 101,100,102
>>>>>     Topic: test-rep-three Partition: 4 Leader: 101 Replicas: 101,102,100 Isr: 102,100,101
>>>>>     Topic: test-rep-three Partition: 5 Leader: 102 Replicas: 102,100,101 Isr: 100,102,101
>>>>>     Topic: test-rep-three Partition: 6 Leader: 102 Replicas: 100,102,101 Isr: 102,101,100
>>>>>     Topic: test-rep-three Partition: 7 Leader: 101 Replicas: 101,100,102 Isr: 101,100,102
>>>>>
>>>>> It produces messages and runs for a while, but periodically it throws exceptions like these:
>>>>>
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>>>> 141292 records sent, 28258.4 records/sec (80.85 MB/sec), 551.2 ms avg latency, 1494.0 max latency.
>>>>> 142526 records sent, 28505.2 records/sec (81.55 MB/sec), 580.8 ms avg latency, 1513.0 max latency.
>>>>> 146564 records sent, 29312.8 records/sec (83.86 MB/sec), 557.9 ms avg latency, 1431.0 max latency.
>>>>> 146755 records sent, 29351.0 records/sec (83.97 MB/sec), 556.7 ms avg latency, 1480.0 max latency.
>>>>> 147963 records sent, 29592.6 records/sec (84.67 MB/sec), 556.7 ms avg latency, 1546.0 max latency.
>>>>> 146931 records sent, 29386.2 records/sec (84.07 MB/sec), 550.9 ms avg latency, 1715.0 max latency.
>>>>> 146947 records sent, 29389.4 records/sec (84.08 MB/sec), 555.1 ms avg latency, 1750.0 max latency.
>>>>> 146422 records sent, 29284.4 records/sec (83.78 MB/sec), 557.9 ms avg latency, 1818.0 max latency.
>>>>> 147516 records sent, 29503.2 records/sec (84.41 MB/sec), 555.6 ms avg latency, 1806.0 max latency.
>>>>> 147877 records sent, 29575.4 records/sec (84.62 MB/sec), 552.1 ms avg latency, 1821.0 max latency.
>>>>> 147201 records sent, 29440.2 records/sec (84.23 MB/sec), 554.5 ms avg latency, 1826.0 max latency.
>>>>> 148317 records sent, 29663.4 records/sec (84.87 MB/sec), 558.1 ms avg latency, 1792.0 max latency.
>>>>> 147756 records sent, 29551.2 records/sec (84.55 MB/sec), 550.9 ms avg latency, 1806.0 max latency.
>>>>>
>>>>> Then it goes back to a normal state. Is that because of a rebalance?
>>>>>
>>>>> Thanks
>>>>>
>>>>> --
>>>>> Alec Li
>>>>
>
--
Alec Li
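
On the rebalance question: in the quoted describe output, partition 6 is led by broker 102 even though the first broker in its Replicas list (Kafka's preferred replica) is 100, which suggests leadership for that partition moved at some point. A producer whose cached metadata still names the old leader receives NotLeaderForPartitionException until it refreshes its metadata. The sketch below is a hypothetical, stdlib-only helper (not a Kafka tool) that flags such partitions; it assumes each partition is printed on one line in the "Partition: N Leader: ID Replicas: IDS" form shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Kafka): finds partitions whose current leader
 * is not the preferred replica, i.e. the first broker id in the Replicas list,
 * in `kafka-topics.sh --describe` output.
 * Assumes one line per partition: "Topic: <t> Partition: <n> Leader: <id> Replicas: <ids> Isr: <ids>".
 */
public class PreferredLeaderCheck {

    // "PartitionCount:8" on the header line does not match because it lacks "Partition:".
    private static final Pattern PARTITION_LINE = Pattern.compile(
        "Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)\\s+Replicas:\\s*([\\d,]+)");

    /** Returns the partition ids whose leader differs from their first listed replica. */
    public static List<Integer> nonPreferredLeaders(List<String> describeLines) {
        List<Integer> moved = new ArrayList<>();
        for (String line : describeLines) {
            Matcher m = PARTITION_LINE.matcher(line);
            if (!m.find()) {
                continue; // header line or unrelated text
            }
            int partition = Integer.parseInt(m.group(1));
            String leader = m.group(2);
            String preferred = m.group(3).split(",")[0];
            if (!leader.equals(preferred)) {
                moved.add(partition);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Lines copied from the describe output quoted in this thread.
        List<String> sample = List.of(
            "Topic:test-rep-three PartitionCount:8 ReplicationFactor:3 Configs:",
            "Topic: test-rep-three Partition: 5 Leader: 102 Replicas: 102,100,101 Isr: 100,102,101",
            "Topic: test-rep-three Partition: 6 Leader: 102 Replicas: 100,102,101 Isr: 102,101,100",
            "Topic: test-rep-three Partition: 7 Leader: 101 Replicas: 101,100,102 Isr: 101,100,102");
        System.out.println(nonPreferredLeaders(sample)); // prints [6]
    }
}
```

Running the describe command again right after a burst of these exceptions and comparing the flagged partitions would be one rough way to check whether the errors line up with leadership changes; this is a heuristic, not a diagnosis.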