Yes, it is a weird hostname ;) but that is what our system guys named it. How do I measure the number of connections open to 10.100.98.102?

Thanks

AL

On Jan 7, 2015 9:42 PM, "Jaikiran Pai" <jai.forums2...@gmail.com> wrote:

> On Thursday 08 January 2015 01:51 AM, Sa Li wrote:
>
>> see this type of error again, back to normal in few secs
>>
>> [2015-01-07 20:19:49,744] WARN Error in I/O with harmful-jar.master/10.100.98.102
>
> That's a really weird hostname, the "harmful-jar.master". Is that really
> your hostname? You mention that this happens during performance testing.
> Have you taken a note of how many connection are open to that 10.100.98.102
> IP when this "Connection refused" exception happens?
>
> -Jaikiran
>
>> (org.apache.kafka.common.network.Selector)
>> java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>         at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
>>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
>>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
>>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>>         at java.lang.Thread.run(Thread.java:745)
>> [2015-01-07 20:19:49,754] WARN Error in I/O with harmful-jar.master/10.100.98.102 (org.apache.kafka.common.network.Selector)
>> java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>         at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
>>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
>>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
>>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>>         at java.lang.Thread.run(Thread.java:745)
>> [2015-01-07 20:19:49,764] WARN Error in I/O with harmful-jar.master/10.100.98.102 (org.apache.kafka.common.network.Selector)
>> java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>         at org.apache.kafka.common.network.Selector.poll(Selector.java:232)
>>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:191)
>>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:184)
>>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>>         at java.lang.Thread.run(Thread.java:745)
>> 160403 records sent, 32080.6 records/sec (91.78 MB/sec), 507.0 ms avg latency, 2418.0 max latency.
>> 109882 records sent, 21976.4 records/sec (62.87 MB/sec), 672.7 ms avg latency, 3529.0 max latency.
>> 100315 records sent, 19995.0 records/sec (57.21 MB/sec), 774.8 ms avg latency, 3858.0 max latency.
>>
>> On Wed, Jan 7, 2015 at 12:07 PM, Sa Li <sal...@gmail.com> wrote:
>>
>>> Hi, All
>>>
>>> I am doing performance test by
>>>
>>> bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
>>>     test-rep-three 500000000 100 -1 acks=1 \
>>>     bootstrap.servers=10.100.98.100:9092,10.100.98.101:9092,10.100.98.102:9092 \
>>>     buffer.memory=67108864 batch.size=8196
>>>
>>> where the topic test-rep-three is described as follow:
>>>
>>> bin/kafka-topics.sh --describe --zookeeper 10.100.98.101:2181 --topic test-rep-three
>>> Topic:test-rep-three    PartitionCount:8    ReplicationFactor:3    Configs:
>>>     Topic: test-rep-three    Partition: 0    Leader: 100    Replicas: 100,102,101    Isr: 102,101,100
>>>     Topic: test-rep-three    Partition: 1    Leader: 101    Replicas: 101,100,102    Isr: 102,101,100
>>>     Topic: test-rep-three    Partition: 2    Leader: 102    Replicas: 102,101,100    Isr: 101,102,100
>>>     Topic: test-rep-three    Partition: 3    Leader: 100    Replicas: 100,101,102    Isr: 101,100,102
>>>     Topic: test-rep-three    Partition: 4    Leader: 101    Replicas: 101,102,100    Isr: 102,100,101
>>>     Topic: test-rep-three    Partition: 5    Leader: 102    Replicas: 102,100,101    Isr: 100,102,101
>>>     Topic: test-rep-three    Partition: 6    Leader: 102    Replicas: 100,102,101    Isr: 102,101,100
>>>     Topic: test-rep-three    Partition: 7    Leader: 101    Replicas: 101,100,102    Isr: 101,100,102
>>>
>>> Apparently, it produces the messages and run for a while, but it
>>> periodically have such exceptions:
>>>
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
>>> 141292 records sent, 28258.4 records/sec (80.85 MB/sec), 551.2 ms avg latency, 1494.0 max latency.
>>> 142526 records sent, 28505.2 records/sec (81.55 MB/sec), 580.8 ms avg latency, 1513.0 max latency.
>>> 146564 records sent, 29312.8 records/sec (83.86 MB/sec), 557.9 ms avg latency, 1431.0 max latency.
>>> 146755 records sent, 29351.0 records/sec (83.97 MB/sec), 556.7 ms avg latency, 1480.0 max latency.
>>> 147963 records sent, 29592.6 records/sec (84.67 MB/sec), 556.7 ms avg latency, 1546.0 max latency.
>>> 146931 records sent, 29386.2 records/sec (84.07 MB/sec), 550.9 ms avg latency, 1715.0 max latency.
>>> 146947 records sent, 29389.4 records/sec (84.08 MB/sec), 555.1 ms avg latency, 1750.0 max latency.
>>> 146422 records sent, 29284.4 records/sec (83.78 MB/sec), 557.9 ms avg latency, 1818.0 max latency.
>>> 147516 records sent, 29503.2 records/sec (84.41 MB/sec), 555.6 ms avg latency, 1806.0 max latency.
>>> 147877 records sent, 29575.4 records/sec (84.62 MB/sec), 552.1 ms avg latency, 1821.0 max latency.
>>> 147201 records sent, 29440.2 records/sec (84.23 MB/sec), 554.5 ms avg latency, 1826.0 max latency.
>>> 148317 records sent, 29663.4 records/sec (84.87 MB/sec), 558.1 ms avg latency, 1792.0 max latency.
>>> 147756 records sent, 29551.2 records/sec (84.55 MB/sec), 550.9 ms avg latency, 1806.0 max latency
>>>
>>> then back into correct process state, is that because rebalance?
>>>
>>> thanks
>>>
>>> --
>>>
>>> Alec Li