Thanks a lot, Guozhang. I was able to nail it down by looking at the log you suggested. The log revealed that the client was trying to connect to localhost, and the problem was in one of my sub-components: it was reading the broker configuration from a different property file which didn't exist, so it defaulted to the loopback address.
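[Editor's note] A simple pre-flight check can catch this class of failure early. The sketch below is illustrative only — the file path and the demo step that writes a sample file are assumptions, not taken from the thread. The idea is to fail fast when the expected property file is missing, instead of letting a client silently fall back to localhost:

```shell
#!/bin/sh
# Hypothetical path: the property file a sub-component reads its broker list from.
BROKER_CONF="./broker.properties"

# For this demo only, write a sample file; in practice this is deployed config.
printf 'bootstrap.servers=172.31.10.35:9092,172.31.14.8:9092\n' > "$BROKER_CONF"

# Fail fast if the file is missing -- a silent default here is exactly
# how a client ends up dialing the loopback address.
if [ ! -f "$BROKER_CONF" ]; then
    echo "ERROR: $BROKER_CONF not found; bootstrap.servers would default to localhost" >&2
    exit 1
fi

# Also flag a loopback bootstrap list explicitly.
servers=$(grep '^bootstrap.servers' "$BROKER_CONF" | cut -d'=' -f2)
case "$servers" in
    *localhost*|*127.0.0.1*) echo "WARNING: bootstrap.servers is loopback: $servers" >&2 ;;
    *) echo "bootstrap.servers OK: $servers" ;;
esac
```

Wiring such a check into the service start-up script turns a slow-burning "Broker may not be available" loop into an immediate, obvious failure.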
Thanks a lot for the help.

Thanks,
Tony

On Wed, Feb 7, 2018 at 3:05 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Tony,
>
> Your Streams configs look good to me, and the additional Streams logs from
> StreamThread are normal operational logs that are not related to the issue.
> I suspect there is a network partition between your client and the broker
> node. To investigate which host this `node -1` refers to (note that -1 is
> actually a valid node id, read from the bootstrapped list), you can look
> into the same log file and search for the following line:
>
> "Initiating connection to node.."
>
> and then find out the host:port of the node with id -1.
>
> Guozhang
>
> On Tue, Feb 6, 2018 at 8:16 AM, Tony John <tonyjohnant...@gmail.com>
> wrote:
>
> > Hi Guozhang,
> >
> > Thanks for looking into this. Below are the stream config values.
> >
> > INFO 2018-02-02 08:33:25.708 [main] org.apache.kafka.streams.StreamsConfig
> > - StreamsConfig values:
> > application.id = cv-v1
> > application.server =
> > bootstrap.servers = [172.31.10.35:9092, 172.31.14.8:9092]
> > buffered.records.per.partition = 1000
> > cache.max.bytes.buffering = 104857600
> > client.id = I2
> > commit.interval.ms = 30000
> > connections.max.idle.ms = 540000
> > default.key.serde = class org.apache.kafka.common.serialization.Serdes$ByteArraySerde
> > default.timestamp.extractor = class org.apache.kafka.streams.processor.FailOnInvalidTimestamp
> > default.value.serde = class org.apache.kafka.common.serialization.Serdes$ByteArraySerde
> > key.serde = null
> > metadata.max.age.ms = 300000
> > metric.reporters = []
> > metrics.num.samples = 2
> > metrics.recording.level = DEBUG
> > metrics.sample.window.ms = 30000
> > num.standby.replicas = 1
> > num.stream.threads = 1
> > partition.grouper = class org.apache.kafka.streams.processor.DefaultPartitionGrouper
> > poll.ms = 100
> > processing.guarantee = at_least_once
> > receive.buffer.bytes = 32768
> > reconnect.backoff.max.ms = 1000
> > reconnect.backoff.ms = 50
> > replication.factor = 2
> > request.timeout.ms = 40000
> > retry.backoff.ms = 100
> > rocksdb.config.setter = null
> > security.protocol = PLAINTEXT
> > send.buffer.bytes = 131072
> > state.cleanup.delay.ms = 600000
> > state.dir = /mnt/store/kafka-streams
> > timestamp.extractor = null
> > value.serde = null
> > windowstore.changelog.additional.retention.ms = 86400000
> > zookeeper.connect =
> >
> > INFO 2018-02-02 08:33:25.870 [main] org.apache.kafka.streams.StreamsConfig
> > - StreamsConfig values:
> > application.id = pe-v1
> > application.server =
> > bootstrap.servers = [172.31.10.35:9092, 172.31.14.8:9092]
> > buffered.records.per.partition = 1000
> > cache.max.bytes.buffering = 2147483648
> > client.id = I2
> > commit.interval.ms = 30000
> > connections.max.idle.ms = 540000
> > default.key.serde = class org.apache.kafka.common.serialization.Serdes$ByteArraySerde
> > default.timestamp.extractor = class org.apache.kafka.streams.processor.FailOnInvalidTimestamp
> > default.value.serde = class org.apache.kafka.common.serialization.Serdes$ByteArraySerde
> > key.serde = null
> > metadata.max.age.ms = 300000
> > metric.reporters = []
> > metrics.num.samples = 2
> > metrics.recording.level = DEBUG
> > metrics.sample.window.ms = 30000
> > num.standby.replicas = 1
> > num.stream.threads = 3
> > partition.grouper = class org.apache.kafka.streams.processor.DefaultPartitionGrouper
> > poll.ms = 100
> > processing.guarantee = at_least_once
> > receive.buffer.bytes = 32768
> > reconnect.backoff.max.ms = 1000
> > reconnect.backoff.ms = 50
> > replication.factor = 2
> > request.timeout.ms = 40000
> > retry.backoff.ms = 100
> > rocksdb.config.setter = null
> > security.protocol = PLAINTEXT
> > send.buffer.bytes = 131072
> > state.cleanup.delay.ms = 600000
> > state.dir = /mnt/store/kafka-streams
> > timestamp.extractor = null
> > value.serde = null
> > windowstore.changelog.additional.retention.ms = 86400000
> > zookeeper.connect =
> >
> > Please note there are 2 streams applications running.
> >
> > Apart from the broker connectivity issue, I see the logs below as well.
> > What do they indicate, and do they have any impact on the processing?
> >
> > DEBUG 2018-02-05 11:06:07.369 [I2-StreamThread-5]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-5] processing latency 6694 < commit time 30000 for 10000
> > records. Adjusting up recordsProcessedBeforeCommit=44816
> > DEBUG 2018-02-05 11:06:07.518 [I2-StreamThread-3]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-3] processing latency 3684 < commit time 30000 for 10000
> > records. Adjusting up recordsProcessedBeforeCommit=81433
> > DEBUG 2018-02-05 11:06:07.562 [I2-StreamThread-4]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-4] processing latency 3620 < commit time 30000 for 10000
> > records. Adjusting up recordsProcessedBeforeCommit=82872
> > DEBUG 2018-02-05 11:06:09.838 [I2-StreamThread-7]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-7] processing latency 3843 < commit time 30000 for 10000
> > records. Adjusting up recordsProcessedBeforeCommit=78064
> > DEBUG 2018-02-05 11:06:10.073 [I2-StreamThread-3]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-3] processing latency 2552 < commit time 30000 for 7193
> > records. Adjusting up recordsProcessedBeforeCommit=84557
> > DEBUG 2018-02-05 11:06:10.209 [I2-StreamThread-4]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-4] processing latency 2645 < commit time 30000 for 7161
> > records.
> > Adjusting up recordsProcessedBeforeCommit=81221
> > DEBUG 2018-02-05 11:06:11.940 [I2-StreamThread-5]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-5] processing latency 4568 < commit time 30000 for 7583
> > records. Adjusting up recordsProcessedBeforeCommit=49800
> > DEBUG 2018-02-05 11:06:12.538 [I2-StreamThread-7]
> > org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> > [I2-StreamThread-7] processing latency 2698 < commit time 30000 for 6880
> > records. Adjusting up recordsProcessedBeforeCommit=76501
> >
> > Thanks,
> > Tony
> >
> > On Tue, Feb 6, 2018 at 3:21 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Hello Tony,
> > >
> > > Could you share your Streams config values so that people can help
> > > further investigate your issue?
> > >
> > > Guozhang
> > >
> > > On Mon, Feb 5, 2018 at 12:00 AM, Tony John <tonyjohnant...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have been running a streams application for some time. The
> > > > application runs fine for a while, but after a day or two I see the
> > > > log below getting printed continuously to the console.
> > > >
> > > > WARN 2018-02-05 02:50:04.060 [kafka-producer-network-thread | producer-1]
> > > > org.apache.kafka.clients.NetworkClient - Connection to node -1 could
> > > > not be established. Broker may not be available.
> > > >
> > > > WARN 2018-02-05 02:50:04.160 [kafka-producer-network-thread | producer-1]
> > > > org.apache.kafka.clients.NetworkClient - Connection to node -1 could
> > > > not be established. Broker may not be available.
> > > >
> > > > WARN 2018-02-05 02:50:04.261 [kafka-producer-network-thread | producer-1]
> > > > org.apache.kafka.clients.NetworkClient - Connection to node -1 could
> > > > not be established. Broker may not be available.
> > > > WARN 2018-02-05 02:50:04.311 [kafka-producer-network-thread | producer-1]
> > > > org.apache.kafka.clients.NetworkClient - Connection to node -1 could
> > > > not be established. Broker may not be available.
> > > >
> > > > WARN 2018-02-05 02:50:04.361 [kafka-producer-network-thread | producer-1]
> > > > org.apache.kafka.clients.NetworkClient - Connection to node -1 could
> > > > not be established. Broker may not be available.
> > > >
> > > > WARN 2018-02-05 02:50:04.411 [kafka-producer-network-thread | producer-1]
> > > > org.apache.kafka.clients.NetworkClient - Connection to node -1 could
> > > > not be established. Broker may not be available.
> > > >
> > > > At this time, although the application is able to process the
> > > > messages, I can also see lag building up in the consumers, and the
> > > > processing time for a batch has increased 15-fold.
> > > >
> > > > I am using a single ZooKeeper instance with 2 brokers and 4
> > > > application instances. I checked the broker and ZooKeeper status;
> > > > they are all running fine as far as I can see. I have also verified
> > > > the connectivity between the application and broker instances using
> > > > telnet, and it seems intact. The Kafka broker and streams/client
> > > > versions are 0.11.0.2.
> > > > The results of the broker status queries against ZooKeeper are
> > > > below.
> > > >
> > > > [root@app100 kafka]# echo dump | nc localhost 2181
> > > > SessionTracker dump:
> > > > Session Sets (3):
> > > > 0 expire at Mon Feb 05 06:16:39 UTC 2018:
> > > > 1 expire at Mon Feb 05 06:16:42 UTC 2018:
> > > > 0x161562860970001
> > > > 1 expire at Mon Feb 05 06:16:45 UTC 2018:
> > > > 0x161562860970000
> > > > ephemeral nodes dump:
> > > > Sessions with Ephemerals (2):
> > > > 0x161562860970000:
> > > > /brokers/ids/0
> > > > /controller
> > > > 0x161562860970001:
> > > > /brokers/ids/1
> > > >
> > > > [root@app100 kafka]# ./kafka_2.11-0.11.0.2/bin/zookeeper-shell.sh
> > > > localhost:2181 <<< "get /brokers/ids/0"
> > > > Connecting to localhost:2181
> > > > Welcome to ZooKeeper!
> > > > JLine support is disabled
> > > >
> > > > WATCHER::
> > > >
> > > > WatchedEvent state:SyncConnected type:None path:null
> > > > {"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},
> > > > "endpoints":["PLAINTEXT://172.31.10.35:9092"],"jmx_port":55555,
> > > > "host":"172.31.10.35","timestamp":"1517569007467","port":9092,"version":4}
> > > > cZxid = 0x1c
> > > > ctime = Fri Feb 02 10:56:47 UTC 2018
> > > > mZxid = 0x1c
> > > > mtime = Fri Feb 02 10:56:47 UTC 2018
> > > > pZxid = 0x1c
> > > > cversion = 0
> > > > dataVersion = 0
> > > > aclVersion = 0
> > > > ephemeralOwner = 0x161562860970000
> > > > dataLength = 197
> > > > numChildren = 0
> > > >
> > > > [root@app100 kafka]# ./kafka_2.11-0.11.0.2/bin/zookeeper-shell.sh
> > > > localhost:2181 <<< "get /brokers/ids/1"
> > > > Connecting to localhost:2181
> > > > Welcome to ZooKeeper!
> > > > JLine support is disabled
> > > >
> > > > WATCHER::
> > > >
> > > > WatchedEvent state:SyncConnected type:None path:null
> > > > {"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},
> > > > "endpoints":["PLAINTEXT://172.31.14.8:9092"],"jmx_port":55555,
> > > > "host":"172.31.14.8","timestamp":"1517569016562","port":9092,"version":4}
> > > > cZxid = 0x21
> > > > ctime = Fri Feb 02 10:56:56 UTC 2018
> > > > mZxid = 0x21
> > > > mtime = Fri Feb 02 10:56:56 UTC 2018
> > > > pZxid = 0x21
> > > > cversion = 0
> > > > dataVersion = 0
> > > > aclVersion = 0
> > > > ephemeralOwner = 0x161562860970001
> > > > dataLength = 195
> > > > numChildren = 0
> > > >
> > > > Could you please throw some light on what could be going wrong here?
> > > >
> > > > Thanks,
> > > > Tony
> > >
> > > --
> > > -- Guozhang
> >
>
> --
> -- Guozhang
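[Editor's note] For readers who land on this thread with the same symptom, the log search Guozhang suggests can be scripted. Everything below is a sketch under stated assumptions: the log path is hypothetical, and the sample line written into it is only a placeholder so the sketch runs — the real entry is emitted by org.apache.kafka.clients.NetworkClient and its exact wording varies by client version.

```shell
#!/bin/sh
# Locate the host:port the client resolved "node -1" to.
# LOG is a hypothetical path; point it at your Streams application log.
LOG="./streams.log"

# Placeholder entry so the sketch is runnable; the real line comes from
# org.apache.kafka.clients.NetworkClient and differs between versions.
printf 'DEBUG NetworkClient - Initiating connection to node -1 at 127.0.0.1:9092.\n' > "$LOG"

# Node id -1 is a valid id assigned to a bootstrap address, so the matching
# line shows which address the producer was actually dialing.
grep 'Initiating connection to node' "$LOG" | grep -e '-1'
```

In Tony's case the address printed here would have been the loopback address, pointing straight at the mis-read property file.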