[jira] [Commented] (KAFKA-5584) Incorrect log size for topics larger than 2 GB
[ https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086986#comment-16086986 ] Gregor Uhlenheuer commented on KAFKA-5584: -- It seems the bug was introduced with commit {{e71dce89c0da50f3eccc47d0fc050c92d5a99b88}} I think. > Incorrect log size for topics larger than 2 GB > -- > > Key: KAFKA-5584 > URL: https://issues.apache.org/jira/browse/KAFKA-5584 > Project: Kafka > Issue Type: Bug > Components: log >Reporter: Gregor Uhlenheuer >Priority: Critical > Labels: reliability > Fix For: 0.11.0.1 > > Attachments: Screen Shot 2017-07-12 at 09.10.53.png > > > The {{size}} of a {{Log}} is calculated incorrectly due to an Integer > overflow. For large topics (larger than 2 GB) this value overflows. > This is easily observable in the reported metrics values of the path > {{log.Log.partition.*.topic..Size}} (see attached screenshot). > Moreover I think this breaks the size-based retention (via > {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well. > I am not sure on the recommended workflow, should I open a pull request on > github with a fix? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
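The overflow mechanism is straightforward: summing per-segment sizes (each an Int) into an Int wraps around once the total passes Int.MaxValue (about 2.1 GB). A minimal Scala sketch of the arithmetic, not the actual Log.scala code: {code}
object LogSizeOverflowSketch extends App {
  // Hypothetical segment sizes: four segments of 600 MB each, ~2.4 GB in total.
  val segmentSizes: Seq[Int] = Seq.fill(4)(600 * 1024 * 1024)

  // Summing Ints produces an Int, which wraps past Int.MaxValue (2147483647).
  val sizeAsInt: Int = segmentSizes.sum

  // Accumulating into a Long keeps the real total.
  val sizeAsLong: Long = segmentSizes.map(_.toLong).sum

  println(s"Int total (overflowed): $sizeAsInt")  // prints a negative value
  println(s"Long total (correct):   $sizeAsLong") // prints 2516582400
}
{code} A fix along these lines (exposing the size as a Long and accumulating with Long arithmetic) would keep both the reported metric and size-based retention correct for topics beyond 2 GB.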
[jira] [Created] (KAFKA-5591) Infinite loop during failed Handshake
Marcin Łuczyński created KAFKA-5591: --- Summary: Infinite loop during failed Handshake Key: KAFKA-5591 URL: https://issues.apache.org/jira/browse/KAFKA-5591 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 0.11.0.0 Environment: Linux x86_64 x86_64 x86_64 GNU/Linux Reporter: Marcin Łuczyński Attachments: client.truststore.jks To test a connection from my client app to my secured Kafka broker (via SSL) I followed the preparation procedure described in this section [http://kafka.apache.org/090/documentation.html#security_ssl]. There is a flaw there in the description of certificate generation. I was able to find a proper sequence for generating certs and keys on Confluent.io [https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/], but before that, when I used the first trust store I generated, it caused a handshake exception as shown below: {quote}[2017-07-14 05:24:48,958] DEBUG Accepted connection from /10.20.40.20:55609 on /10.20.40.12:9093 and assigned it to processor 3, sendBufferSize [actual|requested]: [102400|102400] recvBufferSize [actual|requested]: [102400|102400] (kafka.network.Acceptor) [2017-07-14 05:24:48,959] DEBUG Processor 3 listening to new connection from /10.20.40.20:55609 (kafka.network.Processor) [2017-07-14 05:24:48,971] DEBUG SSLEngine.closeInBound() raised an exception. (org.apache.kafka.common.network.SslTransportLayer) javax.net.ssl.SSLException: Inbound closed before receiving peer's close_notify: possible truncation attack? at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666) at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634) at sun.security.ssl.SSLEngineImpl.closeInbound(SSLEngineImpl.java:1561) at org.apache.kafka.common.network.SslTransportLayer.handshakeFailure(SslTransportLayer.java:730) at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:313) at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:74) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:374) at org.apache.kafka.common.network.Selector.poll(Selector.java:326) at kafka.network.Processor.poll(SocketServer.scala:499) at kafka.network.Processor.run(SocketServer.scala:435) at java.lang.Thread.run(Thread.java:748) [2017-07-14 05:24:48,971] DEBUG Connection with /10.20.40.20 disconnected (org.apache.kafka.common.network.Selector) javax.net.ssl.SSLProtocolException: Handshake message sequence violation, state = 1, type = 1 at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1487) at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535) at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813) at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:411) at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:269) at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:74) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:374) at org.apache.kafka.common.network.Selector.poll(Selector.java:326) at kafka.network.Processor.poll(SocketServer.scala:499) at kafka.network.Processor.run(SocketServer.scala:435) at java.lang.Thread.run(Thread.java:748) Caused by: javax.net.ssl.SSLProtocolException: Handshake message sequence 
violation, state = 1, type = 1 at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:213) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026) at sun.security.ssl.Handshaker$1.run(Handshaker.java:966) at sun.security.ssl.Handshaker$1.run(Handshaker.java:963) at java.security.AccessController.doPrivileged(Native Method) at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1416) at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:335) at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:416) ... 7 more {quote} Which is obviously fine for the broken trust store case. However, my client app did not receive any exception or error message back. It did not stop either. Instead it fell into an infinite loop of retries, generating a huge log with exceptions as shown above. I tried to check if there is any client app property that controls the number of re-attempt
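For reference, a minimal sketch of the client-side SSL settings that go with a correctly generated truststore; the property names are the standard Kafka configs, while the host, paths and passwords below are placeholders: {code}
import java.util.Properties

object SslClientConfigSketch {
  def sslProps(): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker.example.com:9093") // placeholder host:port
    props.put("security.protocol", "SSL")
    props.put("ssl.truststore.location", "/path/to/client.truststore.jks")
    props.put("ssl.truststore.password", "changeit")
    // Needed only if the broker enforces client authentication (ssl.client.auth=required):
    props.put("ssl.keystore.location", "/path/to/client.keystore.jks")
    props.put("ssl.keystore.password", "changeit")
    props.put("ssl.key.password", "changeit")
    props
  }
}
{code} With a broken truststore these settings produce exactly the handshake failure quoted above; the open question in this report is why the client retries forever instead of surfacing the error.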
[jira] [Commented] (KAFKA-5587) Processor got uncaught exception: NullPointerException
[ https://issues.apache.org/jira/browse/KAFKA-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087123#comment-16087123 ] ASF GitHub Bot commented on KAFKA-5587: --- GitHub user rajinisivaram opened a pull request: https://github.com/apache/kafka/pull/3526 KAFKA-5587: Remove channel only after staged receives are delivered When idle connections are closed, ensure that channels with staged receives are retained in `closingChannels` until all staged receives are completed. Also ensure that only one staged receive is completed in each poll, even when channels are closed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rajinisivaram/kafka KAFKA-5587 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/3526.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3526 commit d6848a3b1dcf9cc73caf06b68bdc89570c396c81 Author: Rajini Sivaram Date: 2017-07-14T08:36:22Z KAFKA-5587: Remove channel only after staged receives are delivered > Processor got uncaught exception: NullPointerException > -- > > Key: KAFKA-5587 > URL: https://issues.apache.org/jira/browse/KAFKA-5587 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: Dan >Assignee: Rajini Sivaram > > [2017-07-12 21:56:39,964] ERROR Processor got uncaught exception. > (kafka.network.Processor) > java.lang.NullPointerException > at > kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:490) > at > kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:487) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > kafka.network.Processor.processCompletedReceives(SocketServer.scala:487) > at kafka.network.Processor.run(SocketServer.scala:417) > at java.lang.Thread.run(Thread.java:745) > Anyone knows the cause of this exception? What's the effect of it? > When this exception occurred, the log also showed that the broker was > frequently shrinking ISR to itself. Are these two things interrelated? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087136#comment-16087136 ] Rajini Sivaram commented on KAFKA-4669: --- I have submitted a PR for one scenario that can cause this behaviour under KAFKA-5587. Couldn't think of any other cases, but it is hard to be sure that there aren't others. > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.1 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
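The hang mechanism described above can be reduced to a small pattern: flush() waits on a per-batch completion signal, and an exception thrown on the I/O thread before the batch is marked done means that signal never fires. A generic Scala sketch of that pattern, not the actual producer internals: {code}
import java.util.concurrent.{CountDownLatch, TimeUnit}

object FlushHangSketch extends App {
  // Stand-in for a RecordBatch's "done" signal that flush() waits on.
  val batchDone = new CountDownLatch(1)

  // Simulates the I/O thread failing (e.g. a correlation-id mismatch) before
  // it reaches the code path that would complete the batch.
  val ioThread = new Thread(() => {
    try {
      throw new IllegalStateException("Correlation id for response does not match request")
      // batchDone.countDown() would normally run here, but is never reached.
    } catch {
      case e: Exception => println(s"Uncaught error in producer I/O thread: $e")
    }
  })
  ioThread.start()

  // KafkaProducer.flush() awaits with no timeout, so it would block forever;
  // a bounded await is used here only so the sketch terminates.
  val completed = batchDone.await(3, TimeUnit.SECONDS)
  println(s"batch completed: $completed") // false: the waiter is stuck, like flush()
}
{code}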
[jira] [Created] (KAFKA-5592) Connection with plain client to SSL-secured broker causes OOM
Marcin Łuczyński created KAFKA-5592: --- Summary: Connection with plain client to SSL-secured broker causes OOM Key: KAFKA-5592 URL: https://issues.apache.org/jira/browse/KAFKA-5592 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 0.11.0.0 Environment: Linux x86_64 x86_64 x86_64 GNU/Linux Reporter: Marcin Łuczyński Attachments: heapdump.20170713.100129.14207.0002.phd, Heap.PNG, javacore.20170713.100129.14207.0003.txt, Snap.20170713.100129.14207.0004.trc, Stack.PNG While testing a connection from a client app that has no truststore configured to a Kafka broker secured by SSL, my JVM crashes with OutOfMemoryError. I saw it mixed with StackOverflowError. I attach dump files. The stack trace to start with is here: {quote}at java/nio/HeapByteBuffer. (HeapByteBuffer.java:57) at java/nio/ByteBuffer.allocate(ByteBuffer.java:331) at org/apache/kafka/common/network/NetworkReceive.readFromReadableChannel(NetworkReceive.java:93) at org/apache/kafka/common/network/NetworkReceive.readFrom(NetworkReceive.java:71) at org/apache/kafka/common/network/KafkaChannel.receive(KafkaChannel.java:169) at org/apache/kafka/common/network/KafkaChannel.read(KafkaChannel.java:150) at org/apache/kafka/common/network/Selector.pollSelectionKeys(Selector.java:355) at org/apache/kafka/common/network/Selector.poll(Selector.java:303) at org/apache/kafka/clients/NetworkClient.poll(NetworkClient.java:349) at org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226) at org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.poll(ConsumerNetworkClient.java:188) at org/apache/kafka/clients/consumer/internals/AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:207) at org/apache/kafka/clients/consumer/internals/AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:193) at org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.poll(ConsumerCoordinator.java:279) at org/apache/kafka/clients/consumer/KafkaConsumer.pollOnce(KafkaConsumer.java:1029) at org/apache/kafka/clients/consumer/KafkaConsumer.poll(KafkaConsumer.java:995) at com/ibm/is/cc/kafka/runtime/KafkaConsumerProcessor.process(KafkaConsumerProcessor.java:237) at com/ibm/is/cc/kafka/runtime/KafkaProcessor.process(KafkaProcessor.java:173) at com/ibm/is/cc/javastage/connector/CC_JavaAdapter.run(CC_JavaAdapter.java:443){quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
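One mechanism that would be consistent with this stack trace (an assumption, not verified against the broker in question): a plaintext client treats the first four bytes of whatever the SSL listener sends back as a message-size prefix, and a TLS record header decodes to a very large size, so NetworkReceive tries to allocate a correspondingly huge buffer. A small Scala sketch of that decoding step: {code}
import java.nio.ByteBuffer

object SizePrefixSketch extends App {
  // A plaintext Kafka client expects every response to start with a 4-byte size.
  // If the broker is actually speaking TLS, the first bytes are a TLS record
  // header, and reading them as a size gives a huge, meaningless number.
  val tlsRecordStart: Array[Byte] = Array(0x16, 0x03, 0x03, 0x02).map(_.toByte)

  val decodedSize = ByteBuffer.wrap(tlsRecordStart).getInt
  println(s"decoded 'message size': $decodedSize bytes") // ~369 million bytes here

  // The receive path then allocates a buffer of that size, which is where an
  // OutOfMemoryError can surface:
  // val buffer = ByteBuffer.allocate(decodedSize) // left commented out on purpose
}
{code}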
[jira] [Updated] (KAFKA-5584) Incorrect log size for topics larger than 2 GB
[ https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5584: --- Affects Version/s: 0.11.0.0 > Incorrect log size for topics larger than 2 GB > -- > > Key: KAFKA-5584 > URL: https://issues.apache.org/jira/browse/KAFKA-5584 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.11.0.0 >Reporter: Gregor Uhlenheuer >Priority: Critical > Labels: reliability > Fix For: 0.11.0.1 > > Attachments: Screen Shot 2017-07-12 at 09.10.53.png > > > The {{size}} of a {{Log}} is calculated incorrectly due to an Integer > overflow. For large topics (larger than 2 GB) this value overflows. > This is easily observable in the reported metrics values of the path > {{log.Log.partition.*.topic..Size}} (see attached screenshot). > Moreover I think this breaks the size-based retention (via > {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well. > I am not sure on the recommended workflow, should I open a pull request on > github with a fix? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5584) Incorrect log size for topics larger than 2 GB
[ https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087170#comment-16087170 ] Ismael Juma commented on KAFKA-5584: Thanks for tracking it down [~kongo2002], that looks right to me. > Incorrect log size for topics larger than 2 GB > -- > > Key: KAFKA-5584 > URL: https://issues.apache.org/jira/browse/KAFKA-5584 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.11.0.0 >Reporter: Gregor Uhlenheuer >Priority: Critical > Labels: reliability > Fix For: 0.11.0.1 > > Attachments: Screen Shot 2017-07-12 at 09.10.53.png > > > The {{size}} of a {{Log}} is calculated incorrectly due to an Integer > overflow. For large topics (larger than 2 GB) this value overflows. > This is easily observable in the reported metrics values of the path > {{log.Log.partition.*.topic..Size}} (see attached screenshot). > Moreover I think this breaks the size-based retention (via > {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well. > I am not sure on the recommended workflow, should I open a pull request on > github with a fix? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5545) Kafka Stream not able to successfully restart over new broker ip
[ https://issues.apache.org/jira/browse/KAFKA-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087204#comment-16087204 ] Yogesh BG commented on KAFKA-5545: -- Any updates on this issue? > Kafka Stream not able to successfully restart over new broker ip > > > Key: KAFKA-5545 > URL: https://issues.apache.org/jira/browse/KAFKA-5545 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.2.1 >Reporter: Yogesh BG >Priority: Critical > Attachments: kafkastreams.log > > > Hi > I have one kafka broker and one kafka stream application > initially kafka stream connected and starts processing data. Then i restart > the broker. When broker restarts new ip will be assigned. > In kafka stream i have a 5min interval thread which checks if broker ip > changed and if changed, we cleanup the stream, rebuild topology(tried with > reusing topology) and start the stream again. I end up with the following > exceptions. > 11:04:08.032 [StreamThread-38] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-38] Creating active task 0_5 with assigned > partitions [PR-5] > 11:04:08.033 [StreamThread-41] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-41] Creating active task 0_1 with assigned > partitions [PR-1] > 11:04:08.036 [StreamThread-34] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-34] Creating active task 0_7 with assigned > partitions [PR-7] > 11:04:08.036 [StreamThread-37] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-37] Creating active task 0_3 with assigned > partitions [PR-3] > 11:04:08.036 [StreamThread-45] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-45] Creating active task 0_0 with assigned > partitions [PR-0] > 11:04:08.037 [StreamThread-36] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-36] Creating active task 0_4 with assigned > partitions [PR-4] > 11:04:08.037 [StreamThread-43] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-43] Creating active task 0_6 with assigned > partitions [PR-6] > 11:04:08.038 [StreamThread-48] INFO o.a.k.s.p.internals.StreamThread - > stream-thread [StreamThread-48] Creating active task 0_2 with assigned > partitions [PR-2] > 11:04:09.034 [StreamThread-38] WARN o.a.k.s.p.internals.StreamThread - Could > not create task 0_5. Will retry. 
> org.apache.kafka.streams.errors.LockException: task [0_5] Failed to lock the > state directory for task 0_5 > at > org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:100) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210) > ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303) > [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na] > at > org.apache.kafka.clients.consumer.internal
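For context, the restart logic being described usually takes a shape like the sketch below (Streams 0.10.2-era API; topic names and properties are placeholders, and this is not the reporter's actual code). One common cause of the LockException above is starting the new instance before the old one has fully closed and released its state-directory locks: {code}
import java.util.Properties
import org.apache.kafka.streams.KafkaStreams
import org.apache.kafka.streams.kstream.KStreamBuilder

object RestartStreamsSketch {
  // Placeholder topology; the real application builds its own.
  def buildTopology(): KStreamBuilder = {
    val builder = new KStreamBuilder()
    builder.stream[String, String]("PR").to("PR-enriched")
    builder
  }

  def restart(old: KafkaStreams, props: Properties): KafkaStreams = {
    old.close()   // 1. stop the old instance so per-task state directory locks are released
    old.cleanUp() // 2. wipe local state; only safe while no instance of this app id is running
    val streams = new KafkaStreams(buildTopology(), props) // 3. rebuild the topology
    streams.start()                                        // 4. start the new instance
    streams
  }
}
{code}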
[jira] [Created] (KAFKA-5593) Kafka streams not re-balancing when 3 consumer streams are there
Yogesh BG created KAFKA-5593: Summary: Kafka streams not re-balancing when 3 consumer streams are there Key: KAFKA-5593 URL: https://issues.apache.org/jira/browse/KAFKA-5593 Project: Kafka Issue Type: Bug Components: streams Reporter: Yogesh BG Priority: Critical Attachments: log1.txt, log2.txt, log3.txt I have 3 broker nodes and 3 Kafka Streams instances. I observe that all 3 consumer streams are part of the group named rtp-kafkastreams, but the data is processed by only one node. DEBUG n.a.a.k.a.AccessLogMetricEnrichmentProcessor - AccessLogMetricEnrichmentProcessor.process When I check the partition information held by each of them, I see that the first node has all 8 partitions: [root@ip-172-31-11-139 ~]# ls /data/kstreams/rtp-kafkastreams 0_0 0_1 0_2 0_3 0_4 0_5 0_6 0_7 whereas on the other streams instances this folder is empty. I tried restarting the other two consumer streams, but they still won't become part of the group and re-balance. I have attached the logs. Configurations are inside the log file. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5594) Tag messages with authenticated user who produced it
Jack Andrews created KAFKA-5594: --- Summary: Tag messages with authenticated user who produced it Key: KAFKA-5594 URL: https://issues.apache.org/jira/browse/KAFKA-5594 Project: Kafka Issue Type: Wish Reporter: Jack Andrews I see that Kafka now supports custom headers through KIP-82. Would it be possible to hook this up to authorization such that the authenticated user is added as a header? Thank you. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5127) Replace pattern matching with foreach where the case None is unused
[ https://issues.apache.org/jira/browse/KAFKA-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087592#comment-16087592 ] ASF GitHub Bot commented on KAFKA-5127: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/2919 > Replace pattern matching with foreach where the case None is unused > > > Key: KAFKA-5127 > URL: https://issues.apache.org/jira/browse/KAFKA-5127 > Project: Kafka > Issue Type: Bug > Components: core >Reporter: Balint Molnar >Assignee: Balint Molnar >Priority: Minor > Fix For: 0.11.1.0 > > > There are various place where pattern matching is used with matching only for > one thing and ignoring the None type, this can be replaced with foreach. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
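For readers unfamiliar with the Scala idiom being referred to, the refactor has this shape (a generic example, not a specific call site from the Kafka codebase): {code}
object OptionForeachExample extends App {
  val logEndOffset: Option[Long] = Some(42L)

  // Before: a pattern match where the None case is present but does nothing.
  logEndOffset match {
    case Some(offset) => println(s"log end offset: $offset")
    case None => // unused
  }

  // After: foreach expresses "run this only if a value is present" directly.
  logEndOffset.foreach(offset => println(s"log end offset: $offset"))
}
{code}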
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087695#comment-16087695 ] Ismael Juma commented on KAFKA-4669: We should probably close this when the PR for KAFKA-5587 is merged. People can always reopen if it happens again. > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.11.0.1 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress
Jason Gustafson created KAFKA-5595: -- Summary: Illegal state in SocketServer; attempt to send with another send in progress Key: KAFKA-5595 URL: https://issues.apache.org/jira/browse/KAFKA-5595 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson I have seen this a couple times, but I'm not sure the conditions associated with it. {code} java.lang.IllegalStateException: Attempt to begin a send operation with prior send operation still in progress. at org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138) at org.apache.kafka.common.network.Selector.send(Selector.java:248) at kafka.network.Processor.sendResponse(SocketServer.scala:488) at kafka.network.Processor.processNewResponses(SocketServer.scala:466) at kafka.network.Processor.run(SocketServer.scala:431) at java.lang.Thread.run(Thread.java:748) {code} Prior to this event, I see a lot of this message in the logs (always for the same connection id): {code} Attempting to send response via channel for which there is no open connection, connection id 7 {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress
[ https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087745#comment-16087745 ] Jason Gustafson commented on KAFKA-5595: cc [~rsivaram] In case you have any ideas. > Illegal state in SocketServer; attempt to send with another send in progress > > > Key: KAFKA-5595 > URL: https://issues.apache.org/jira/browse/KAFKA-5595 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson > > I have seen this a couple times, but I'm not sure the conditions associated > with it. > {code} > java.lang.IllegalStateException: Attempt to begin a send operation with prior > send operation still in progress. > at > org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138) > at org.apache.kafka.common.network.Selector.send(Selector.java:248) > at kafka.network.Processor.sendResponse(SocketServer.scala:488) > at kafka.network.Processor.processNewResponses(SocketServer.scala:466) > at kafka.network.Processor.run(SocketServer.scala:431) > at java.lang.Thread.run(Thread.java:748) > {code} > Prior to this event, I see a lot of this message in the logs (always for the > same connection id): > {code} > Attempting to send response via channel for which there is no open > connection, connection id 7 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress
[ https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087749#comment-16087749 ] Ismael Juma commented on KAFKA-5595: I haven't checked this in detail, but a possibility: 1. There is an inflight response for client c 2. There is a disconnection and reconnection from client c causing us to lose the channel state 3. Client c sends another request 4. Because we lost the channel state and we don't drain the response queue on disconnections, we may allow 2 inflight responses for the same connection 5. Under the right set of circumstances, this could lead to the IllegalStateException reported Am I missing something? > Illegal state in SocketServer; attempt to send with another send in progress > > > Key: KAFKA-5595 > URL: https://issues.apache.org/jira/browse/KAFKA-5595 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson > > I have seen this a couple times, but I'm not sure the conditions associated > with it. > {code} > java.lang.IllegalStateException: Attempt to begin a send operation with prior > send operation still in progress. > at > org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138) > at org.apache.kafka.common.network.Selector.send(Selector.java:248) > at kafka.network.Processor.sendResponse(SocketServer.scala:488) > at kafka.network.Processor.processNewResponses(SocketServer.scala:466) > at kafka.network.Processor.run(SocketServer.scala:431) > at java.lang.Thread.run(Thread.java:748) > {code} > Prior to this event, I see a lot of this message in the logs (always for the > same connection id): > {code} > Attempting to send response via channel for which there is no open > connection, connection id 7 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress
[ https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087916#comment-16087916 ] Rajini Sivaram commented on KAFKA-5595: --- Connection id contains remote port, it is currently logging processor id, hence the integer value 7 in the logs. I have submitted a PR to fix that. [~ijuma] Since connection id contains remote port, for the scenario you described, the port needs to get reused. Typically that shouldn't happen while still processing requests of an older connection using that port. The channel should have been closed and removed from selector before the new connection. > Illegal state in SocketServer; attempt to send with another send in progress > > > Key: KAFKA-5595 > URL: https://issues.apache.org/jira/browse/KAFKA-5595 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson > > I have seen this a couple times, but I'm not sure the conditions associated > with it. > {code} > java.lang.IllegalStateException: Attempt to begin a send operation with prior > send operation still in progress. > at > org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138) > at org.apache.kafka.common.network.Selector.send(Selector.java:248) > at kafka.network.Processor.sendResponse(SocketServer.scala:488) > at kafka.network.Processor.processNewResponses(SocketServer.scala:466) > at kafka.network.Processor.run(SocketServer.scala:431) > at java.lang.Thread.run(Thread.java:748) > {code} > Prior to this event, I see a lot of this message in the logs (always for the > same connection id): > {code} > Attempting to send response via channel for which there is no open > connection, connection id 7 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress
[ https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087916#comment-16087916 ] Rajini Sivaram edited comment on KAFKA-5595 at 7/14/17 7:56 PM: Connection id contains remote port, it is currently logging processor id, hence the integer value 7 in the logs. I have submitted a PR to fix that. [~ijuma] Since connection id contains remote port, for the scenario you described, the port needs to get reused. Typically that shouldn't happen while still processing requests of an older connection using that port. In theory, I suppose it could happen if there lots of connections being created and closed. was (Author: rsivaram): Connection id contains remote port, it is currently logging processor id, hence the integer value 7 in the logs. I have submitted a PR to fix that. [~ijuma] Since connection id contains remote port, for the scenario you described, the port needs to get reused. Typically that shouldn't happen while still processing requests of an older connection using that port. The channel should have been closed and removed from selector before the new connection. > Illegal state in SocketServer; attempt to send with another send in progress > > > Key: KAFKA-5595 > URL: https://issues.apache.org/jira/browse/KAFKA-5595 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson > > I have seen this a couple times, but I'm not sure the conditions associated > with it. > {code} > java.lang.IllegalStateException: Attempt to begin a send operation with prior > send operation still in progress. > at > org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138) > at org.apache.kafka.common.network.Selector.send(Selector.java:248) > at kafka.network.Processor.sendResponse(SocketServer.scala:488) > at kafka.network.Processor.processNewResponses(SocketServer.scala:466) > at kafka.network.Processor.run(SocketServer.scala:431) > at java.lang.Thread.run(Thread.java:748) > {code} > Prior to this event, I see a lot of this message in the logs (always for the > same connection id): > {code} > Attempting to send response via channel for which there is no open > connection, connection id 7 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress
[ https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087999#comment-16087999 ] Ismael Juma commented on KAFKA-5595: [~rsivaram], this is a pretty rare case, we only saw two log entries for the ISE after a lot of log entries for closed channels. Also, the kernel tends to favouring reusing ports that have just been released (based on our experience with port conflicts in tests). So, I think it seems possible, although I am not sure. > Illegal state in SocketServer; attempt to send with another send in progress > > > Key: KAFKA-5595 > URL: https://issues.apache.org/jira/browse/KAFKA-5595 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson > > I have seen this a couple times, but I'm not sure the conditions associated > with it. > {code} > java.lang.IllegalStateException: Attempt to begin a send operation with prior > send operation still in progress. > at > org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138) > at org.apache.kafka.common.network.Selector.send(Selector.java:248) > at kafka.network.Processor.sendResponse(SocketServer.scala:488) > at kafka.network.Processor.processNewResponses(SocketServer.scala:466) > at kafka.network.Processor.run(SocketServer.scala:431) > at java.lang.Thread.run(Thread.java:748) > {code} > Prior to this event, I see a lot of this message in the logs (always for the > same connection id): > {code} > Attempting to send response via channel for which there is no open > connection, connection id 7 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress
[ https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087999#comment-16087999 ] Ismael Juma edited comment on KAFKA-5595 at 7/14/17 9:12 PM: - [~rsivaram], this is a pretty rare case, we only saw two log entries for the ISE after a lot of log entries for closed channels. Also, the kernel tends to favour reusing ports that have just been released (based on our experience with port conflicts in tests). So, I think it seems possible, although I am not sure. was (Author: ijuma): [~rsivaram], this is a pretty rare case, we only saw two log entries for the ISE after a lot of log entries for closed channels. Also, the kernel tends to favouring reusing ports that have just been released (based on our experience with port conflicts in tests). So, I think it seems possible, although I am not sure. > Illegal state in SocketServer; attempt to send with another send in progress > > > Key: KAFKA-5595 > URL: https://issues.apache.org/jira/browse/KAFKA-5595 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson > > I have seen this a couple times, but I'm not sure the conditions associated > with it. > {code} > java.lang.IllegalStateException: Attempt to begin a send operation with prior > send operation still in progress. > at > org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138) > at org.apache.kafka.common.network.Selector.send(Selector.java:248) > at kafka.network.Processor.sendResponse(SocketServer.scala:488) > at kafka.network.Processor.processNewResponses(SocketServer.scala:466) > at kafka.network.Processor.run(SocketServer.scala:431) > at java.lang.Thread.run(Thread.java:748) > {code} > Prior to this event, I see a lot of this message in the logs (always for the > same connection id): > {code} > Attempting to send response via channel for which there is no open > connection, connection id 7 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5016) Consumer hang in poll method while rebalancing is in progress
[ https://issues.apache.org/jira/browse/KAFKA-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088285#comment-16088285 ] Geoffrey Stewart commented on KAFKA-5016: - I have also encountered the issue documented in this Jira using 0.10.2.0 brokers with the 0.10.2.0 client. This issue only occurs when we use the "subscribe" call from the API, which dynamically assigns partitions. When we use the "assign" call from the API, to manually assign lists of partitions, we do not have any issue. I don't think what is being described above represents the expected behavior of dynamic partition assignment and consumer group coordination. Based on the above explanation it sounds like it would not be possible to have 2 or more simultaneous consumer instances in the same consumer group when using dynamic partition assignment (subscribe). For example, there could be one consumer instance in the group which has made some calls to "poll". As soon as a second consumer instance comes along, its call to "poll" is only processed after max.poll.interval.ms has elapsed since the first consumer's most recent poll request - at this time the broker will no longer consider that this first consumer is part of the group. I certainly agree that with the arrival of the second consumer to the group, the broker must perform a rebalance or restabilization which may take some time. However this should not take max.poll.interval.ms since the liveness of the first consumer should be maintained by its heartbeat which occurs every heartbeat.interval.ms. I have confirmed that by using the default value for the property max.poll.interval.ms of 300000, the group restabilization (rebalance) takes about this long (5mins) and then the second consumer instance's poll request is processed. Lowering this value to 30000 has the effect of reducing the group restabilization (rebalance) to about 30 seconds before the second consumer instance's poll request is processed. To summarize, please explain how I can establish parallel consumer instances in the same group using the subscribe method from the API, which dynamically assigns partitions. Further, please help me to understand why the consumer instance's heartbeat does not seem to be maintaining its liveness. > Consumer hang in poll method while rebalancing is in progress > - > > Key: KAFKA-5016 > URL: https://issues.apache.org/jira/browse/KAFKA-5016 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0, 0.10.2.0 >Reporter: Domenico Di Giulio >Assignee: Vahid Hashemian > Attachments: Kafka 0.10.2.0 Issue (TRACE) - Server + Client.txt, > Kafka 0.10.2.0 Issue (TRACE).txt, KAFKA_5016.java > > > After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the > rebalancing code. > This is a test case, not (still) production code. It does the following with > a single-partition topic and two consumers in the same group: > 1) a topic with one partition is forced to be created (auto-created) > 2) a producer is used to write 10 messages > 3) the first consumer reads all the messages and commits > 4) the second consumer attempts a poll() and hangs indefinitely > The same issue can't be found with 0.10.0.0. > See the attached logs at TRACE level. Look for "SERVER HANGS" to see where > the hang is found: when this happens, the client keeps failing any hearbeat > attempt, as the rebalancing is in progress, and the poll method hangs > indefinitely. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
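For reference, the dynamic-assignment setup under discussion boils down to subscribe() plus three timeouts: heartbeat.interval.ms controls how often background heartbeats are sent, session.timeout.ms bounds heartbeat-based failure detection, and max.poll.interval.ms bounds how long a member may go between poll() calls before it is considered to have left the group. A minimal Scala sketch with the stock 0.10.x defaults (broker address, group and topic names are placeholders): {code}
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

object SubscribeSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // placeholder
  props.put("group.id", "my-group")                // placeholder
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("max.poll.interval.ms", "300000") // default: 5 minutes allowed between polls
  props.put("session.timeout.ms", "10000")    // default: heartbeat-based failure detection
  props.put("heartbeat.interval.ms", "3000")  // default: heartbeat frequency

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("my-topic")) // dynamic partition assignment
  try {
    while (true) {
      // poll() must be called regularly: heartbeats keep the member alive, but
      // rejoining the group during a rebalance only happens inside poll().
      val records = consumer.poll(1000L)
      records.forEach(r => println(s"${r.partition()}:${r.offset()} ${r.value()}"))
    }
  } finally consumer.close()
}
{code}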
[jira] [Commented] (KAFKA-5016) Consumer hang in poll method while rebalancing is in progress
[ https://issues.apache.org/jira/browse/KAFKA-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088288#comment-16088288 ] Domenico Di Giulio commented on KAFKA-5016: --- I am currently out of the office, with no access to my e-mail. I will be back at work on July 27. ** E-mail from Bank of Italy are sent in good faith but they are neither binding on the Bank nor to be understood as creating any obligation on its part except where provided for in a written agreement. This e-mail is confidential. If you have received it by mistake, please inform the sender by reply e-mail and delete it from your system. Please also note that the unauthorized disclosure or use of the message or any attachments could be an offence. Thank you for your cooperation. ** > Consumer hang in poll method while rebalancing is in progress > - > > Key: KAFKA-5016 > URL: https://issues.apache.org/jira/browse/KAFKA-5016 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.0, 0.10.2.0 >Reporter: Domenico Di Giulio >Assignee: Vahid Hashemian > Attachments: Kafka 0.10.2.0 Issue (TRACE) - Server + Client.txt, > Kafka 0.10.2.0 Issue (TRACE).txt, KAFKA_5016.java > > > After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the > rebalancing code. > This is a test case, not (still) production code. It does the following with > a single-partition topic and two consumers in the same group: > 1) a topic with one partition is forced to be created (auto-created) > 2) a producer is used to write 10 messages > 3) the first consumer reads all the messages and commits > 4) the second consumer attempts a poll() and hangs indefinitely > The same issue can't be found with 0.10.0.0. > See the attached logs at TRACE level. Look for "SERVER HANGS" to see where > the hang is found: when this happens, the client keeps failing any hearbeat > attempt, as the rebalancing is in progress, and the poll method hangs > indefinitely. -- This message was sent by Atlassian JIRA (v6.4.14#64029)