[jira] [Commented] (KAFKA-5584) Incorrect log size for topics larger than 2 GB

2017-07-14 Thread Gregor Uhlenheuer (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086986#comment-16086986
 ] 

Gregor Uhlenheuer commented on KAFKA-5584:
--

It seems the bug was introduced with commit 
{{e71dce89c0da50f3eccc47d0fc050c92d5a99b88}}.

> Incorrect log size for topics larger than 2 GB
> --
>
> Key: KAFKA-5584
> URL: https://issues.apache.org/jira/browse/KAFKA-5584
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Gregor Uhlenheuer
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1
>
> Attachments: Screen Shot 2017-07-12 at 09.10.53.png
>
>
> The {{size}} of a {{Log}} is calculated incorrectly due to an Integer 
> overflow. For large topics (larger than 2 GB) this value overflows.
> This is easily observable in the reported metrics values of the path 
> {{log.Log.partition.*.topic..Size}} (see attached screenshot).
> Moreover, I think this breaks the size-based retention (via 
> {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well.
> I am not sure about the recommended workflow; should I open a pull request on 
> GitHub with a fix?
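
To illustrate the failure mode (a hedged sketch in Java, not the actual {{Log}} code): summing 
per-segment sizes into an {{int}} wraps around once the total passes {{Integer.MAX_VALUE}}, 
while accumulating into a {{long}} does not.

{code}
// Hypothetical illustration (not the actual Kafka code): summing segment sizes as int
// wraps past Integer.MAX_VALUE (~2 GB), while a long accumulator stays correct.
public class LogSizeOverflowSketch {
    public static void main(String[] args) {
        int segmentSize = 1024 * 1024 * 1024;                         // 1 GB per segment
        int[] segmentSizes = {segmentSize, segmentSize, segmentSize}; // 3 GB in total

        int intSum = 0;
        long longSum = 0L;
        for (int size : segmentSizes) {
            intSum += size;   // overflows: ends up negative
            longSum += size;  // correct total
        }
        System.out.println("int size:  " + intSum);   // -1073741824
        System.out.println("long size: " + longSum);  // 3221225472
    }
}
{code}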



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5591) Infinite loop during failed Handshake

2017-07-14 Thread Marcin Łuczyński (JIRA)
Marcin Łuczyński created KAFKA-5591:
---

 Summary: Infinite loop during failed Handshake
 Key: KAFKA-5591
 URL: https://issues.apache.org/jira/browse/KAFKA-5591
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.11.0.0
 Environment: Linux x86_64 x86_64 x86_64 GNU/Linux
Reporter: Marcin Łuczyński
 Attachments: client.truststore.jks

To test a connection from my client app to my SSL-secured Kafka broker, I followed the 
preparation procedure described in this section: 
[http://kafka.apache.org/090/documentation.html#security_ssl]. There is a flaw in its 
description of certificate generation. I was able to find a proper sequence for 
generating the certs and keys on Confluent.io 
[https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/],
 but before that, when I used the first trust store I generated, it caused the 
handshake exception shown below:

{quote}[2017-07-14 05:24:48,958] DEBUG Accepted connection from 
/10.20.40.20:55609 on /10.20.40.12:9093 and assigned it to processor 3, 
sendBufferSize [actual|requested]: [102400|102400] recvBufferSize 
[actual|requested]: [102400|102400] (kafka.network.Acceptor)
[2017-07-14 05:24:48,959] DEBUG Processor 3 listening to new connection from 
/10.20.40.20:55609 (kafka.network.Processor)
[2017-07-14 05:24:48,971] DEBUG SSLEngine.closeInBound() raised an exception. 
(org.apache.kafka.common.network.SslTransportLayer)
javax.net.ssl.SSLException: Inbound closed before receiving peer's 
close_notify: possible truncation attack?
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634)
at sun.security.ssl.SSLEngineImpl.closeInbound(SSLEngineImpl.java:1561)
at 
org.apache.kafka.common.network.SslTransportLayer.handshakeFailure(SslTransportLayer.java:730)
at 
org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:313)
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:74)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:374)
at org.apache.kafka.common.network.Selector.poll(Selector.java:326)
at kafka.network.Processor.poll(SocketServer.scala:499)
at kafka.network.Processor.run(SocketServer.scala:435)
at java.lang.Thread.run(Thread.java:748)
[2017-07-14 05:24:48,971] DEBUG Connection with /10.20.40.20 disconnected 
(org.apache.kafka.common.network.Selector)
javax.net.ssl.SSLProtocolException: Handshake message sequence violation, state 
= 1, type = 1
at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1487)
at 
sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
at 
org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:411)
at 
org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:269)
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:74)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:374)
at org.apache.kafka.common.network.Selector.poll(Selector.java:326)
at kafka.network.Processor.poll(SocketServer.scala:499)
at kafka.network.Processor.run(SocketServer.scala:435)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLProtocolException: Handshake message sequence 
violation, state = 1, type = 1
at 
sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:213)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:966)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:963)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1416)
at 
org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:335)
at 
org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:416)
... 7 more
{quote}

That is obviously expected for the broken trust store case. However, my client app 
did not receive any exception or error message back, and it did not stop either. 
Instead it fell into an infinite loop of retries, generating a huge log full of 
exceptions like the ones shown above. I tried to check if there is any client app property 
that controls the number of re-attempt
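
For context, the client-side SSL settings involved boil down to a handful of consumer 
properties. A minimal sketch follows; the broker address, topic, truststore path and 
password are placeholders (not the reporter's actual setup), and the truststore must 
contain the CA that signed the broker's certificate.

{code}
// Minimal sketch of an SSL-enabled consumer; broker address, topic, truststore path and
// password are placeholders.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SslConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093");
        props.put("group.id", "ssl-test");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/path/to/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            consumer.poll(1000); // a single poll to exercise the connection
        }
    }
}
{code}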

[jira] [Commented] (KAFKA-5587) Processor got uncaught exception: NullPointerException

2017-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087123#comment-16087123
 ] 

ASF GitHub Bot commented on KAFKA-5587:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/3526

KAFKA-5587: Remove channel only after staged receives are delivered

When idle connections are closed, ensure that channels with staged receives 
are retained in `closingChannels` until all staged receives are completed. Also 
ensure that only one staged receive is completed in each poll, even when 
channels are closed.
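
Conceptually, the change can be pictured with this rough, hypothetical sketch (for 
illustration only, not the actual {{Selector}} code): closed channels that still hold 
staged receives are parked in a {{closingChannels}} map, at most one staged receive per 
channel is delivered on each poll, and the channel is dropped only once nothing is left 
to deliver.

{code}
// Rough, hypothetical sketch (not the actual Selector code) of the idea described above.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class ClosingChannelsSketch {
    static class Channel {
        final Deque<String> stagedReceives = new ArrayDeque<>();
    }

    private final Map<String, Channel> closingChannels = new HashMap<>();

    // On idle-connection close: keep the channel around if receives are still staged.
    void close(String connectionId, Channel channel) {
        if (!channel.stagedReceives.isEmpty()) {
            closingChannels.put(connectionId, channel);
        }
    }

    // Once per poll: deliver at most one staged receive per closing channel, and drop the
    // channel only when nothing is left to deliver.
    void pollClosingChannels() {
        closingChannels.entrySet().removeIf(entry -> {
            String receive = entry.getValue().stagedReceives.poll();
            if (receive != null) {
                // hand the receive off for processing here
            }
            return entry.getValue().stagedReceives.isEmpty();
        });
    }
}
{code}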

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-5587

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3526


commit d6848a3b1dcf9cc73caf06b68bdc89570c396c81
Author: Rajini Sivaram 
Date:   2017-07-14T08:36:22Z

KAFKA-5587: Remove channel only after staged receives are delivered




> Processor got uncaught exception: NullPointerException
> --
>
> Key: KAFKA-5587
> URL: https://issues.apache.org/jira/browse/KAFKA-5587
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.1
>Reporter: Dan
>Assignee: Rajini Sivaram
>
> [2017-07-12 21:56:39,964] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.NullPointerException
> at 
> kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:490)
> at 
> kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:487)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> kafka.network.Processor.processCompletedReceives(SocketServer.scala:487)
> at kafka.network.Processor.run(SocketServer.scala:417)
> at java.lang.Thread.run(Thread.java:745)
> Does anyone know the cause of this exception? What is its effect? 
> When this exception occurred, the log also showed that the broker was 
> frequently shrinking the ISR to itself. Are these two things related?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-07-14 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087136#comment-16087136
 ] 

Rajini Sivaram commented on KAFKA-4669:
---

I have submitted a PR under KAFKA-5587 for one scenario that can cause this 
behaviour. I couldn't think of any other cases, but it is hard to be sure that 
there aren't others. 

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1
>
>
> There is no try/catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked the 0.10 code and think this bug exists in the 0.10 versions as well.
> A real case: first, a correlation-id mismatch exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 
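
To make the hang mechanism concrete, here is a self-contained sketch (an illustration, 
not Kafka source): {{flush()}} amounts to waiting on per-batch completion, so if the I/O 
thread throws before the completion callback runs, the wait never finishes.

{code}
// Illustration only (not Kafka source): flush() waits on a completion latch that is only
// counted down by the I/O thread's callback; if that thread throws first, the latch never
// reaches zero and the wait never returns, matching the jstack above.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FlushHangSketch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch batchDone = new CountDownLatch(1);

        Thread ioThread = new Thread(() -> {
            boolean exceptionAfterCompleteNext = true; // stands in for the correlation-id mismatch
            if (exceptionAfterCompleteNext) {
                throw new IllegalStateException("Correlation id for response does not match request");
            }
            batchDone.countDown(); // never reached, so the batch is never marked done
        });
        ioThread.start();

        // A real flush() would call await() with no timeout and hang here forever.
        boolean completed = batchDone.await(5, TimeUnit.SECONDS);
        System.out.println(completed ? "flushed" : "batch never completed: flush() would hang");
    }
}
{code}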



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5592) Connection with plain client to SSL-secured broker causes OOM

2017-07-14 Thread Marcin Łuczyński (JIRA)
Marcin Łuczyński created KAFKA-5592:
---

 Summary: Connection with plain client to SSL-secured broker causes 
OOM
 Key: KAFKA-5592
 URL: https://issues.apache.org/jira/browse/KAFKA-5592
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.11.0.0
 Environment: Linux x86_64 x86_64 x86_64 GNU/Linux
Reporter: Marcin Łuczyński
 Attachments: heapdump.20170713.100129.14207.0002.phd, Heap.PNG, 
javacore.20170713.100129.14207.0003.txt, Snap.20170713.100129.14207.0004.trc, 
Stack.PNG

While testing a connection from a client app that does not have a truststore 
configured to a Kafka broker secured by SSL, my JVM crashes with an 
OutOfMemoryError. I saw it mixed with a StackOverflowError. I attach the dump files.

The stack trace to start with is here:

{quote}at java/nio/HeapByteBuffer. (HeapByteBuffer.java:57) 
at java/nio/ByteBuffer.allocate(ByteBuffer.java:331) 
at 
org/apache/kafka/common/network/NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
 
at 
org/apache/kafka/common/network/NetworkReceive.readFrom(NetworkReceive.java:71) 
at org/apache/kafka/common/network/KafkaChannel.receive(KafkaChannel.java:169) 
at org/apache/kafka/common/network/KafkaChannel.read(KafkaChannel.java:150) 
at 
org/apache/kafka/common/network/Selector.pollSelectionKeys(Selector.java:355) 
at org/apache/kafka/common/network/Selector.poll(Selector.java:303) 
at org/apache/kafka/clients/NetworkClient.poll(NetworkClient.java:349) 
at 
org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226)
 
at 
org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.poll(ConsumerNetworkClient.java:188)
 
at 
org/apache/kafka/clients/consumer/internals/AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:207)
 
at 
org/apache/kafka/clients/consumer/internals/AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:193)
 
at 
org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.poll(ConsumerCoordinator.java:279)
 
at 
org/apache/kafka/clients/consumer/KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
 
at org/apache/kafka/clients/consumer/KafkaConsumer.poll(KafkaConsumer.java:995) 
at 
com/ibm/is/cc/kafka/runtime/KafkaConsumerProcessor.process(KafkaConsumerProcessor.java:237)
 
at com/ibm/is/cc/kafka/runtime/KafkaProcessor.process(KafkaProcessor.java:173) 
at 
com/ibm/is/cc/javastage/connector/CC_JavaAdapter.run(CC_JavaAdapter.java:443){quote}
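
A plausible mechanism, offered as an assumption rather than a confirmed diagnosis: a 
plaintext client reads the first four bytes of the broker's TLS response as a 
request-size prefix, which decodes to a huge value, and the resulting large allocations 
in the receive path can exhaust the heap. A small sketch of the arithmetic:

{code}
// Hypothetical sketch: what happens when a TLS record header is read as a 4-byte size prefix.
import java.nio.ByteBuffer;

public class TlsBytesAsSizeSketch {
    public static void main(String[] args) {
        // 0x16 0x03 0x03 ... are typical leading bytes of a TLS handshake record.
        ByteBuffer sizePrefix = ByteBuffer.wrap(new byte[] {0x16, 0x03, 0x03, 0x02});
        int bogusSize = sizePrefix.getInt();
        System.out.println("Client would try to allocate " + bogusSize + " bytes (~352 MB)");
        // A receive path that then calls ByteBuffer.allocate(bogusSize), repeatedly or on
        // several connections at once, can exhaust the heap and surface as an OutOfMemoryError.
    }
}
{code}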



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5584) Incorrect log size for topics larger than 2 GB

2017-07-14 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5584:
---
Affects Version/s: 0.11.0.0

> Incorrect log size for topics larger than 2 GB
> --
>
> Key: KAFKA-5584
> URL: https://issues.apache.org/jira/browse/KAFKA-5584
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.11.0.0
>Reporter: Gregor Uhlenheuer
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1
>
> Attachments: Screen Shot 2017-07-12 at 09.10.53.png
>
>
> The {{size}} of a {{Log}} is calculated incorrectly due to an Integer 
> overflow. For large topics (larger than 2 GB) this value overflows.
> This is easily observable in the reported metrics values of the path 
> {{log.Log.partition.*.topic..Size}} (see attached screenshot).
> Moreover, I think this breaks the size-based retention (via 
> {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well.
> I am not sure about the recommended workflow; should I open a pull request on 
> GitHub with a fix?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5584) Incorrect log size for topics larger than 2 GB

2017-07-14 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087170#comment-16087170
 ] 

Ismael Juma commented on KAFKA-5584:


Thanks for tracking it down [~kongo2002], that looks right to me.

> Incorrect log size for topics larger than 2 GB
> --
>
> Key: KAFKA-5584
> URL: https://issues.apache.org/jira/browse/KAFKA-5584
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.11.0.0
>Reporter: Gregor Uhlenheuer
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1
>
> Attachments: Screen Shot 2017-07-12 at 09.10.53.png
>
>
> The {{size}} of a {{Log}} is calculated incorrectly due to an Integer 
> overflow. For large topics (larger than 2 GB) this value overflows.
> This is easily observable in the reported metrics values of the path 
> {{log.Log.partition.*.topic..Size}} (see attached screenshot).
> Moreover, I think this breaks the size-based retention (via 
> {{log.retention.bytes}} and {{retention.bytes}}) of large topics as well.
> I am not sure about the recommended workflow; should I open a pull request on 
> GitHub with a fix?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5545) Kafka Stream not able to successfully restart over new broker ip

2017-07-14 Thread Yogesh BG (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087204#comment-16087204
 ] 

Yogesh BG commented on KAFKA-5545:
--

Any updates on this issue?


> Kafka Stream not able to successfully restart over new broker ip
> 
>
> Key: KAFKA-5545
> URL: https://issues.apache.org/jira/browse/KAFKA-5545
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Yogesh BG
>Priority: Critical
> Attachments: kafkastreams.log
>
>
> Hi,
> I have one Kafka broker and one Kafka Streams application.
> Initially the Kafka Streams application connects and starts processing data. Then I restart 
> the broker. When the broker restarts, a new IP is assigned.
> In the Kafka Streams application I have a thread that runs every 5 minutes and checks whether 
> the broker IP has changed; if it has, we clean up the stream, rebuild the topology (we also 
> tried reusing the topology) and start the stream again. I end up with the following 
> exceptions:
> 11:04:08.032 [StreamThread-38] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-38] Creating active task 0_5 with assigned 
> partitions [PR-5]
> 11:04:08.033 [StreamThread-41] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-41] Creating active task 0_1 with assigned 
> partitions [PR-1]
> 11:04:08.036 [StreamThread-34] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-34] Creating active task 0_7 with assigned 
> partitions [PR-7]
> 11:04:08.036 [StreamThread-37] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-37] Creating active task 0_3 with assigned 
> partitions [PR-3]
> 11:04:08.036 [StreamThread-45] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-45] Creating active task 0_0 with assigned 
> partitions [PR-0]
> 11:04:08.037 [StreamThread-36] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-36] Creating active task 0_4 with assigned 
> partitions [PR-4]
> 11:04:08.037 [StreamThread-43] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-43] Creating active task 0_6 with assigned 
> partitions [PR-6]
> 11:04:08.038 [StreamThread-48] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-48] Creating active task 0_2 with assigned 
> partitions [PR-2]
> 11:04:09.034 [StreamThread-38] WARN  o.a.k.s.p.internals.StreamThread - Could 
> not create task 0_5. Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_5] Failed to lock the 
> state directory for task 0_5
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:100)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internal

[jira] [Created] (KAFKA-5593) Kafka streams not re-balancing when 3 consumer streams are there

2017-07-14 Thread Yogesh BG (JIRA)
Yogesh BG created KAFKA-5593:


 Summary: Kafka streams not re-balancing when 3 consumer streams 
are there
 Key: KAFKA-5593
 URL: https://issues.apache.org/jira/browse/KAFKA-5593
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Yogesh BG
Priority: Critical
 Attachments: log1.txt, log2.txt, log3.txt

I have 3 broker nodes and 3 Kafka Streams instances.

I observe that all 3 consumer streams are part of the group named 
rtp-kafkastreams, but the data is processed by only one node.

DEBUG n.a.a.k.a.AccessLogMetricEnrichmentProcessor - 
AccessLogMetricEnrichmentProcessor.process

When I check the partition information held by each of them, I see the first 
node has all 8 partitions, but on the other streams the folder is empty.


[root@ip-172-31-11-139 ~]# ls /data/kstreams/rtp-kafkastreams
0_0  0_1  0_2  0_3  0_4  0_5  0_6  0_7

On the other two nodes this folder is empty.

I tried restarting the other two consumer streams, but they still do not become 
part of the group and rebalance.

I have attached the logs.

Configurations are inside the log file.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5594) Tag messages with authenticated user who produced it

2017-07-14 Thread Jack Andrews (JIRA)
Jack Andrews created KAFKA-5594:
---

 Summary: Tag messages with authenticated user who produced it
 Key: KAFKA-5594
 URL: https://issues.apache.org/jira/browse/KAFKA-5594
 Project: Kafka
  Issue Type: Wish
Reporter: Jack Andrews


I see that Kafka now supports custom headers through KIP-82. Would it be 
possible to hook this up to authorization such that the authenticated user is 
added as a header?

Thank you.
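
For context, KIP-82 headers are already settable from the client side. A hypothetical 
{{ProducerInterceptor}} that stamps a {{user}} header is sketched below; note that the 
value is client-supplied configuration (the {{user.header.value}} key is made up for this 
sketch), not the broker-verified authenticated principal this request is asking for.

{code}
// Hypothetical sketch: a producer interceptor (Kafka 0.11+) that adds a "user" header to
// every record. The value comes from client configuration ("user.header.value" is a made-up
// key for this sketch), so it is client-claimed, not a broker-authenticated principal.
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class UserHeaderInterceptor implements ProducerInterceptor<String, String> {

    private String user = "unknown";

    @Override
    public void configure(Map<String, ?> configs) {
        Object value = configs.get("user.header.value");
        if (value != null) {
            user = value.toString();
        }
    }

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        record.headers().add("user", user.getBytes(StandardCharsets.UTF_8));
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // nothing to do
    }

    @Override
    public void close() {
        // nothing to do
    }
}
{code}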



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5127) Replace pattern matching with foreach where the case None is unused

2017-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087592#comment-16087592
 ] 

ASF GitHub Bot commented on KAFKA-5127:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2919


> Replace pattern matching with foreach where the case None is unused 
> 
>
> Key: KAFKA-5127
> URL: https://issues.apache.org/jira/browse/KAFKA-5127
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Balint Molnar
>Assignee: Balint Molnar
>Priority: Minor
> Fix For: 0.11.1.0
>
>
> There are various places where pattern matching is used to match only one case 
> while ignoring None; this can be replaced with foreach.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-07-14 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087695#comment-16087695
 ] 

Ismael Juma commented on KAFKA-4669:


We should probably close this when the PR for KAFKA-5587 is merged. People can 
always reopen if it happens again.

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1
>
>
> There is no try/catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked the 0.10 code and think this bug exists in the 0.10 versions as well.
> A real case: first, a correlation-id mismatch exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress

2017-07-14 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-5595:
--

 Summary: Illegal state in SocketServer; attempt to send with 
another send in progress
 Key: KAFKA-5595
 URL: https://issues.apache.org/jira/browse/KAFKA-5595
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


I have seen this a couple of times, but I'm not sure of the conditions associated 
with it. 

{code}
java.lang.IllegalStateException: Attempt to begin a send operation with prior 
send operation still in progress.
at 
org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138)
at org.apache.kafka.common.network.Selector.send(Selector.java:248)
at kafka.network.Processor.sendResponse(SocketServer.scala:488)
at kafka.network.Processor.processNewResponses(SocketServer.scala:466)
at kafka.network.Processor.run(SocketServer.scala:431)
at java.lang.Thread.run(Thread.java:748)
{code}

Prior to this event, I see a lot of this message in the logs (always for the 
same connection id):
{code}
Attempting to send response via channel for which there is no open connection, 
connection id 7
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress

2017-07-14 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087745#comment-16087745
 ] 

Jason Gustafson commented on KAFKA-5595:


cc [~rsivaram] In case you have any ideas.

> Illegal state in SocketServer; attempt to send with another send in progress
> 
>
> Key: KAFKA-5595
> URL: https://issues.apache.org/jira/browse/KAFKA-5595
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>
> I have seen this a couple of times, but I'm not sure of the conditions associated 
> with it. 
> {code}
> java.lang.IllegalStateException: Attempt to begin a send operation with prior 
> send operation still in progress.
>   at 
> org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138)
>   at org.apache.kafka.common.network.Selector.send(Selector.java:248)
>   at kafka.network.Processor.sendResponse(SocketServer.scala:488)
>   at kafka.network.Processor.processNewResponses(SocketServer.scala:466)
>   at kafka.network.Processor.run(SocketServer.scala:431)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Prior to this event, I see a lot of this message in the logs (always for the 
> same connection id):
> {code}
> Attempting to send response via channel for which there is no open 
> connection, connection id 7
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress

2017-07-14 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087749#comment-16087749
 ] 

Ismael Juma commented on KAFKA-5595:


I haven't checked this in detail, but a possibility:

1. There is an inflight response for client c
2. There is a disconnection and reconnection from client c causing us to lose 
the channel state
3. Client c sends another request
4. Because we lost the channel state and we don't drain the response queue on 
disconnections, we may allow 2 inflight responses for the same connection
5. Under the right set of circumstances, this could lead to the 
IllegalStateException reported

Am I missing something?

> Illegal state in SocketServer; attempt to send with another send in progress
> 
>
> Key: KAFKA-5595
> URL: https://issues.apache.org/jira/browse/KAFKA-5595
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>
> I have seen this a couple of times, but I'm not sure of the conditions associated 
> with it. 
> {code}
> java.lang.IllegalStateException: Attempt to begin a send operation with prior 
> send operation still in progress.
>   at 
> org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138)
>   at org.apache.kafka.common.network.Selector.send(Selector.java:248)
>   at kafka.network.Processor.sendResponse(SocketServer.scala:488)
>   at kafka.network.Processor.processNewResponses(SocketServer.scala:466)
>   at kafka.network.Processor.run(SocketServer.scala:431)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Prior to this event, I see a lot of this message in the logs (always for the 
> same connection id):
> {code}
> Attempting to send response via channel for which there is no open 
> connection, connection id 7
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress

2017-07-14 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087916#comment-16087916
 ] 

Rajini Sivaram commented on KAFKA-5595:
---

The connection id contains the remote port; the log message is currently printing 
the processor id, hence the integer value 7 in the logs. I have submitted a PR to fix that.

[~ijuma] Since the connection id contains the remote port, for the scenario you 
described the port would need to be reused. Typically that shouldn't happen while 
requests of an older connection using that port are still being processed. The channel 
should have been closed and removed from the selector before the new connection.



> Illegal state in SocketServer; attempt to send with another send in progress
> 
>
> Key: KAFKA-5595
> URL: https://issues.apache.org/jira/browse/KAFKA-5595
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>
> I have seen this a couple of times, but I'm not sure of the conditions associated 
> with it. 
> {code}
> java.lang.IllegalStateException: Attempt to begin a send operation with prior 
> send operation still in progress.
>   at 
> org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138)
>   at org.apache.kafka.common.network.Selector.send(Selector.java:248)
>   at kafka.network.Processor.sendResponse(SocketServer.scala:488)
>   at kafka.network.Processor.processNewResponses(SocketServer.scala:466)
>   at kafka.network.Processor.run(SocketServer.scala:431)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Prior to this event, I see a lot of this message in the logs (always for the 
> same connection id):
> {code}
> Attempting to send response via channel for which there is no open 
> connection, connection id 7
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress

2017-07-14 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087916#comment-16087916
 ] 

Rajini Sivaram edited comment on KAFKA-5595 at 7/14/17 7:56 PM:


The connection id contains the remote port; the log message is currently printing 
the processor id, hence the integer value 7 in the logs. I have submitted a PR to fix that.

[~ijuma] Since the connection id contains the remote port, for the scenario you 
described the port would need to be reused. Typically that shouldn't happen while 
requests of an older connection using that port are still being processed. In theory, I 
suppose it could happen if there are lots of connections being created and closed.




was (Author: rsivaram):
The connection id contains the remote port; the log message is currently printing 
the processor id, hence the integer value 7 in the logs. I have submitted a PR to fix that.

[~ijuma] Since the connection id contains the remote port, for the scenario you 
described the port would need to be reused. Typically that shouldn't happen while 
requests of an older connection using that port are still being processed. The channel 
should have been closed and removed from the selector before the new connection.



> Illegal state in SocketServer; attempt to send with another send in progress
> 
>
> Key: KAFKA-5595
> URL: https://issues.apache.org/jira/browse/KAFKA-5595
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>
> I have seen this a couple of times, but I'm not sure of the conditions associated 
> with it. 
> {code}
> java.lang.IllegalStateException: Attempt to begin a send operation with prior 
> send operation still in progress.
>   at 
> org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138)
>   at org.apache.kafka.common.network.Selector.send(Selector.java:248)
>   at kafka.network.Processor.sendResponse(SocketServer.scala:488)
>   at kafka.network.Processor.processNewResponses(SocketServer.scala:466)
>   at kafka.network.Processor.run(SocketServer.scala:431)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Prior to this event, I see a lot of this message in the logs (always for the 
> same connection id):
> {code}
> Attempting to send response via channel for which there is no open 
> connection, connection id 7
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress

2017-07-14 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087999#comment-16087999
 ] 

Ismael Juma commented on KAFKA-5595:


[~rsivaram], this is a pretty rare case, we only saw two log entries for the 
ISE after a lot of log entries for closed channels. Also, the kernel tends to 
favouring reusing ports that have just been released (based on our experience 
with port conflicts in tests). So, I think it seems possible, although I am not 
sure.

> Illegal state in SocketServer; attempt to send with another send in progress
> 
>
> Key: KAFKA-5595
> URL: https://issues.apache.org/jira/browse/KAFKA-5595
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>
> I have seen this a couple of times, but I'm not sure of the conditions associated 
> with it. 
> {code}
> java.lang.IllegalStateException: Attempt to begin a send operation with prior 
> send operation still in progress.
>   at 
> org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138)
>   at org.apache.kafka.common.network.Selector.send(Selector.java:248)
>   at kafka.network.Processor.sendResponse(SocketServer.scala:488)
>   at kafka.network.Processor.processNewResponses(SocketServer.scala:466)
>   at kafka.network.Processor.run(SocketServer.scala:431)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Prior to this event, I see a lot of this message in the logs (always for the 
> same connection id):
> {code}
> Attempting to send response via channel for which there is no open 
> connection, connection id 7
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-5595) Illegal state in SocketServer; attempt to send with another send in progress

2017-07-14 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087999#comment-16087999
 ] 

Ismael Juma edited comment on KAFKA-5595 at 7/14/17 9:12 PM:
-

[~rsivaram], this is a pretty rare case, we only saw two log entries for the 
ISE after a lot of log entries for closed channels. Also, the kernel tends to 
favour reusing ports that have just been released (based on our experience with 
port conflicts in tests). So, I think it seems possible, although I am not sure.


was (Author: ijuma):
[~rsivaram], this is a pretty rare case, we only saw two log entries for the 
ISE after a lot of log entries for closed channels. Also, the kernel tends to 
favouring reusing ports that have just been released (based on our experience 
with port conflicts in tests). So, I think it seems possible, although I am not 
sure.

> Illegal state in SocketServer; attempt to send with another send in progress
> 
>
> Key: KAFKA-5595
> URL: https://issues.apache.org/jira/browse/KAFKA-5595
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>
> I have seen this a couple of times, but I'm not sure of the conditions associated 
> with it. 
> {code}
> java.lang.IllegalStateException: Attempt to begin a send operation with prior 
> send operation still in progress.
>   at 
> org.apache.kafka.common.network.KafkaChannel.setSend(KafkaChannel.java:138)
>   at org.apache.kafka.common.network.Selector.send(Selector.java:248)
>   at kafka.network.Processor.sendResponse(SocketServer.scala:488)
>   at kafka.network.Processor.processNewResponses(SocketServer.scala:466)
>   at kafka.network.Processor.run(SocketServer.scala:431)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Prior to this event, I see a lot of this message in the logs (always for the 
> same connection id):
> {code}
> Attempting to send response via channel for which there is no open 
> connection, connection id 7
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5016) Consumer hang in poll method while rebalancing is in progress

2017-07-14 Thread Geoffrey Stewart (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088285#comment-16088285
 ] 

Geoffrey Stewart commented on KAFKA-5016:
-

I have also encountered the issue documented in this Jira using 0.10.2.0 
brokers with the 0.10.2.0 client. This issue only occurs when we use the 
"subscribe" call from the API, which dynamically assigns partitions. When we 
use the "assign" call from the API to manually assign lists of partitions, we 
do not have any issue. I don't think what is being described above represents 
the expected behavior of dynamic partition assignment and consumer group 
coordination. Based on the above explanation, it sounds like it would not be 
possible to have 2 or more simultaneous consumer instances in the same consumer 
group when using dynamic partition assignment (subscribe). For example, there 
could be one consumer instance in the group which has made some calls to 
"poll". As soon as a second consumer instance comes along, its call to "poll" 
is only processed after max.poll.interval.ms has elapsed since the first 
consumer's most recent poll request; at that point the broker no longer 
considers the first consumer part of the group. I certainly agree that with 
the arrival of the second consumer in the group, the broker must perform a 
rebalance or restabilization, which may take some time. However, this should 
not take max.poll.interval.ms, since the liveness of the first consumer should 
be maintained by its heartbeat, which occurs every heartbeat.interval.ms. I 
have confirmed that with the default value of 300000 (5 minutes) for 
max.poll.interval.ms, the group restabilization (rebalance) takes about that 
long before the second consumer instance's poll request is processed. Lowering 
this value to 30000 (30 seconds) reduces the group restabilization (rebalance) 
to about 30 seconds before the second consumer instance's poll request is 
processed.
To summarize, please explain how I can establish parallel consumer instances in 
the same group using the subscribe method from the API, which dynamically 
assigns partitions. Further, please help me understand why the consumer 
instance's heartbeat does not seem to be maintaining its liveness.
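
For reference, the timeouts discussed above are ordinary consumer settings. A minimal 
sketch of a consumer using dynamic assignment via {{subscribe()}} follows; the broker 
address, group id and topic are placeholders, and the values shown simply restate the 
defaults explicitly.

{code}
// Minimal sketch of a consumer using dynamic partition assignment via subscribe().
// Broker address, group id and topic are placeholders; the timeout values restate the defaults.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SubscribeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("max.poll.interval.ms", "300000");  // max time allowed between poll() calls
        props.put("session.timeout.ms", "10000");     // member is evicted if no heartbeat within this
        props.put("heartbeat.interval.ms", "3000");   // background heartbeat frequency

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // dynamic assignment
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                records.forEach(r -> System.out.println(r.value()));
            }
        }
    }
}
{code}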

> Consumer hang in poll method while rebalancing is in progress
> -
>
> Key: KAFKA-5016
> URL: https://issues.apache.org/jira/browse/KAFKA-5016
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0, 0.10.2.0
>Reporter: Domenico Di Giulio
>Assignee: Vahid Hashemian
> Attachments: Kafka 0.10.2.0 Issue (TRACE) - Server + Client.txt, 
> Kafka 0.10.2.0 Issue (TRACE).txt, KAFKA_5016.java
>
>
> After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the 
> rebalancing code. 
> This is a test case, not (still) production code. It does the following with 
> a single-partition topic and two consumers in the same group:
> 1) a topic with one partition is forced to be created (auto-created)
> 2) a producer is used to write 10 messages
> 3) the first consumer reads all the messages and commits
> 4) the second consumer attempts a poll() and hangs indefinitely
> The same issue can't be found with 0.10.0.0.
> See the attached logs at TRACE level. Look for "SERVER HANGS" to see where 
> the hang is found: when this happens, the client keeps failing any heartbeat 
> attempt, as the rebalancing is in progress, and the poll method hangs 
> indefinitely.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5016) Consumer hang in poll method while rebalancing is in progress

2017-07-14 Thread Domenico Di Giulio (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088288#comment-16088288
 ] 

Domenico Di Giulio commented on KAFKA-5016:
---

I am currently out of the office, with no access to my e-mail.
I will be back at work on July 27.

** E-mail from the Bank of Italy is sent in good faith but is neither 
binding on the Bank nor to be understood as creating any obligation on its part 
except where provided for in a written agreement. This e-mail is confidential. 
If you have received it by mistake, please inform the sender by reply e-mail 
and delete it from your system. Please also note that the unauthorized 
disclosure or use of the message or any attachments could be an offence. Thank 
you for your cooperation. **


> Consumer hang in poll method while rebalancing is in progress
> -
>
> Key: KAFKA-5016
> URL: https://issues.apache.org/jira/browse/KAFKA-5016
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0, 0.10.2.0
>Reporter: Domenico Di Giulio
>Assignee: Vahid Hashemian
> Attachments: Kafka 0.10.2.0 Issue (TRACE) - Server + Client.txt, 
> Kafka 0.10.2.0 Issue (TRACE).txt, KAFKA_5016.java
>
>
> After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the 
> rebalancing code. 
> This is a test case, not (still) production code. It does the following with 
> a single-partition topic and two consumers in the same group:
> 1) a topic with one partition is forced to be created (auto-created)
> 2) a producer is used to write 10 messages
> 3) the first consumer reads all the messages and commits
> 4) the second consumer attempts a poll() and hangs indefinitely
> The same issue can't be found with 0.10.0.0.
> See the attached logs at TRACE level. Look for "SERVER HANGS" to see where 
> the hang is found: when this happens, the client keeps failing any heartbeat 
> attempt, as the rebalancing is in progress, and the poll method hangs 
> indefinitely.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)