[jira] [Updated] (KAFKA-3210) Using asynchronous calls through the raw ZK API in ZkUtils

2016-02-07 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated KAFKA-3210:

Description: 
We have observed a number of issues with the controller interaction with 
ZooKeeper mainly because ZkClient creates new sessions transparently under the 
hood. Creating sessions transparently enables, for example, old controller to 
successfully update znodes in ZooKeeper even when they aren't the controller 
any longer (e.g., KAFKA-3083). To fix this, we need to bypass the ZkClient lib 
like we did with ZKWatchedEphemeral.

In addition to fixing such races with the controller, it would improve 
performance significantly if we used the async API (see KAFKA-3038). The async 
API is more efficient because it pipelines the requests to ZooKeeper, and the 
number of requests upon controller recovery can be large.

This jira proposes to make these two changes to the calls in ZkUtils and to do 
it, one path is to first replace the calls in ZkUtils with raw async ZK calls 
and block so that we don't have to change the controller code in this phase. 
Once this step is accomplished and it is stable, we make changes to the 
controller to handle the asynchronous calls to ZooKeeper.

Note that in the first step, we will need to introduce some new logic for 
session management, which is currently handled entirely by ZkClient. We will 
also need to implement the subscription mechanism for event notifications (see 
ZooKeeperLeaderElector as a an exemple).  

  was:
We have observed a number of issues with the controller interaction with 
ZooKeeper mainly because ZkClient creates new sessions transparently under the 
hood. Creating sessions transparently enables, for example, old controller to 
successfully update znodes in ZooKeeper even when they aren't the controller 
any longer (e.g., KAFKA-3083). To fix this, we need to bypass the ZkClient lib 
like we did with ZKWatchedEphemeral.

In addition to fixing such races with the controller, it would improve 
performance significantly if we used the async API (see KAFKA-3038). The async 
API is more efficient because it pipelines the requests to ZooKeeper, and the 
number of requests upon controller recovery can be large.

This jira proposes to make these two changes to the calls in ZkUtils and to do 
it, one path is to first replace the calls in ZkUtils with raw async ZK calls 
and block so that we don't have to change the controller code in this phase. 
Once this step is accomplished and it is stable, we make changes to the 
controller to handle the asynchronous calls to ZooKeeper.

Note that in the first step, we will need to introduce some logic for session 
management, which is currently handled entirely by ZkClient.  


> Using asynchronous calls through the raw ZK API in ZkUtils
> --
>
> Key: KAFKA-3210
> URL: https://issues.apache.org/jira/browse/KAFKA-3210
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, zkclient
>Affects Versions: 0.9.0.0
>Reporter: Flavio Junqueira
>
> We have observed a number of issues with the controller interaction with 
> ZooKeeper mainly because ZkClient creates new sessions transparently under 
> the hood. Creating sessions transparently enables, for example, old 
> controller to successfully update znodes in ZooKeeper even when they aren't 
> the controller any longer (e.g., KAFKA-3083). To fix this, we need to bypass 
> the ZkClient lib like we did with ZKWatchedEphemeral.
> In addition to fixing such races with the controller, it would improve 
> performance significantly if we used the async API (see KAFKA-3038). The 
> async API is more efficient because it pipelines the requests to ZooKeeper, 
> and the number of requests upon controller recovery can be large.
> This jira proposes to make these two changes to the calls in ZkUtils and to 
> do it, one path is to first replace the calls in ZkUtils with raw async ZK 
> calls and block so that we don't have to change the controller code in this 
> phase. Once this step is accomplished and it is stable, we make changes to 
> the controller to handle the asynchronous calls to ZooKeeper.
> Note that in the first step, we will need to introduce some new logic for 
> session management, which is currently handled entirely by ZkClient. We will 
> also need to implement the subscription mechanism for event notifications 
> (see ZooKeeperLeaderElector as a an exemple).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2985) Consumer group stuck in rebalancing state

2016-02-07 Thread Islam H A Azaz (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136317#comment-15136317
 ] 

Islam H A Azaz commented on KAFKA-2985:
---

I'm facing the same problem. Could I get the steps to have it fixed?

> Consumer group stuck in rebalancing state
> -
>
> Key: KAFKA-2985
> URL: https://issues.apache.org/jira/browse/KAFKA-2985
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
> Environment: Kafka 0.9.0.0.
> Kafka Java consumer 0.9.0.0
> 2 Java producers.
> 3 Java consumers using the new consumer API.
> 2 kafka brokers.
>Reporter: Jens Rantil
>Assignee: Jason Gustafson
>
> We've doing some load testing on Kafka. _After_ the load test when our 
> consumers and have two times now seen Kafka become stuck in consumer group 
> rebalancing. This is after all our consumers are done consuming and 
> essentially polling periodically without getting any records.
> The brokers list the consumer group (named "default"), but I can't query the 
> offsets:
> {noformat}
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh 
> --new-consumer --bootstrap-server localhost:9092 --list
> default
> jrantil@queue-0:/srv/kafka/kafka$ ./bin/kafka-consumer-groups.sh 
> --new-consumer --bootstrap-server localhost:9092 --describe --group 
> default|sort
> Consumer group `default` does not exist or is rebalancing.
> {noformat}
> Retrying to query the offsets for 15 minutes or so still said it was 
> rebalancing. After restarting our first broker, the group immediately started 
> rebalancing. That broker was logging this before restart:
> {noformat}
> [2015-12-12 13:09:48,517] INFO [Group Metadata Manager on Broker 0]: Removed 
> 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> [2015-12-12 13:10:16,139] INFO [GroupCoordinator 0]: Stabilized group default 
> generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,141] INFO [GroupCoordinator 0]: Assignment received from 
> leader for group default for generation 16 
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:10:16,575] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group default with old generation 16 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,141] INFO [GroupCoordinator 0]: Stabilized group default 
> generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,143] INFO [GroupCoordinator 0]: Assignment received from 
> leader for group default for generation 17 
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:11:15,314] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group default with old generation 17 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,144] INFO [GroupCoordinator 0]: Stabilized group default 
> generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,145] INFO [GroupCoordinator 0]: Assignment received from 
> leader for group default for generation 18 
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:12:14,340] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group default with old generation 18 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,146] INFO [GroupCoordinator 0]: Stabilized group default 
> generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,148] INFO [GroupCoordinator 0]: Assignment received from 
> leader for group default for generation 19 
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:13:13,238] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group default with old generation 19 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,148] INFO [GroupCoordinator 0]: Stabilized group default 
> generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,149] INFO [GroupCoordinator 0]: Assignment received from 
> leader for group default for generation 20 
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:14:12,360] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group default with old generation 20 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,150] INFO [GroupCoordinator 0]: Stabilized group default 
> generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,152] INFO [GroupCoordinator 0]: Assignment received from 
> leader for group default for generation 21 
> (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:15:11,217] INFO [GroupCoordinator 0]: Preparing to restabilize 
> group default with old generation 21 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,152] INFO [GroupCoordinator 0]: Stabilized group default 
> generation 22 (kafka.coordinator.GroupCoordinator)
> [2015-12-12 13:16:10,154] INFO [GroupCoordinator 0]: Assignment received from 
> leader for group default for

ConsumerMetadataRequest

2016-02-07 Thread Àbéjídé Àyodélé
Hi,

The documentation for Kafka Protocol was modified

 removing ConsumerMetadataRequest.

What would you advice the maintainer of a client library that relies on
ConsumerMetadataRequest to do?

Also going forward are there plans to flag APIs as unstable/deprecated.

Thanks,

Abejide Ayodele
It always seems impossible until it's done. --Nelson Mandela


Re: ConsumerMetadataRequest

2016-02-07 Thread Magnus Edenhill
Hi Abejide,

the ConsumerMetadataRequest was renamed to GroupCoordinatorRequest to
indicate its more generic use than just consumer groups.
See the protocol guide for more info:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-GroupMembershipAPI

Regards,
Magnus

2016-02-07 15:32 GMT+01:00 Àbéjídé Àyodélé :

> Hi,
>
> The documentation for Kafka Protocol was modified
> <
> https://cwiki.apache.org/confluence/pages/diffpages.action?originalId=61329518&pageId=61330213
> >
>  removing ConsumerMetadataRequest.
>
> What would you advice the maintainer of a client library that relies on
> ConsumerMetadataRequest to do?
>
> Also going forward are there plans to flag APIs as unstable/deprecated.
>
> Thanks,
>
> Abejide Ayodele
> It always seems impossible until it's done. --Nelson Mandela
>


[jira] [Commented] (KAFKA-3208) Default security.inter.broker.protocol based on configured advertised.listeners

2016-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136490#comment-15136490
 ] 

ASF GitHub Bot commented on KAFKA-3208:
---

Github user granthenke closed the pull request at:

https://github.com/apache/kafka/pull/869


> Default security.inter.broker.protocol based on configured 
> advertised.listeners
> ---
>
> Key: KAFKA-3208
> URL: https://issues.apache.org/jira/browse/KAFKA-3208
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> Currently the default security.inter.broker.protocol is PLAINTEXT. However, 
> when enabling SASL or SSL on a cluster it is common to not include a 
> PLAINTEXT listener which causes Kafka to fail startup (after KAFKA-3194). 
> To reduce the number of issues and make the configuration more convenient we 
> could default this value, if not explicitly set, to the first protocol 
> defined in the advertised.listeners configuration. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3208: Default security.inter.broker.prot...

2016-02-07 Thread granthenke
Github user granthenke closed the pull request at:

https://github.com/apache/kafka/pull/869


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3208) Default security.inter.broker.protocol based on configured advertised.listeners

2016-02-07 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3208:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

See discussion in the PR.

> Default security.inter.broker.protocol based on configured 
> advertised.listeners
> ---
>
> Key: KAFKA-3208
> URL: https://issues.apache.org/jira/browse/KAFKA-3208
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> Currently the default security.inter.broker.protocol is PLAINTEXT. However, 
> when enabling SASL or SSL on a cluster it is common to not include a 
> PLAINTEXT listener which causes Kafka to fail startup (after KAFKA-3194). 
> To reduce the number of issues and make the configuration more convenient we 
> could default this value, if not explicitly set, to the first protocol 
> defined in the advertised.listeners configuration. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3145) CPU Usage Spike to 100% when network connection is to error port

2016-02-07 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-3145:
---
   Assignee: (was: Jun Rao)
Description: 
CPU spike to 100% when network connection is to error port.
It seems network IO thread are very busy logging following error message. 

[2016-01-25 14:09:12,476] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.IllegalStateException: Invalid request (size = -1241382912)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:68)
at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)
[2016-01-25 14:09:12,479] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.NullPointerException
at 
org.apache.kafka.common.network.NetworkReceive.complete(NetworkReceive.java:48)
at org.apache.kafka.common.network.Selector.poll(Selector.java:249)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)
[2016-01-25 14:09:12,480] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.NullPointerException
at 
org.apache.kafka.common.network.NetworkReceive.complete(NetworkReceive.java:48)
at org.apache.kafka.common.network.Selector.poll(Selector.java:249)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)
[2016-01-25 14:09:12,480] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.NullPointerException
at 
org.apache.kafka.common.network.NetworkReceive.complete(NetworkReceive.java:48)
at org.apache.kafka.common.network.Selector.poll(Selector.java:249)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)
[2016-01-25 14:09:12,481] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.NullPointerException
at 
org.apache.kafka.common.network.NetworkReceive.complete(NetworkReceive.java:48)
at org.apache.kafka.common.network.Selector.poll(Selector.java:249)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)

Thanks,



  was:

CPU spike to 100% when network connection is to error port.
It seems network IO thread are very busy logging following error message. 

[2016-01-25 14:09:12,476] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.IllegalStateException: Invalid request (size = -1241382912)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:68)
at org.apache.kafka.common.network.Selector.poll(Selector.java:248)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)
[2016-01-25 14:09:12,479] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.NullPointerException
at 
org.apache.kafka.common.network.NetworkReceive.complete(NetworkReceive.java:48)
at org.apache.kafka.common.network.Selector.poll(Selector.java:249)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Thread.java:745)
[2

Fwd: KAFKA-3002 issue in 0.9.0.0

2016-02-07 Thread Roman Hoc
Hello,
I got no answer from you side about this issue, therefore I decided to
remind you on it.
Can you please provide this information?
Thank you, best regards,
Roman


-- Forwarded message --
From: Roman Hoc 
Date: 2016-01-29 12:14 GMT+01:00
Subject: KAFKA-3002 issue in 0.9.0.0
To: dev@kafka.apache.org


Hello,
can you please provide approximate dates of releases, where this bug fix is
officially released?
I.e. either 0.9.0.1 or 0.9.1.0?
Thank you, best regards,
Roman

https://issues.apache.org/jira/browse/KAFKA-3002