[jira] [Created] (KAFKA-7215) Improve LogCleaner behavior on error

2018-07-30 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-7215:
--

 Summary: Improve LogCleaner behavior on error
 Key: KAFKA-7215
 URL: https://issues.apache.org/jira/browse/KAFKA-7215
 Project: Kafka
  Issue Type: Improvement
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


For more detailed information see 
[KIP-346|https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Improve+LogCleaner+behavior+on+error]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Rajini Sivaram
The Apache Kafka community is pleased to announce the release of
Apache Kafka 2.0.0.

This is a major release and includes significant new features from
40 KIPs. It contains fixes and improvements from 246 JIRAs, including
fixes for a few critical bugs. Here is a summary of some notable changes:

** KIP-290 adds support for prefixed ACLs, simplifying access control
management in large secure deployments. Bulk access to topics,
consumer groups or transactional ids with a prefix can now be granted
using a single rule. Access control for topic creation has also been
improved to enable access to be granted to create specific topics or
topics with a prefix.
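
As an illustration, a prefixed ACL can be created through the AdminClient API.
This is only a minimal sketch; the bootstrap address, principal, and topic
prefix below are hypothetical:

{code:java}
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.Collections;
import java.util.Properties;

public class PrefixedAclExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // A single rule grants READ on every topic whose name starts with "payments-"
            AclBinding binding = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "payments-", PatternType.PREFIXED),
                new AccessControlEntry("User:analytics", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(Collections.singleton(binding)).all().get();
        }
    }
}
{code}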

** KIP-255 adds a framework for authenticating to Kafka brokers using
OAuth2 bearer tokens. The SASL/OAUTHBEARER implementation is
customizable using callbacks for token retrieval and validation.
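
For example, a client might be configured for OAUTHBEARER along these lines.
This is a sketch; the callback handler class name is a hypothetical placeholder
for your own implementation:

{code:java}
import java.util.Properties;

public class OAuthBearerClientConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "OAUTHBEARER");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;");
        // Hypothetical custom callback handler that retrieves tokens from your OAuth2 provider
        props.put("sasl.login.callback.handler.class",
            "com.example.auth.MyOAuthBearerTokenCallbackHandler");
        return props;
    }
}
{code}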

** Host name verification is now enabled by default for SSL connections
to ensure that the default SSL configuration is not susceptible to
man-in-the-middle attacks. You can disable this verification for
deployments where validation is performed using other mechanisms.
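
The relevant setting is ssl.endpoint.identification.algorithm, which now
defaults to "https"; setting it to an empty string restores the previous
behaviour. A minimal sketch, intended only for deployments that validate
hosts by other means:

{code:java}
import java.util.Properties;

public class SslHostnameVerificationConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("security.protocol", "SSL");
        // Default in 2.0.0 is "https"; an empty value disables host name verification.
        props.put("ssl.endpoint.identification.algorithm", "");
        return props;
    }
}
{code}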

** You can now dynamically update SSL trust stores without broker restart.
You can also configure security for broker listeners in ZooKeeper before
starting brokers, including SSL key store and trust store passwords and
JAAS configuration for SASL. With this new feature, you can store sensitive
password configs in encrypted form in ZooKeeper rather than in cleartext
in the broker properties file.

** The replication protocol has been improved to avoid log divergence
between leader and follower during fast leader failover. We have also
improved the resilience of brokers by reducing the memory footprint of
message down-conversions. By using message chunking, both memory
usage and the time for which memory is referenced have been reduced to
avoid OutOfMemory errors in brokers.

** Kafka clients are now notified of throttling before any throttling is
applied when quotas are enabled. This enables clients to distinguish
between network errors and large throttle times when quotas are exceeded.

** We have added a configuration option to the Kafka consumer to avoid
indefinite blocking in its APIs.
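
This refers to KIP-266. A sketch of how the new timeout can be used; the
broker address, group id, and topic name are hypothetical:

{code:java}
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class BoundedBlockingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical
        props.put("group.id", "example-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // New in 2.0.0 (KIP-266): bounds blocking calls such as commitSync() and position()
        props.put("default.api.timeout.ms", "30000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            // The new poll(Duration) overload returns once the timeout expires,
            // even if no records (or no metadata) are available yet.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            System.out.println("Fetched " + records.count() + " records");
        }
    }
}
{code}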

** We have dropped support for Java 7 and removed the previously
deprecated Scala producer and consumer.

** Kafka Connect includes a number of improvements and features.
KIP-298 enables you to control how errors in connectors, transformations
and converters are handled by enabling automatic retries and controlling the
number of errors that are tolerated before the connector is stopped. More
contextual information can be included in the logs to help diagnose problems,
and problematic messages consumed by sink connectors can be sent to a
dead letter queue rather than forcing the connector to stop.
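
The KIP-298 settings are specified per connector. A sketch of what such a
configuration might look like; the connector class, connector name, topic
names, and values are hypothetical:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class SinkConnectorErrorHandlingConfig {
    public static Map<String, String> build() {
        Map<String, String> config = new HashMap<>();
        config.put("name", "example-sink");                                 // hypothetical
        config.put("connector.class", "com.example.ExampleSinkConnector");  // hypothetical
        config.put("topics", "example-topic");
        // Retry failed operations for up to 5 minutes before giving up
        config.put("errors.retry.timeout", "300000");
        // Tolerate record-level errors instead of failing the task
        config.put("errors.tolerance", "all");
        // Log failures, including the problematic record contents
        config.put("errors.log.enable", "true");
        config.put("errors.log.include.messages", "true");
        // Route records that still fail to a dead letter queue topic
        config.put("errors.deadletterqueue.topic.name", "example-sink-dlq");
        return config;
    }
}
{code}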

** KIP-297 adds a new extension point to move secrets out of connector
configurations and integrate with any external key management system.
The placeholders in connector configurations are only resolved before
sending the configuration to the connector, ensuring that secrets are stored
and managed securely in your preferred key management system and
not exposed over the REST APIs or in log files.
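
For example, with the file-based provider that ships with Connect, the worker
and connector configurations might look roughly like this; the file path and
property key are hypothetical:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class ExternalizedSecretsConfig {
    // Worker configuration: register a config provider named "file"
    public static Map<String, String> workerConfig() {
        Map<String, String> worker = new HashMap<>();
        worker.put("config.providers", "file");
        worker.put("config.providers.file.class",
            "org.apache.kafka.common.config.provider.FileConfigProvider");
        return worker;
    }

    // Connector configuration: the placeholder is resolved only when the
    // configuration is handed to the connector, and is never stored or logged in clear text.
    public static Map<String, String> connectorConfig() {
        Map<String, String> connector = new HashMap<>();
        connector.put("connection.password",
            "${file:/opt/connect/secrets/credentials.properties:db.password}"); // hypothetical
        return connector;
    }
}
{code}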

** We have added a thin Scala wrapper API for our Kafka Streams DSL,
which provides better type inference and better type safety at compile
time. Scala users can write less boilerplate in their code, notably around
Serdes, thanks to new implicit Serdes.

** Message headers are now supported in the Kafka Streams Processor API,
allowing users to add and manipulate headers read from the source topics
and propagate them to the sink topics.
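
A minimal sketch of reading and adding headers from a Processor; the processor
and header names are hypothetical:

{code:java}
import org.apache.kafka.common.header.Header;
import org.apache.kafka.streams.processor.AbstractProcessor;

import java.nio.charset.StandardCharsets;

public class HeaderForwardingProcessor extends AbstractProcessor<String, String> {
    @Override
    public void process(String key, String value) {
        // Headers of the record currently being processed (KIP-244)
        Header traceId = context().headers().lastHeader("trace-id"); // hypothetical header name
        if (traceId == null) {
            context().headers().add("trace-id", "generated".getBytes(StandardCharsets.UTF_8));
        }
        // The headers travel with the record when it is forwarded downstream
        context().forward(key, value);
    }
}
{code}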

** Windowed aggregation performance in Kafka Streams has been greatly
improved (sometimes by an order of magnitude) thanks to the new
single-key-fetch API.

** We have further improved the unit testability of Kafka Streams with the
kafka-streams-test-utils artifact.
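
A rough sketch of exercising a topology in a plain unit test with the test
utils; the pass-through topology and topic names are hypothetical:

{code:java}
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.test.ConsumerRecordFactory;

import java.util.Properties;

public class TopologyTestSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input").to("output"); // hypothetical pass-through topology
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the driver
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        TopologyTestDriver driver = new TopologyTestDriver(topology, props);
        ConsumerRecordFactory<String, String> factory =
            new ConsumerRecordFactory<>("input", new StringSerializer(), new StringSerializer());
        driver.pipeInput(factory.create("input", "key", "value"));
        ProducerRecord<String, String> out =
            driver.readOutput("output", new StringDeserializer(), new StringDeserializer());
        System.out.println(out.key() + " -> " + out.value());
        driver.close();
    }
}
{code}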


All of the changes in this release can be found in the release notes:

https://www.apache.org/dist/kafka/2.0.0/RELEASE_NOTES.html

You can download the source and binary release (Scala 2.11 and Scala 2.12)
from:

https://kafka.apache.org/downloads#2.0.0

---

Apache Kafka is a distributed streaming platform with four core APIs:
** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.

** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.

** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming the
input streams to output streams.

** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems.

Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Damian Guy
Excellent! Thanks for running the release Rajini!

On Mon, 30 Jul 2018 at 11:25 Rajini Sivaram  wrote:

> The Apache Kafka community is pleased to announce the release for
>
> Apache Kafka 2.0.0.

Dynamic matching of topics using regex for connector sink

2018-07-30 Thread Pratik Gaglani
Hi Dev,

With the release of Kafka 1.1.0 (KIP-215), regex support was added for
sink connectors; however, it appears that topics matching the regex cannot
be added dynamically without restarting the connector. Can we add this to
the issues list? I am fairly new to Kafka.


-Pratik

Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Manikumar
Thanks for driving the release!



On Mon, Jul 30, 2018 at 3:55 PM Rajini Sivaram  wrote:

> The Apache Kafka community is pleased to announce the release for
>
> Apache Kafka 2.0.0.

Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Mickael Maison
Great news! Thanks for running the release

On Mon, Jul 30, 2018 at 12:20 PM, Manikumar  wrote:
> Thanks for driving the release!
>
>
>
> On Mon, Jul 30, 2018 at 3:55 PM Rajini Sivaram  wrote:
>
>> The Apache Kafka community is pleased to announce the release for
>>
>> Apache Kafka 2.0.0.

Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2018-07-30 Thread Adam Bellemare
Hi Guozhang et al

I was just reading the 2.0 release notes and noticed a section on Record
Headers.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams+Processor+API

I am not yet sure if the contents of a RecordHeader are propagated all the
way through the Sinks and Sources, but if they are, and if they remain attached
to the record (including null records), I may be able to ditch the
propagationWrapper for an implementation using RecordHeader. I am not yet
sure if this is doable, so if anyone understands the RecordHeader impl better
than I do, I would be happy to hear from you.

In the meantime, let me know of any questions. I believe this PR has a lot
of potential to solve problems for other people, as I have encountered a
number of other companies in the wild all home-brewing their own solutions
to come up with a method of handling relational data in streams.

Adam


On Fri, Jul 27, 2018 at 1:45 AM, Guozhang Wang  wrote:

> Hello Adam,
>
> Thanks for rebooting the discussion of this KIP ! Let me finish my pass on
> the wiki and get back to you soon. Sorry for the delays..
>
> Guozhang
>
> On Tue, Jul 24, 2018 at 6:08 AM, Adam Bellemare 
> wrote:
>
>> Let me kick this off with a few starting points that I would like to
>> generate some discussion on.
>>
>> 1) It seems to me that I will need to repartition the data twice - once on
>> the foreign key, and once back to the primary key. Is there anything I am
>> missing here?
>>
>> 2) I believe I will also need to materialize 3 state stores: the
>> prefixScan
>> SS, the highwater mark SS (for out-of-order resolution) and the final
>> state
>> store, due to the workflow I have laid out. I have not thought of a better
>> way yet, but would appreciate any input on this matter. I have gone back
>> through the mailing list for the previous discussions on this KIP, and I
>> did not see anything relating to resolving out-of-order compute. I cannot
>> see a way around the current three-SS structure that I have.
>>
>> 3) Caching is disabled on the prefixScan SS, as I do not know how to
>> resolve the iterator obtained from rocksDB with that of the cache. In
>> addition, I must ensure everything is flushed before scanning. Since the
>> materialized prefixScan SS is under "control" of the function, I do not
>> anticipate this to be a problem. Performance throughput will need to be
>> tested, but as Jan observed in his initial overview of this issue, it is
>> generally a surge of output events which affect performance moreso than
>> the
>> flush or prefixScan itself.
>>
>> Thoughts on any of these are greatly appreciated, since these elements are
>> really the cornerstone of the whole design. I can put up the code I have
>> written against 1.0.2 if we so desire, but first I was hoping to just
>> tackle some of the fundamental design proposals.
>>
>> Thanks,
>> Adam
>>
>>
>>
>> On Mon, Jul 23, 2018 at 10:05 AM, Adam Bellemare <
>> adam.bellem...@gmail.com>
>> wrote:
>>
>> > Here is the new discussion thread for KIP-213. I picked back up on the
>> KIP
>> > as this is something that we too at Flipp are now running in production.
>> > Jan started this last year, and I know that Trivago is also using
>> something
>> > similar in production, at least in terms of APIs and functionality.
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 213+Support+non-key+joining+in+KTable
>> >
>> > I do have an implementation of the code for Kafka 1.0.2 (our local
>> > production version) but I won't post it yet as I would like to focus on
>> the
>> > workflow and design first. That being said, I also need to add some
>> clearer
>> > integration tests (I did a lot of testing using a non-Kafka Streams
>> > framework) and clean up the code a bit more before putting it in a PR
>> > against trunk (I can do so later this week likely).
>> >
>> > Please take a look,
>> >
>> > Thanks
>> >
>> > Adam Bellemare
>> >
>> >
>>
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-332: Update AclCommand to use AdminClient API

2018-07-30 Thread Manikumar
Bumping this up!

On Mon, Jul 23, 2018 at 8:00 PM Manikumar  wrote:

> Hi all,
>
> I have created a KIP to use AdminClient API in AclCommand (kafka-acls.sh)
>
>
> *https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API*
>
> 
>
> Please take a look.
>
> Thanks,
>


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Vahid S Hashemian
Such good news on a Monday morning ...

Thank you Rajini for driving the release!

--Vahid




From:   Mickael Maison 
To: Users 
Cc: dev , annou...@apache.org, kafka-clients 

Date:   07/30/2018 04:37 AM
Subject:Re: [ANNOUNCE] Apache Kafka 2.0.0 Released



Great news! Thanks for running the release

On Mon, Jul 30, 2018 at 12:20 PM, Manikumar  
wrote:
> Thanks for driving the release!
>
>
>
> On Mon, Jul 30, 2018 at 3:55 PM Rajini Sivaram  
wrote:
>
>> The Apache Kafka community is pleased to announce the release for
>>
>> Apache Kafka 2.0.0.
Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Dong Lin
This is great news! Thanks for driving this Rajini!!

On Mon, Jul 30, 2018 at 3:25 AM, Rajini Sivaram  wrote:

> The Apache Kafka community is pleased to announce the release for
>
> Apache Kafka 2.0.0.

[jira] [Created] (KAFKA-7216) Exception while running kafka-acls.sh from 1.0 env on target Kafka env with 1.1.1

2018-07-30 Thread Satish Duggana (JIRA)
Satish Duggana created KAFKA-7216:
-

 Summary: Exception while running kafka-acls.sh from 1.0 env on 
target Kafka env with 1.1.1
 Key: KAFKA-7216
 URL: https://issues.apache.org/jira/browse/KAFKA-7216
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.1.1, 1.1.0
Reporter: Satish Duggana


When `kafka-acls.sh` is run with SimpleAclAuthorizer against a target Kafka cluster 
running version 1.1.1, it throws the error below.
{code:java}
kafka.common.KafkaException: DelegationToken not a valid resourceType name. The valid names are Topic,Group,Cluster,TransactionalId
  at kafka.security.auth.ResourceType$$anonfun$fromString$1.apply(ResourceType.scala:56)
  at kafka.security.auth.ResourceType$$anonfun$fromString$1.apply(ResourceType.scala:56)
  at scala.Option.getOrElse(Option.scala:121)
  at kafka.security.auth.ResourceType$.fromString(ResourceType.scala:56)
  at kafka.security.auth.SimpleAclAuthorizer$$anonfun$loadCache$1$$anonfun$apply$mcV$sp$1.apply(SimpleAclAuthorizer.scala:233)
  at kafka.security.auth.SimpleAclAuthorizer$$anonfun$loadCache$1$$anonfun$apply$mcV$sp$1.apply(SimpleAclAuthorizer.scala:232)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at kafka.security.auth.SimpleAclAuthorizer$$anonfun$loadCache$1.apply$mcV$sp(SimpleAclAuthorizer.scala:232)
  at kafka.security.auth.SimpleAclAuthorizer$$anonfun$loadCache$1.apply(SimpleAclAuthorizer.scala:230)
  at kafka.security.auth.SimpleAclAuthorizer$$anonfun$loadCache$1.apply(SimpleAclAuthorizer.scala:230)
  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:216)
  at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:224)
  at kafka.security.auth.SimpleAclAuthorizer.loadCache(SimpleAclAuthorizer.scala:230)
  at kafka.security.auth.SimpleAclAuthorizer.configure(SimpleAclAuthorizer.scala:114)
  at kafka.admin.AclCommand$.withAuthorizer(AclCommand.scala:83)
  at kafka.admin.AclCommand$.addAcl(AclCommand.scala:93)
  at kafka.admin.AclCommand$.main(AclCommand.scala:53)
  at kafka.admin.AclCommand.main(AclCommand.scala)
{code}
 
This is because it reads all the resource types registered under the ZK path and 
throws an error when the `DelegationToken` resource is not defined in the 
`ResourceType` of the client's Kafka version (which is earlier than 1.1.x).
  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Ismael Juma
Thanks to everyone who contributed to the release (including testing and
bug reports)! And thank you Rajini for managing the release.

Ismael

On Mon, 30 Jul 2018, 03:25 Rajini Sivaram,  wrote:

> The Apache Kafka community is pleased to announce the release for
>
> Apache Kafka 2.0.0.

Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-30 Thread Jason Gustafson
Hey Dong,

Thanks for the detailed review. Responses below:

1/2: Thanks for noticing the inconsistency. Would it be reasonable to
simply call it LeaderEpoch for both APIs?

3: I agree it should be a map. I will update.

4: Fair point. I think we should always be able to identify an offset.
Let's remove the Optional for now and reconsider if we find an unhandled
case during implementation.

5: Yeah, I was thinking about this. The two error codes could be handled
similarly, so we might merge them. Mainly I was thinking that it will be
useful for consumers/replicas to know whether they are ahead or behind the
leader. For example, if a consumer sees UNKNOWN_LEADER_EPOCH, it need not
refresh metadata. Or if a replica sees a FENCED_LEADER_EPOCH error, it
could just stop fetching and await the LeaderAndIsr request that it is
missing. It probably also makes debugging a little bit easier. I guess I'm
a bit inclined to keep both error codes, but I'm open to reconsideration if
you feel strongly. Another point to consider is whether we should continue
using NOT_LEADER_FOR_PARTITION if a follower receives an unexpected fetch.
The leader epoch would be different in this case so we could use one of the
invalid epoch error codes instead since they contain more information.

6: I agree the name is not ideal in that scenario. What if we overloaded
`seek`?

7: Sure, I will mention this.


Thanks,
Jason

On Fri, Jul 27, 2018 at 6:17 PM, Dong Lin  wrote:

> Hey Jason,
>
> Thanks for the update! I agree with the current proposal overall. I have
> some minor comments related to naming etc.
>
> 1) I am not strong and will just leave it here for discussion. Would it be
> better to rename "CurrentLeaderEpoch" to "ExpectedLeaderEpoch" for the new
> field in the OffsetsForLeaderEpochRequest? The reason is that
> "CurrentLeaderEpoch" may not necessarily be true current leader epoch if
> the consumer has stale metadata. "ExpectedLeaderEpoch" shows that this
> epoch is what consumer expects on the broker which may or may not be the
> true value.
>
> 2) Currently we add the field "LeaderEpoch" to FetchRequest and the field
> "CurrentLeaderEpoch" to OffsetsForLeaderEpochRequest. Given that both
> fields are compared with the leaderEpoch in the broker, would it be better
> to give them the same name?
>
> 3) Currently LogTruncationException.truncationOffset() returns
> Optional to user. Should it return
> Optional> to handle the scenario
> where leaderEpoch of multiple partitions are different from the leaderEpoch
> in the broker?
>
> 4) Currently LogTruncationException.truncationOffset() returns an Optional
> value. Could you explain a bit more when it will return Optional.empty()? I
> am trying to understand whether it is simpler and reasonable to
> replace Optional.empty()
> with OffsetMetadata(offset=last_fetched_offset, leaderEpoch=-1).
>
> 5) Do we also need to add a new retriable exception for error code
> FENCED_LEADER_EPOCH? And do we need to define both FENCED_LEADER_EPOCH
> and UNKNOWN_LEADER_EPOCH.
> It seems that the current KIP uses these two error codes in the same way
> and the exception for these two error codes is not exposed to the user.
> Maybe we should combine them into one error, e.g. INVALID_LEADER_EPOCH?
>
> 6) For users who has turned off auto offset reset, when consumer.poll()
> throw LogTruncationException, it seems that user will most likely call
> seekToCommitted(offset,
> leaderEpoch) where offset and leaderEpoch are obtained from
> LogTruncationException.truncationOffset(). In this case, the offset used
> here is not committed, which is inconsistent from the method name
> seekToCommitted(...). Would it be better to rename the method to e.g.
> seekToLastConsumedMessage()?
>
> 7) Per point 3 in Jun's comment, would it be useful to explicitly specify
> in the KIP that we will log the truncation event if user has turned on auto
> offset reset policy?
>
>
> Thanks,
> Dong
>
>
> On Fri, Jul 27, 2018 at 12:39 PM, Jason Gustafson 
> wrote:
>
> > Thanks Anna, you are right on both points. I updated the KIP.
> >
> > -Jason
> >
> > On Thu, Jul 26, 2018 at 2:08 PM, Anna Povzner  wrote:
> >
> > > Hi Jason,
> > >
> > > Thanks for the update. I agree with the current proposal.
> > >
> > > Two minor comments:
> > > 1) In “API Changes” section, first paragraph says that “users can catch
> > the
> > > more specific exception type and use the new `seekToNearest()` API
> > defined
> > > below.”. Since LogTruncationException “will include the partitions that
> > > were truncated and the offset of divergence”., shouldn’t the client use
> > > seek(offset) to seek to the offset of divergence in response to the
> > > exception?
> > > 2) In “Protocol Changes” section, OffsetsForLeaderEpoch subsection says
> > > “Note
> > > that consumers will send a sentinel value (-1) for the current epoch
> and
> > > the broker will simply disregard that validation.”. Is that still true
> > with
> > > MetadataResponse containing leader epoch?
> > >

Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Dongjin Lee
Thank you for your great work! Thanks again to the committers and all the
contributors!

On Tue, Jul 31, 2018 at 1:05 AM Ismael Juma  wrote:

> Thanks to everyone who contributed to the release (including testing and
> bug reports)! And thank you Rajini for managing the release.
>
> Ismael
>
> On Mon, 30 Jul 2018, 03:25 Rajini Sivaram,  wrote:
>
> > The Apache Kafka community is pleased to announce the release for
> >
> > Apache Kafka 2.0.0.

[jira] [Created] (KAFKA-7217) Loading dynamic topic data into kafka connector sink using regex

2018-07-30 Thread Pratik Gaglani (JIRA)
Pratik Gaglani created KAFKA-7217:
-

 Summary: Loading dynamic topic data into kafka connector sink 
using regex
 Key: KAFKA-7217
 URL: https://issues.apache.org/jira/browse/KAFKA-7217
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 1.1.0
Reporter: Pratik Gaglani


The new ability to use a regex (KAFKA-3074) in connectors was added; however, it 
seems that data from topics created after the connector has been started is not 
consumed until the connector is restarted. We need to dynamically add new topics 
and have the connector consume them based on the regex defined in the connector 
properties. How can this be achieved? For example, with regex topic-.* and 
existing topics topic-1 and topic-2, if I introduce a new topic topic-3, how can I 
make the connector consume its data without restarting it?
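
For reference, the regex subscription from KIP-215 is configured per sink
connector roughly as sketched below; the connector class and name are
hypothetical. The question above concerns topics created after startup that
match this pattern.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class RegexSinkConnectorConfig {
    public static Map<String, String> build() {
        Map<String, String> config = new HashMap<>();
        config.put("name", "regex-sink");                                   // hypothetical
        config.put("connector.class", "com.example.ExampleSinkConnector");  // hypothetical
        // KIP-215: subscribe by pattern instead of a fixed "topics" list
        config.put("topics.regex", "topic-.*");
        return config;
    }
}
{code}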



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[VOTE] KIP-328: Ability to suppress updates for KTables

2018-07-30 Thread John Roesler
Hello devs,

The discussion of KIP-328 has gone some time with no new comments, so I am
calling for a vote!

Here's the KIP: https://cwiki.apache.org/confluence/x/sQU0BQ

The basic idea is to provide:
* more usable control over update rate (vs the current state store caches)
* the final-result-for-windowed-computations feature which several people
have requested
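
For context, a rough sketch of how the proposed operator might be used for
final windowed results, based on the KIP draft; the operator and option names
below are assumptions from the proposal and may change before the KIP is
finalized, and this does not compile against any released version:

{code:java}
// Proposed API per the KIP-328 draft -- not available in a released Kafka version yet.
KTable<Windowed<String>, Long> finalCounts = events
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ofMinutes(1)))
    .count()
    .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()));
{code}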

Thanks,
-John


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Sean Glover
Congrats to everyone involved.  Releasing "2.0.0" is a big achievement :)

On Mon, Jul 30, 2018 at 12:35 PM Dongjin Lee  wrote:

> Thank you for your great works! Thanks again for the commiters and all the
> contributors!
>
> On Tue, Jul 31, 2018 at 1:05 AM Ismael Juma  wrote:
>
> > Thanks to everyone who contributed to the release (including testing and
> > bug reports)! And thank you Rajini for managing the release.
> >
> > Ismael
> >
> > On Mon, 30 Jul 2018, 03:25 Rajini Sivaram,  wrote:
> >
> > > The Apache Kafka community is pleased to announce the release for
> > >
> > > Apache Kafka 2.0.0.
> > >
> > >
> > >
> > >
> > >
> > > This is a major release and includes significant new features from
> > >
> > > 40 KIPs. It contains fixes and improvements from 246 JIRAs, including
> > >
> > > a few critical bugs. Here is a summary of some notable changes:
> > >
> > > ** KIP-290 adds support for prefixed ACLs, simplifying access control
> > > management in large secure deployments. Bulk access to topics,
> > > consumer groups or transactional ids with a prefix can now be granted
> > > using a single rule. Access control for topic creation has also been
> > > improved to enable access to be granted to create specific topics or
> > > topics with a prefix.
> > >
> > > ** KIP-255 adds a framework for authenticating to Kafka brokers using
> > > OAuth2 bearer tokens. The SASL/OAUTHBEARER implementation is
> > > customizable using callbacks for token retrieval and validation.
> > >
> > > **Host name verification is now enabled by default for SSL connections
> > > to ensure that the default SSL configuration is not susceptible to
> > > man-in-the middle attacks. You can disable this verification for
> > > deployments where validation is performed using other mechanisms.
> > >
> > > ** You can now dynamically update SSL trust stores without broker
> > restart.
> > > You can also configure security for broker listeners in ZooKeeper
> before
> > > starting brokers, including SSL key store and trust store passwords and
> > > JAAS configuration for SASL. With this new feature, you can store
> > sensitive
> > > password configs in encrypted form in ZooKeeper rather than in
> cleartext
> > > in the broker properties file.
> > >
> > > ** The replication protocol has been improved to avoid log divergence
> > > between leader and follower during fast leader failover. We have also
> > > improved resilience of brokers by reducing the memory footprint of
> > > message down-conversions. By using message chunking, both memory
> > > usage and memory reference time have been reduced to avoid
> > > OutOfMemory errors in brokers.
> > >
> > > ** Kafka clients are now notified of throttling before any throttling
> is
> > > applied
> > > when quotas are enabled. This enables clients to distinguish between
> > > network errors and large throttle times when quotas are exceeded.
> > >
> > > ** We have added a configuration option for Kafka consumer to avoid
> > > indefinite blocking in the consumer.
> > >
> > > ** We have dropped support for Java 7 and removed the previously
> > > deprecated Scala producer and consumer.
> > >
> > > ** Kafka Connect includes a number of improvements and features.
> > > KIP-298 enables you to control how errors in connectors,
> transformations
> > > and converters are handled by enabling automatic retries and
> controlling
> > > the
> > > number of errors that are tolerated before the connector is stopped.
> More
> > > contextual information can be included in the logs to help diagnose
> > > problems
> > > and problematic messages consumed by sink connectors can be sent to a
> > > dead letter queue rather than forcing the connector to stop.
> > >
> > > ** KIP-297 adds a new extension point to move secrets out of connector
> > > configurations and integrate with any external key management system.
> > > The placeholders in connector configurations are only resolved before
> > > sending the configuration to the connector, ensuring that secrets are
> > > stored
> > > and managed securely in your preferred key management system and
> > > not exposed over the REST APIs or in log files.
> > >
> > > ** We have added a thin Scala wrapper API for our Kafka Streams DSL,
> > > which provides better type inference and better type safety during
> > compile
> > > time. Scala users can have less boilerplate in their code, notably
> > > regarding
> > > Serdes with new implicit Serdes.
> > >
> > > ** Message headers are now supported in the Kafka Streams Processor
> API,
> > > allowing users to add and manipulate headers read from the source
> topics
> > > and propagate them to the sink topics.
> > >
> > > ** Windowed aggregations performance in Kafka Streams has been largely
> > > improved (sometimes by an order of magnitude) thanks to the new
> > > single-key-fetch API.
> > >
> > > ** We have further improved unit testability of Kafka Streams with the
> > > kafka-streams-testutil artifact.
> > >
> > >
> > >
> > >
> > >
> > > 

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-07-30 Thread Lucas Wang
Thanks for your review, Dong.
Ack that these configs will have a bigger impact for users.

On the other hand, I would argue that the request queue becoming full
may or may not be a rare scenario.
How often the request queue gets full depends on the request incoming rate,
the request processing rate, and the size of the request queue.
When that happens, the dedicated endpoints design can better handle
it than any of the previously discussed options.

Another reason I made the change was that I have the same taste
as Becket that it's a better separation of the control plane from the data
plane.

Finally, I want to clarify that this change is NOT motivated by the
out-of-order
processing discussion. The latter problem is orthogonal to this KIP, and it
can happen in any of the design options we discussed for this KIP so far.
So I'd like to address out-of-order processing separately in another thread,
and avoid mentioning it in this KIP.

Thanks,
Lucas

On Fri, Jul 27, 2018 at 7:51 PM, Dong Lin  wrote:

> Hey Lucas,
>
> Thanks for the update.
>
> The current KIP proposes new broker configs "listeners.for.controller" and
> "advertised.listeners.for.controller". This is going to be a big change
> since listeners are among the most important configs that every user needs
> to change. According to the rejected alternative section, it seems that the
> reason to add these two configs is to improve performance when the data
> request queue is full rather than for correctness. It should be a very rare
> scenario and I am not sure we should add configs for all users just to
> improve the performance in such a rare scenario.
>
> Also, if the new design is based on the issues which are discovered in the
> recent discussion, e.g. out of order processing if we don't use a dedicated
> thread for controller request, it may be useful to explain the problem in
> the motivation section.
>
> Thanks,
> Dong
>
> On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang  wrote:
>
> > A kind reminder for review of this KIP.
> >
> > Thank you very much!
> > Lucas
> >
> > On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang 
> > wrote:
> >
> > > Hi All,
> > >
> > > I've updated the KIP by adding the dedicated endpoints for controller
> > > connections,
> > > and pinning threads for controller requests.
> > > Also I've updated the title of this KIP. Please take a look and let me
> > > know your feedback.
> > >
> > > Thanks a lot for your time!
> > > Lucas
> > >
> > > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > > gharatmayures...@gmail.com> wrote:
> > >
> > >> Hi Lucas,
> > >> I agree, if we want to go forward with a separate controller plane and
> > >> data
> > >> plane and completely isolate them, having a separate port for
> controller
> > >> with a separate Acceptor and a Processor sounds ideal to me.
> > >>
> > >> Thanks,
> > >>
> > >> Mayuresh
> > >>
> > >>
> > >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin 
> > wrote:
> > >>
> > >> > Hi Lucas,
> > >> >
> > >> > Yes, I agree that a dedicated end to end control flow would be
> ideal.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang 
> > >> wrote:
> > >> >
> > >> > > Thanks for the comment, Becket.
> > >> > > So far, we've been trying to avoid making any request handler
> thread
> > >> > > special.
> > >> > > But if we were to follow that path in order to make the two planes
> > >> more
> > >> > > isolated,
> > >> > > what do you think about also having a dedicated processor thread,
> > >> > > and dedicated port for the controller?
> > >> > >
> > >> > > Today one processor thread can handle multiple connections, let's
> > say
> > >> 100
> > >> > > connections
> > >> > >
> > >> > > represented by connection0, ... connection99, among which
> > >> connection0-98
> > >> > > are from clients, while connection99 is from
> > >> > >
> > >> > > the controller. Further let's say after one selector polling,
> there
> > >> are
> > >> > > incoming requests on all connections.
> > >> > >
> > >> > > When the request queue is full, (either the data request being
> full
> > in
> > >> > the
> > >> > > two queue design, or
> > >> > >
> > >> > > the one single queue being full in the deque design), the
> processor
> > >> > thread
> > >> > > will be blocked first
> > >> > >
> > >> > > when trying to enqueue the data request from connection0, then
> > >> possibly
> > >> > > blocked for the data request
> > >> > >
> > >> > > from connection1, ... etc even though the controller request is
> > ready
> > >> to
> > >> > be
> > >> > > enqueued.
> > >> > >
> > >> > > To solve this problem, it seems we would need to have a separate
> > port
> > >> > > dedicated to
> > >> > >
> > >> > > the controller, a dedicated processor thread, a dedicated
> controller
> > >> > > request queue,
> > >> > >
> > >> > > and pinning of one request handler thread for controller requests.
> > >> > >
> > >> > > Thanks,
> > >> > > Lucas
> > >> > >
> > >> > >
> > >> > > On
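
As a toy illustration of the head-of-line blocking described in this thread (this
is not Kafka broker code, just the shape of the problem and of the proposed
separation): with one shared bounded queue, a controller request stuck behind
client requests waits whenever the queue is full; a dedicated controller queue
plus a pinned handler thread keeps the control plane moving.

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  public class ControlPlaneSketch {
      // Single shared queue: put() blocks when full, so a controller request queued
      // behind connection0..connection98's data requests has to wait.
      static final BlockingQueue<String> sharedRequestQueue = new ArrayBlockingQueue<>(500);

      // Separated planes, roughly as proposed: controller requests never wait on data traffic.
      static final BlockingQueue<String> dataRequestQueue = new ArrayBlockingQueue<>(500);
      static final BlockingQueue<String> controllerRequestQueue = new ArrayBlockingQueue<>(20);

      static final Thread controllerHandler = new Thread(() -> {
          try {
              while (true) {
                  String request = controllerRequestQueue.take(); // pinned to control-plane requests only
                  // ... handle the controller request (placeholder) ...
              }
          } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
          }
      });
  }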

Re: [VOTE] KIP-328: Ability to suppress updates for KTables

2018-07-30 Thread Ted Yu
+1

On Mon, Jul 30, 2018 at 11:46 AM John Roesler  wrote:

> Hello devs,
>
> The discussion of KIP-328 has gone some time with no new comments, so I am
> calling for a vote!
>
> Here's the KIP: https://cwiki.apache.org/confluence/x/sQU0BQ
>
> The basic idea is to provide:
> * more usable control over update rate (vs the current state store caches)
> * the final-result-for-windowed-computations feature which several people
> have requested
>
> Thanks,
> -John
>


Re: [DISCUSS] KIP-332: Update AclCommand to use AdminClient API

2018-07-30 Thread Colin McCabe
Hi Manikumar,

It's great that you are taking a look at this!  Much needed.

Just one note: I assume that --authorizer-properties is no longer required if 
the --bootstrap-server option is specified.  We should probably spell this out 
somewhere in the KIP.

thanks,
Colin


On Mon, Jul 30, 2018, at 06:47, Manikumar wrote:
> Bumping this up!
> 
> On Mon, Jul 23, 2018 at 8:00 PM Manikumar  wrote:
> 
> > Hi all,
> >
> > I have created a KIP to use AdminClient API in AclCommand (kafka-acls.sh)
> >
> >
> > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API*
> >
> > 
> >
> > Please take a look.
> >
> > Thanks,
> >
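
To make the intent concrete: instead of kafka-acls.sh mutating ACLs through the
Authorizer/ZooKeeper directly, it would issue AdminClient calls like the sketch
below (the AdminClient ACL API already exists; the topic, principal, and bootstrap
address are illustrative, and the exact command-line options are whatever the KIP
finally specifies):

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.AdminClientConfig;
  import org.apache.kafka.common.acl.AccessControlEntry;
  import org.apache.kafka.common.acl.AclBinding;
  import org.apache.kafka.common.acl.AclOperation;
  import org.apache.kafka.common.acl.AclPermissionType;
  import org.apache.kafka.common.resource.PatternType;
  import org.apache.kafka.common.resource.ResourcePattern;
  import org.apache.kafka.common.resource.ResourceType;

  public class CreateAclSketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          // The broker endpoint would come from the proposed --bootstrap-server option
          // rather than from --authorizer-properties.
          props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          try (AdminClient admin = AdminClient.create(props)) {
              ResourcePattern topic =
                      new ResourcePattern(ResourceType.TOPIC, "payments", PatternType.LITERAL);
              AccessControlEntry allowRead =
                      new AccessControlEntry("User:alice", "*", AclOperation.READ, AclPermissionType.ALLOW);
              admin.createAcls(Collections.singletonList(new AclBinding(topic, allowRead))).all().get();
          }
      }
  }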


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Jeff Widman
Congrats!

Is there any reason I don't see 2.0.0 listed on the docs page?

https://kafka.apache.org/documentation/

On Mon, Jul 30, 2018 at 12:54 PM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Congratulations !
>
> On Mon., 30 Jul. 2018, 11:51 am Sean Glover, 
> wrote:
>
> > Congrats to everyone involved.  Releasing "2.0.0" is a big achievement :)
> >
> > On Mon, Jul 30, 2018 at 12:35 PM Dongjin Lee  wrote:
> >
> > > Thank you for your great work! Thanks again to the committers and all
> > the
> > > contributors!
> > >
> > > On Tue, Jul 31, 2018 at 1:05 AM Ismael Juma  wrote:
> > >
> > > > Thanks to everyone who contributed to the release (including testing
> > and
> > > > bug reports)! And thank you Rajini for managing the release.
> > > >
> > > > Ismael
> > > >
> > > > On Mon, 30 Jul 2018, 03:25 Rajini Sivaram, 
> > wrote:
> > > >
> > > > > The Apache Kafka community is pleased to announce the release for
> > > > >
> > > > > Apache Kafka 2.0.0.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > This is a major release and includes significant new features from
> > > > >
> > > > > 40 KIPs. It contains fixes and improvements from 246 JIRAs,
> including
> > > > >
> > > > > a few critical bugs. Here is a summary of some notable changes:
> > > > >
> > > > > ** KIP-290 adds support for prefixed ACLs, simplifying access
> control
> > > > > management in large secure deployments. Bulk access to topics,
> > > > > consumer groups or transactional ids with a prefix can now be
> granted
> > > > > using a single rule. Access control for topic creation has also
> been
> > > > > improved to enable access to be granted to create specific topics
> or
> > > > > topics with a prefix.
> > > > >
> > > > > ** KIP-255 adds a framework for authenticating to Kafka brokers
> using
> > > > > OAuth2 bearer tokens. The SASL/OAUTHBEARER implementation is
> > > > > customizable using callbacks for token retrieval and validation.
> > > > >
> > > > > **Host name verification is now enabled by default for SSL
> > connections
> > > > > to ensure that the default SSL configuration is not susceptible to
> > > > > man-in-the middle attacks. You can disable this verification for
> > > > > deployments where validation is performed using other mechanisms.
> > > > >
> > > > > ** You can now dynamically update SSL trust stores without broker
> > > > restart.
> > > > > You can also configure security for broker listeners in ZooKeeper
> > > before
> > > > > starting brokers, including SSL key store and trust store passwords
> > and
> > > > > JAAS configuration for SASL. With this new feature, you can store
> > > > sensitive
> > > > > password configs in encrypted form in ZooKeeper rather than in
> > > cleartext
> > > > > in the broker properties file.
> > > > >
> > > > > ** The replication protocol has been improved to avoid log
> divergence
> > > > > between leader and follower during fast leader failover. We have
> also
> > > > > improved resilience of brokers by reducing the memory footprint of
> > > > > message down-conversions. By using message chunking, both memory
> > > > > usage and memory reference time have been reduced to avoid
> > > > > OutOfMemory errors in brokers.
> > > > >
> > > > > ** Kafka clients are now notified of throttling before any
> throttling
> > > is
> > > > > applied
> > > > > when quotas are enabled. This enables clients to distinguish
> between
> > > > > network errors and large throttle times when quotas are exceeded.
> > > > >
> > > > > ** We have added a configuration option for Kafka consumer to avoid
> > > > > indefinite blocking in the consumer.
> > > > >
> > > > > ** We have dropped support for Java 7 and removed the previously
> > > > > deprecated Scala producer and consumer.
> > > > >
> > > > > ** Kafka Connect includes a number of improvements and features.
> > > > > KIP-298 enables you to control how errors in connectors,
> > > transformations
> > > > > and converters are handled by enabling automatic retries and
> > > controlling
> > > > > the
> > > > > number of errors that are tolerated before the connector is
> stopped.
> > > More
> > > > > contextual information can be included in the logs to help diagnose
> > > > > problems
> > > > > and problematic messages consumed by sink connectors can be sent
> to a
> > > > > dead letter queue rather than forcing the connector to stop.
> > > > >
> > > > > ** KIP-297 adds a new extension point to move secrets out of
> > connector
> > > > > configurations and integrate with any external key management
> system.
> > > > > The placeholders in connector configurations are only resolved
> before
> > > > > sending the configuration to the connector, ensuring that secrets
> are
> > > > > stored
> > > > > and managed securely in your preferred key management system and
> > > > > not exposed over the REST APIs or in log files.
> > > > >
> > > > > ** We have added a thin Scala wrapper API for our Kafka Streams
> DSL,
> > > > > which provides bette

Re: [DISCUSS] KIP-332: Update AclCommand to use AdminClient API

2018-07-30 Thread Ted Yu
Look good to me.

On Mon, Jul 23, 2018 at 7:30 AM Manikumar  wrote:

> Hi all,
>
> I have created a KIP to use AdminClient API in AclCommand (kafka-acls.sh)
>
> *
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API*
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API
> >
>
> Please take a look.
>
> Thanks,
>


[DISCUSS] Applying scalafmt to core code

2018-07-30 Thread Ray Chiang
I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to 
core).  As part of the cleanup, applying the "gradlew spotlessApply" 
command ended up affecting too many (435 out of 439) files.  Since this 
will affect every file, this sort of change does risk polluting the git 
logs.


So, I'd like to get a discussion going to find some agreement on an 
approach.  Right now, I see two categories of options:


A) Getting scalafmt working on the existing code
B) Getting all the code conforming to scalafmt requirements

For the first, I see a couple of approaches:

A1) Do the minimum change that allows scalafmt to run on all the .scala 
files
A2) Make the change so that scalafmt runs as-is (only on the streams 
code) and add a different task/options that allow running scalafmt on a 
subset of code.  (Reasons explained below)


For the second, I can think of the following options:

B1) Do one giant git commit of all cleaned code (no one seemed to like this)
B2) Do git commits one file at a time (trunk or as a branch)
B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
B4) With each pull request on all patches, run option A2) on the 
affected files


From what I can envision, options B2 and B3 require quite a bit of 
manual work if we want to cover multiple releases.  The "cleanest" 
option I can think of looks something like:


C1) Contributor makes code modifications for their JIRA
C2) Contributor runs option A2 to also apply scalafmt to their existing code
C3) Committer does the regular review process

At some point in the future, enough cleanup could be done that the final 
cleanup can be done as a much smaller set of MINOR commits.


-Ray



Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Stephane Maarek
Congratulations !

On Mon., 30 Jul. 2018, 11:51 am Sean Glover, 
wrote:

> Congrats to everyone involved.  Releasing "2.0.0" is a big achievement :)
>
> On Mon, Jul 30, 2018 at 12:35 PM Dongjin Lee  wrote:
>
> > Thank you for your great work! Thanks again to the committers and all
> the
> > contributors!
> >
> > On Tue, Jul 31, 2018 at 1:05 AM Ismael Juma  wrote:
> >
> > > Thanks to everyone who contributed to the release (including testing
> and
> > > bug reports)! And thank you Rajini for managing the release.
> > >
> > > Ismael
> > >
> > > On Mon, 30 Jul 2018, 03:25 Rajini Sivaram, 
> wrote:
> > >
> > > > The Apache Kafka community is pleased to announce the release for
> > > >
> > > > Apache Kafka 2.0.0.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > This is a major release and includes significant new features from
> > > >
> > > > 40 KIPs. It contains fixes and improvements from 246 JIRAs, including
> > > >
> > > > a few critical bugs. Here is a summary of some notable changes:
> > > >
> > > > ** KIP-290 adds support for prefixed ACLs, simplifying access control
> > > > management in large secure deployments. Bulk access to topics,
> > > > consumer groups or transactional ids with a prefix can now be granted
> > > > using a single rule. Access control for topic creation has also been
> > > > improved to enable access to be granted to create specific topics or
> > > > topics with a prefix.
> > > >
> > > > ** KIP-255 adds a framework for authenticating to Kafka brokers using
> > > > OAuth2 bearer tokens. The SASL/OAUTHBEARER implementation is
> > > > customizable using callbacks for token retrieval and validation.
> > > >
> > > > **Host name verification is now enabled by default for SSL
> connections
> > > > to ensure that the default SSL configuration is not susceptible to
> > > > man-in-the middle attacks. You can disable this verification for
> > > > deployments where validation is performed using other mechanisms.
> > > >
> > > > ** You can now dynamically update SSL trust stores without broker
> > > restart.
> > > > You can also configure security for broker listeners in ZooKeeper
> > before
> > > > starting brokers, including SSL key store and trust store passwords
> and
> > > > JAAS configuration for SASL. With this new feature, you can store
> > > sensitive
> > > > password configs in encrypted form in ZooKeeper rather than in
> > cleartext
> > > > in the broker properties file.
> > > >
> > > > ** The replication protocol has been improved to avoid log divergence
> > > > between leader and follower during fast leader failover. We have also
> > > > improved resilience of brokers by reducing the memory footprint of
> > > > message down-conversions. By using message chunking, both memory
> > > > usage and memory reference time have been reduced to avoid
> > > > OutOfMemory errors in brokers.
> > > >
> > > > ** Kafka clients are now notified of throttling before any throttling
> > is
> > > > applied
> > > > when quotas are enabled. This enables clients to distinguish between
> > > > network errors and large throttle times when quotas are exceeded.
> > > >
> > > > ** We have added a configuration option for Kafka consumer to avoid
> > > > indefinite blocking in the consumer.
> > > >
> > > > ** We have dropped support for Java 7 and removed the previously
> > > > deprecated Scala producer and consumer.
> > > >
> > > > ** Kafka Connect includes a number of improvements and features.
> > > > KIP-298 enables you to control how errors in connectors,
> > transformations
> > > > and converters are handled by enabling automatic retries and
> > controlling
> > > > the
> > > > number of errors that are tolerated before the connector is stopped.
> > More
> > > > contextual information can be included in the logs to help diagnose
> > > > problems
> > > > and problematic messages consumed by sink connectors can be sent to a
> > > > dead letter queue rather than forcing the connector to stop.
> > > >
> > > > ** KIP-297 adds a new extension point to move secrets out of
> connector
> > > > configurations and integrate with any external key management system.
> > > > The placeholders in connector configurations are only resolved before
> > > > sending the configuration to the connector, ensuring that secrets are
> > > > stored
> > > > and managed securely in your preferred key management system and
> > > > not exposed over the REST APIs or in log files.
> > > >
> > > > ** We have added a thin Scala wrapper API for our Kafka Streams DSL,
> > > > which provides better type inference and better type safety during
> > > compile
> > > > time. Scala users can have less boilerplate in their code, notably
> > > > regarding
> > > > Serdes with new implicit Serdes.
> > > >
> > > > ** Message headers are now supported in the Kafka Streams Processor
> > API,
> > > > allowing users to add and manipulate headers read from the source
> > topics
> > > > and propagate them to the sink topics.
> > > >
> > > > ** Windowed aggregatio

Re: [DISCUSS] KIP-334 Include partitions in exceptions raised during consumer record deserialization/validation

2018-07-30 Thread Jason Gustafson
Hey Stanislav,

Thanks for the KIP. I think the goal is to allow users to seek past a
record which cannot be parsed for whatever reason. However, it's a little
annoying that you need to catch two separate types to handle this. I'm
wondering if it makes sense to expose an interface like
`UnconsumableRecordException` or something like that. The consumer could
then have separate internal exception types which extend from
InvalidRecordException and SerializationException respectively and
implement `UnconsumableRecordException`. That would simplify the handling
and users could check the cause if they cared which case it was.

Another question for consideration. I'd imagine some users would find it
helpful to seek past failed messages automatically. If there is a corrupt
record, for example, there's almost nothing you can do except seek past it
anyway. I'm wondering if there should be a config for this or if users
should be able to install a callback of some sorts to handle failed
records. Not sure if this is that big of a problem for users, but
interested to hear others thoughts.

Thanks,
Jason
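
To illustrate the handling pattern in question, a rough sketch follows.
UnconsumableRecordException is the hypothetical marker interface floated above,
and its partition()/offset() accessors are assumed for the example; neither
exists in the current client API:

  import java.time.Duration;
  import org.apache.kafka.clients.consumer.Consumer;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.common.KafkaException;
  import org.apache.kafka.common.TopicPartition;

  public class SkipBadRecordsSketch {

      // Hypothetical interface from this thread; NOT part of the released client API.
      interface UnconsumableRecordException {
          TopicPartition partition();
          long offset();
      }

      static void pollSkippingBadRecords(Consumer<byte[], byte[]> consumer) {
          while (true) {
              try {
                  ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                  // ... hand the records to the application ...
              } catch (KafkaException e) {
                  if (e instanceof UnconsumableRecordException) {
                      UnconsumableRecordException bad = (UnconsumableRecordException) e;
                      // Skip the record that could not be deserialized or validated and keep consuming.
                      consumer.seek(bad.partition(), bad.offset() + 1);
                  } else {
                      throw e;
                  }
              }
          }
      }
  }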

On Fri, Jul 20, 2018 at 6:32 PM, Stanislav Kozlovski  wrote:

> Hi Ted,
>
> I do plan to start one. When is the appropriate time? My reasoning was that
> people would like to view the changes first
>
> On Fri, Jul 20, 2018, 6:21 PM Ted Yu  wrote:
>
> > Hi, Stanislav:
> > Do you plan to start VOTE thread ?
> >
> > Cheers
> >
> > On Fri, Jul 20, 2018 at 6:11 PM Stanislav Kozlovski <
> > stanis...@confluent.io>
> > wrote:
> >
> > > Hey group,
> > >
> > > I added a Pull Request for this KIP - here it is
> > > https://github.com/apache/kafka/pull/5410
> > > Please take a look.
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Thu, Jul 5, 2018 at 11:06 AM Ismael Juma  wrote:
> > >
> > > > Yes, the Scala consumers have been removed in 2.0.0, which simplifies
> > > some
> > > > of this. The following commit was an initial step in unifying the
> > > exception
> > > > handling:
> > > >
> > > >
> > > >
> > >
> > https://github.com/apache/kafka/commit/96bcfdfc7c9aac075635b2034e65e4
> 12a725672e
> > > >
> > > > But more can be done as you mentioned.
> > > >
> > > > Ismael
> > > >
> > > > On 5 Jul 2018 9:36 am, "Stanislav Kozlovski"  >
> > > > wrote:
> > > >
> > > > Hey Ismael,
> > > >
> > > > It is only slightly related - my PR would attach two new attributes
> and
> > > > also touch upon deserialization exceptions.
> > > >
> > > > But this PR did provide me with some insight:
> > > > Maybe the best approach would be to make `InvalidRecordException` a
> > > public
> > > > exception instead of introducing a new one - I did not realize it was
> > not
> > > > publicly exposed.
> > > > Does the following:
> > > >
> > > >  InvalidMessageException extends CorruptRecordException for temporary
> > > > compatibility with the old Scala clients.
> > > >  * We want to update the server side code to use and catch the new
> > > > CorruptRecordException.
> > > >  * Because ByteBufferMessageSet.scala and Message.scala are used in
> > > > both server and client code having
> > > >  * InvalidMessageException extend CorruptRecordException allows us to
> > > > change server code without affecting the client.
> > > >
> > > > still apply? I can see that the `ByteBufferMessageSet` and `Message`
> > > scala
> > > > classes are not present in the codebase anymore. AFAIA the old scala
> > > > clients were removed with 2.0.0 and we can thus update the server
> side
> > > code
> > > > to use the `CorruptRecordException` while changing and exposing
> > > > `InvalidRecordException` to the public. WDYT?
> > > >
> > > > I will also make sure to not expose the cause of the exception when
> not
> > > > needed, maybe I'll outright remove the `cause` attribute
> > > >
> > > >
> > > > On Thu, Jul 5, 2018 at 4:55 PM Ismael Juma 
> wrote:
> > > >
> > > > > Thanks for the KIP, Stanislav. The following PR looks related:
> > > > >
> > > > > https://github.com/apache/kafka/pull/4093/files
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Thu, Jul 5, 2018 at 8:44 AM Stanislav Kozlovski <
> > > > stanis...@confluent.io
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hey everybody,
> > > > > >
> > > > > > I just created a new KIP about exposing more information in
> > > exceptions
> > > > > > caused by consumer record deserialization/validation. Please
> have a
> > > > look
> > > > > at
> > > > > > it, it is a very short page.
> > > > > >
> > > > > > I am working under the assumption that all invalid record or
> > > > > > deserialization exceptions in the consumer pass through the
> > `Fetcher`
> > > > > > class. Please confirm if that is true, otherwise I might miss
> some
> > > > places
> > > > > > where the exceptions are raised in my implementation
> > > > > >
> > > > > > One concern I have is the name of the second exception -
> > > > > > `InoperativeRecordException`. I would have named it
> > > > > > `InvalidRecordException` but that is taken. The `Fetcher` class
> > >

Re: [VOTE] KIP-328: Ability to suppress updates for KTables

2018-07-30 Thread Bill Bejeck
Thanks for the KIP!

+1

-Bill

On Mon, Jul 30, 2018 at 3:42 PM Ted Yu  wrote:

> +1
>
> On Mon, Jul 30, 2018 at 11:46 AM John Roesler  wrote:
>
> > Hello devs,
> >
> > The discussion of KIP-328 has gone some time with no new comments, so I
> am
> > calling for a vote!
> >
> > Here's the KIP: https://cwiki.apache.org/confluence/x/sQU0BQ
> >
> > The basic idea is to provide:
> > * more usable control over update rate (vs the current state store
> caches)
> > * the final-result-for-windowed-computations feature which several people
> > have requested
> >
> > Thanks,
> > -John
> >
>


Re: [VOTE] KIP-328: Ability to suppress updates for KTables

2018-07-30 Thread Guozhang Wang
Hi John,

Thanks for the updated KIP, +1 from me, and one minor suggestion:

Following your suggestion of the differentiation of `skipped-records` v.s.
`late-event-drop`, we should probably consider moving the scenarios where
records got ignored due the window not being available any more in windowed
aggregation operators from the `skipped-records` metrics recording to the
`late-event-drop` metrics recording.



Guozhang


On Mon, Jul 30, 2018 at 1:36 PM, Bill Bejeck  wrote:

> Thanks for the KIP!
>
> +1
>
> -Bill
>
> On Mon, Jul 30, 2018 at 3:42 PM Ted Yu  wrote:
>
> > +1
> >
> > On Mon, Jul 30, 2018 at 11:46 AM John Roesler  wrote:
> >
> > > Hello devs,
> > >
> > > The discussion of KIP-328 has gone some time with no new comments, so I
> > am
> > > calling for a vote!
> > >
> > > Here's the KIP: https://cwiki.apache.org/confluence/x/sQU0BQ
> > >
> > > The basic idea is to provide:
> > > * more usable control over update rate (vs the current state store
> > caches)
> > > * the final-result-for-windowed-computations feature which several
> people
> > > have requested
> > >
> > > Thanks,
> > > -John
> > >
> >
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-30 Thread Jason Gustafson
Hi Boyang,

Thanks for the response. I think the main point I was trying to make is the
need for fencing. I am not too concerned about how to generate a unique id
on the client side. The approach you suggested for streams seems
reasonable. However, any time you reuse an id, you need to be careful that
there is only one instance that can use it at any time. We are always
running into problems where a previous instance of an application comes
back to life unexpectedly after we had already presumed it was dead.
Fencing ensures that even if this happens, it cannot do any damage. I would
say that some protection from zombies is a requirement here.

The second point was more abstract and mainly meant to initiate some
discussion. We have gone through several iterations of improvements to try
and reduce the rebalancing in consumer applications. We started out trying
to tune the session timeout. We have added an internal config to skip
leaving the group when streams shuts down. The broker now has a config to
delay rebalances in case all consumers join at about the same time. The
approach in this KIP is a step in a more principled direction, but it still
feels like we are making this unnecessarily hard on ourselves by insisting
that group membership is a dynamic concept. In practice, the number of
nodes dedicated to an application tends to remain fixed for long periods of
time and only scales up or down when needed. And these days you've got
frameworks like kubernetes which can automatically provision new nodes if
one fails. So the argument for dynamic membership is becoming weaker in my
opinion. This KIP is basically trying to impose a small degree of static
membership anyway so that rolling restarts do not change membership.
Anyway, curious to hear some thoughts about this from you and the others
who work on streams.

Thanks,
Jason


On Sat, Jul 28, 2018 at 4:44 PM, Boyang Chen  wrote:

> Thanks for the replies, James and Jason. Let me try to summarize your
> concerns.
>
>
> I think James' question is primarily the severity of user using this
> config wrongly. The impact would be that the same member id being used by
> multiple or even all of the consumers. The assignment protocol couldn't
> distinguish any of the overlapping consumers, thus assigning the exact same
> partitions multiple times to different consumers. I would say the processed
> result would be including a lot of duplicates and unnecessary heavy load on
> the client side, The correctness will depend on the user logic, however I'm
> pessimistic.
>
>
> Although the impact is very high, the symptom is not hard to triage,
> because user could visualize consumer identity overlapping fairly easily by
> exported consumer metrics. On the user standpoint, they would be fully
> aware of the potential erratic status before enabling "member.id"
> configuration IMO. Let me know your thoughts James!
>
>
> Next is Jason's suggestion. Jason shared a higher viewpoint and pointed
> out the problem that we need to solve is to maintain "a strong bias towards
> being able to reuse previous state". The proposed approach is to separate
> the notion of consumer membership and consumer identity.
>
>
> The original idea of this KIP was on the Stream application, so I
> understand that the identities of multiple consumers belong to one
> instance, where each Stream thread will be using one dedicated main
> consumer. So in a Stream use case, we could internally generate member id
> with USER_DEFINED_ID + STREAM_THREAD_ID.
>
>
> In pure consumer use case, this could be a little bit challenging since
> user could arbitrarily initiate multiple consumers on the same instance
> which is out of our library control. This could add up the possibility of
> member id collision. So instead of making developers life easier,
> introducing member id config could break the existing code logic and take
> long time to understand and fix. Although I still assume this is an
> advanced config, user may use member id config even before they fully
> understand the problem, and use the same set of initialization logic cross
> multiple consumers on the same instance.
>
>
> I hope I have explained my understanding of the pros and cons of this KIP
> better. Remember the core argument of this KIP: If the broker recognize
> this consumer as an existing member, it shouldn't trigger rebalance. If we
> build our discussion on top of this argument, the client management of
> group membership could be tricky at first, but considering our original
> motivation to leader-follower rebalance model, I feel that having broker to
> create membership info and let client maintain would be less appealing and
> fragile. Having client generate membership data could build up
> source-of-truth model and streamline the current architecture. We need also
> consider flexibility introduced by this KIP for cloud users to coordinate
> consumer/stream instances more freely. Honestly, I'm interested in Jason's
> registration id p
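
For concreteness, this is roughly what the proposal looks like from the client
side. "member.id" is the configuration name being discussed in this KIP and is
not an existing consumer config, and the group and instance names are made up:

  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerConfig;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.serialization.StringDeserializer;

  public class StaticMemberSketch {
      static KafkaConsumer<String, String> newConsumer() {
          Properties props = new Properties();
          props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-app");
          // Proposed static identity: reused across restarts of this instance so the
          // coordinator can recognize the member and avoid triggering a rebalance.
          props.put("member.id", "orders-app-instance-3");
          return new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());
      }
  }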

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-30 Thread Mike Freyberger
Jason,

I really appreciate the broader conversation that you are bringing up here.

I've been working on an application that does streaming joins for a while now, 
and we face a similar issue with group membership being dynamic. We are 
currently using our own StickyAssignor and take special care during rolling 
restarts to make sure consumer assignments do not change. 

I think a feature that allows for group membership to be fixed, along with a 
CLI for adding or removing a node from the group, would be ideal. This reminds me of 
some of the work by the DynamoDB team about 10 years back, when they 
differentiated transient failures from permanent failures to deal with 
problems like this.

Best,

Mike

On 7/30/18, 5:36 PM, "Jason Gustafson"  wrote:

Hi Boyang,

Thanks for the response. I think the main point I was trying to make is the
need for fencing. I am not too concerned about how to generate a unique id
on the client side. The approach you suggested for streams seems
reasonable. However, any time you reuse an id, you need to be careful that
there is only one instance that can use it at any time. We are always
running into problems where a previous instance of an application comes
back to life unexpectedly after we had already presumed it was dead.
Fencing ensures that even if this happens, it cannot do any damage. I would
say that some protection from zombies is a requirement here.

The second point was more abstract and mainly meant to initiate some
discussion. We have gone through several iterations of improvements to try
and reduce the rebalancing in consumer applications. We started out trying
to tune the session timeout. We have added an internal config to skip
leaving the group when streams shuts down. The broker now has a config to
delay rebalances in case all consumers join at about the same time. The
approach in this KIP is a step in a more principled direction, but it still
feels like we are making this unnecessarily hard on ourselves by insisting
that group membership is a dynamic concept. In practice, the number of
nodes dedicated to an application tends to remain fixed for long periods of
time and only scales up or down when needed. And these days you've got
frameworks like kubernetes which can automatically provision new nodes if
one fails. So the argument for dynamic membership is becoming weaker in my
opinion. This KIP is basically trying to impose a small degree of static
membership anyway so that rolling restarts do not change membership.
Anyway, curious to hear some thoughts about this from you and the others
who work on streams.

Thanks,
Jason


On Sat, Jul 28, 2018 at 4:44 PM, Boyang Chen  wrote:

> Thanks for the replies, James and Jason. Let me try to summarize your
> concerns.
>
>
> I think James' question is primarily the severity of user using this
> config wrongly. The impact would be that the same member id being used by
> multiple or even all of the consumers. The assignment protocol couldn't
> distinguish any of the overlapping consumers, thus assigning the exact 
same
> partitions multiple times to different consumers. I would say the 
processed
> result would be including a lot of duplicates and unnecessary heavy load 
on
> the client side, The correctness will depend on the user logic, however 
I'm
> pessimistic.
>
>
> Although the impact is very high, the symptom is not hard to triage,
> because user could visualize consumer identity overlapping fairly easily 
by
> exported consumer metrics. On the user standpoint, they would be fully
> aware of the potential erratic status before enabling "member.id"
> configuration IMO. Let me know your thoughts James!
>
>
> Next is Jason's suggestion. Jason shared a higher viewpoint and pointed
> out the problem that we need to solve is to maintain "a strong bias 
towards
> being able to reuse previous state". The proposed approach is to separate
> the notion of consumer membership and consumer identity.
>
>
> The original idea of this KIP was on the Stream application, so I
> understand that the identities of multiple consumers belong to one
> instance, where each Stream thread will be using one dedicated main
> consumer. So in a Stream use case, we could internally generate member id
> with USER_DEFINED_ID + STREAM_THREAD_ID.
>
>
> In pure consumer use case, this could be a little bit challenging since
> user could arbitrarily initiate multiple consumers on the same instance
> which is out of our library control. This could add up the possibility of
> member id collision. So instead of making developers life easier,
> introducing member id config could break the existing code logic and take
> long time to understand and fix. Alt

Re: [VOTE] KIP-328: Ability to suppress updates for KTables

2018-07-30 Thread John Roesler
Thanks Guozhang,

Thanks for that catch. to clarify, currently, events are "late" only when
they are older than the retention period. Currently, we detect this in the
processor and record it as a "skipped-record". We then do not attempt to
store the event in the window store. If a user provided a pre-configured
window store with a retention period smaller than the one they specify via
Windows#until, the segmented store will drop the update with no metric and
record a debug-level log.

With KIP-328, with the introduction of "grace period" and moving retention
fully into the state store, we need to have metrics for both "late events"
(new records older than the grace period) and "expired window events" (new
records for windows that are no longer retained in the state store). I
already proposed metrics for the late events, and I've just updated the KIP
with metrics for the expired window events. I also updated the KIP to make
it clear that neither late nor expired events will count as
"skipped-records" any more.

-John
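
To make the two cases concrete, a small sketch under the proposed API (names
taken from the KIP draft and not yet released): a record is "late" when it
arrives after its window end plus the grace period, and it is an "expired window
event" when it targets a window the store no longer retains.

  import java.time.Duration;
  import org.apache.kafka.common.utils.Bytes;
  import org.apache.kafka.streams.kstream.Materialized;
  import org.apache.kafka.streams.kstream.TimeWindows;
  import org.apache.kafka.streams.state.WindowStore;

  public class WindowBoundsSketch {
      // Late event: older than window end + 2 minutes of grace -> counted by the
      // proposed late-event metric, not by "skipped-records".
      static final TimeWindows WINDOWS =
              TimeWindows.of(Duration.ofMinutes(10)).grace(Duration.ofMinutes(2));

      // Expired window event: targets a window outside the 1 hour the store retains
      // -> counted by the proposed expired-window metric, not by "skipped-records".
      static final Materialized<String, Long, WindowStore<Bytes, byte[]>> STORE =
              Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("counts")
                      .withRetention(Duration.ofHours(1));
  }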

On Mon, Jul 30, 2018 at 4:22 PM Guozhang Wang  wrote:

> Hi John,
>
> Thanks for the updated KIP, +1 from me, and one minor suggestion:
>
> Following your suggestion of the differentiation of `skipped-records` v.s.
> `late-event-drop`, we should probably consider moving the scenarios where
> records got ignored due the window not being available any more in windowed
> aggregation operators from the `skipped-records` metrics recording to the
> `late-event-drop` metrics recording.
>
>
>
> Guozhang
>
>
> On Mon, Jul 30, 2018 at 1:36 PM, Bill Bejeck  wrote:
>
> > Thanks for the KIP!
> >
> > +1
> >
> > -Bill
> >
> > On Mon, Jul 30, 2018 at 3:42 PM Ted Yu  wrote:
> >
> > > +1
> > >
> > > On Mon, Jul 30, 2018 at 11:46 AM John Roesler 
> wrote:
> > >
> > > > Hello devs,
> > > >
> > > > The discussion of KIP-328 has gone some time with no new comments,
> so I
> > > am
> > > > calling for a vote!
> > > >
> > > > Here's the KIP: https://cwiki.apache.org/confluence/x/sQU0BQ
> > > >
> > > > The basic idea is to provide:
> > > > * more usable control over update rate (vs the current state store
> > > caches)
> > > > * the final-result-for-windowed-computations feature which several
> > people
> > > > have requested
> > > >
> > > > Thanks,
> > > > -John
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-328: Ability to suppress updates for KTables

2018-07-30 Thread John Roesler
Another thing that came up after I started working on an implementation is
that in addition to deprecating "retention" from the Windows interface, we
also need to deprecate "segmentInterval", for the same reasons. I simply
overlooked it previously. I've updated the KIP accordingly.

Hopefully, this doesn't change anyone's vote.

Thanks,
-John

On Mon, Jul 30, 2018 at 5:31 PM John Roesler  wrote:

> Thanks Guozhang,
>
> Thanks for that catch. to clarify, currently, events are "late" only when
> they are older than the retention period. Currently, we detect this in the
> processor and record it as a "skipped-record". We then do not attempt to
> store the event in the window store. If a user provided a pre-configured
> window store with a retention period smaller than the one they specify via
> Windows#until, the segmented store will drop the update with no metric and
> record a debug-level log.
>
> With KIP-328, with the introduction of "grace period" and moving retention
> fully into the state store, we need to have metrics for both "late events"
> (new records older than the grace period) and "expired window events" (new
> records for windows that are no longer retained in the state store). I
> already proposed metrics for the late events, and I've just updated the KIP
> with metrics for the expired window events. I also updated the KIP to make
> it clear that neither late nor expired events will count as
> "skipped-records" any more.
>
> -John
>
> On Mon, Jul 30, 2018 at 4:22 PM Guozhang Wang  wrote:
>
>> Hi John,
>>
>> Thanks for the updated KIP, +1 from me, and one minor suggestion:
>>
>> Following your suggestion of the differentiation of `skipped-records` v.s.
>> `late-event-drop`, we should probably consider moving the scenarios where
>> records got ignored due the window not being available any more in
>> windowed
>> aggregation operators from the `skipped-records` metrics recording to the
>> `late-event-drop` metrics recording.
>>
>>
>>
>> Guozhang
>>
>>
>> On Mon, Jul 30, 2018 at 1:36 PM, Bill Bejeck  wrote:
>>
>> > Thanks for the KIP!
>> >
>> > +1
>> >
>> > -Bill
>> >
>> > On Mon, Jul 30, 2018 at 3:42 PM Ted Yu  wrote:
>> >
>> > > +1
>> > >
>> > > On Mon, Jul 30, 2018 at 11:46 AM John Roesler 
>> wrote:
>> > >
>> > > > Hello devs,
>> > > >
>> > > > The discussion of KIP-328 has gone some time with no new comments,
>> so I
>> > > am
>> > > > calling for a vote!
>> > > >
>> > > > Here's the KIP: https://cwiki.apache.org/confluence/x/sQU0BQ
>> > > >
>> > > > The basic idea is to provide:
>> > > > * more usable control over update rate (vs the current state store
>> > > caches)
>> > > > * the final-result-for-windowed-computations feature which several
>> > people
>> > > > have requested
>> > > >
>> > > > Thanks,
>> > > > -John
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>


Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-30 Thread Guozhang Wang
Hello Boyang / Jason / Mike,

Thanks for your thoughtful inputs! Regarding the fencing issue, I've
thought about leveraging the epoch notion from PID of transactional
messaging before, but since in this proposal we do not always require
member ids from clients, and hence could have a mix of user-specified
member ids with coordinator-generated member ids, the epoch idea may not be
very well suited for this scenario. Of course, we can tighten the screws a
bit by requiring that for a given consumer group, all consumers must either
be giving their member ids or leveraging on the consumer coordinator to give
member ids, which does not sound like a very strict requirement in practice, and
all we need to do is add a new field in the join group request (we are
proposing to bump up its version anyways). And hence I've also thought
about another simple fencing approach, aka "first comer wins", that is to
pass in the host / port information from KafkaApis to GroupCoordinator to
validate if it matches the existing member id's cached host / port. It does
not always guarantee that we fence the right zombies because of "first
comer wins" (think of a scenario where the right client gets kicked out,
and then before it re-joins the actual zombie with the same member id gets
joined), but as I mentioned previously it will poke some leaks into the
code hierarchy a bit so I'm also hesitant to do it. If people think it is
indeed a must-have rather than a good-to-have, I'd suggest we leverage host-port
rather than the epoch mechanism then.
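
A purely illustrative sketch of that "first comer wins" check (the real
GroupCoordinator is Scala and structured differently; this only shows the idea
of caching the first claimant's host/port per member id):

  import java.net.InetSocketAddress;
  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  public class FirstComerWinsSketch {
      private final Map<String, InetSocketAddress> memberOwners = new ConcurrentHashMap<>();

      boolean mayJoin(String memberId, String host, int port) {
          InetSocketAddress claimed =
                  memberOwners.putIfAbsent(memberId, new InetSocketAddress(host, port));
          if (claimed == null) {
              return true; // first comer claims the member id
          }
          // A different host/port presenting the same member id is treated as a zombie and rejected.
          return claimed.getHostString().equals(host) && claimed.getPort() == port;
      }
  }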

--

As for the more general idea of having a static membership protocol to
better integrate with Cloud environments like k8s, I think the first idea
may actually be a better fit for it.

Just a quick summary of what rebalance issues we face today:

1. Application start: when multi-instance application is started, multiple
rebalances are triggered to migrate states to newly started instances since
not all instances are joining at the same time. NOTE that KIP-134 is
targeted for this issue, but as an after-thought it may not be the optimal
solution.
2. Application shutdown: similarly to 1), when multi-instance application
is shutting down, multiple rebalances are triggered.
3. Application scale out: when a new instance is started, one rebalance is
executed to shuffle all assignment, rather than just a "partial" shuffling
of some of the members.
4. Application scale in: similarly to 3), when an existing instance
gracefully shuts down, one rebalance is executed to shuffle all assignment.
5. Application instance bounce (upgrade, config change etc): one instance
shuts down and then restarts, which will trigger two rebalances. NOTE that
disabling leave-group is targeted for this issue.
6. Application instance failure: one instance fails, and probably a new
instance starts to take its assignment (e.g. k8s), which will trigger two
rebalances. The difference with 3) above is that the new instance would not have
local cached tasks.


Among them, I think 1/2/5/6 could potentially be grouped together as
"static membership"; 4/5 could be grouped as another category, of allowing
"incremental rebalance" or "partial rebalance" than full-rebalance. Of
course, having incremental rebalances can help on 1/2/5/6 as well to reduce
the cost of each unnecessary rebalance, but ideally we want NO rebalances
at all for these cases, which will be more true with k8s / etc integrations
or static memberships.

--

So just to throw in a sketchy idea following this route for 1/2/5/6 for
brainstorming kick-off:


1. We bump up the JoinGroupRequest with additional fields:

  1.a) a flag indicating "static" or "dynamic" membership protocols.
  1.b) with "static" membership, we also add the pre-defined member id.
  1.c) with "static" membership, we also add an optional
"group-change-timeout" value.

2. On the broker side, we enforce only one of the two protocols for all
group members: we accept the protocol on the first joined member of the
group, and if later joining members indicate a different membership
protocol, we reject it. If the group-change-timeout value was different to
the first joined member, we reject it as well.

3. With dynamic membership, nothing is changed; with static membership, we
do the following:

  3.a) never assign member ids, instead always expect the joining members
to come with their own member id; we could do the fencing based on host /
port here.
  3.b) upon receiving the first join group request, use the
"group-change-timeout" instead of the session-timeout as rebalance timeout
to wait for other members to join. This is for 1) above.
  3.c) upon receiving a leave-group request, use the "group-change-timeout"
to wait for more members to leave group as well, or for the left members to
re-join. After the timeout we trigger a rebalance with whatever 

Re: [VOTE] KIP-328: Ability to suppress updates for KTables

2018-07-30 Thread Guozhang Wang
Yes, the addendum lgtm as well. Thanks!

On Mon, Jul 30, 2018 at 3:34 PM, John Roesler  wrote:

> Another thing that came up after I started working on an implementation is
> that in addition to deprecating "retention" from the Windows interface, we
> also need to deprecate "segmentInterval", for the same reasons. I simply
> overlooked it previously. I've updated the KIP accordingly.
>
> Hopefully, this doesn't change anyone's vote.
>
> Thanks,
> -John
>
> On Mon, Jul 30, 2018 at 5:31 PM John Roesler  wrote:
>
> > Thanks Guozhang,
> >
> > Thanks for that catch. to clarify, currently, events are "late" only when
> > they are older than the retention period. Currently, we detect this in
> the
> > processor and record it as a "skipped-record". We then do not attempt to
> > store the event in the window store. If a user provided a pre-configured
> > window store with a retention period smaller than the one they specify
> via
> > Windows#until, the segmented store will drop the update with no metric
> and
> > record a debug-level log.
> >
> > With KIP-328, with the introduction of "grace period" and moving
> retention
> > fully into the state store, we need to have metrics for both "late
> events"
> > (new records older than the grace period) and "expired window events"
> (new
> > records for windows that are no longer retained in the state store). I
> > already proposed metrics for the late events, and I've just updated the
> KIP
> > with metrics for the expired window events. I also updated the KIP to
> make
> > it clear that neither late nor expired events will count as
> > "skipped-records" any more.
> >
> > -John
> >
> > On Mon, Jul 30, 2018 at 4:22 PM Guozhang Wang 
> wrote:
> >
> >> Hi John,
> >>
> >> Thanks for the updated KIP, +1 from me, and one minor suggestion:
> >>
> >> Following your suggestion of the differentiation of `skipped-records`
> v.s.
> >> `late-event-drop`, we should probably consider moving the scenarios
> where
> >> records got ignored due the window not being available any more in
> >> windowed
> >> aggregation operators from the `skipped-records` metrics recording to
> the
> >> `late-event-drop` metrics recording.
> >>
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Jul 30, 2018 at 1:36 PM, Bill Bejeck  wrote:
> >>
> >> > Thanks for the KIP!
> >> >
> >> > +1
> >> >
> >> > -Bill
> >> >
> >> > On Mon, Jul 30, 2018 at 3:42 PM Ted Yu  wrote:
> >> >
> >> > > +1
> >> > >
> >> > > On Mon, Jul 30, 2018 at 11:46 AM John Roesler 
> >> wrote:
> >> > >
> >> > > > Hello devs,
> >> > > >
> >> > > > The discussion of KIP-328 has gone some time with no new comments,
> >> so I
> >> > > am
> >> > > > calling for a vote!
> >> > > >
> >> > > > Here's the KIP: https://cwiki.apache.org/confluence/x/sQU0BQ
> >> > > >
> >> > > > The basic idea is to provide:
> >> > > > * more usable control over update rate (vs the current state store
> >> > > caches)
> >> > > > * the final-result-for-windowed-computations feature which
> several
> >> > people
> >> > > > have requested
> >> > > >
> >> > > > Thanks,
> >> > > > -John
> >> > > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-30 Thread Mike Freyberger
Guozhang, 

Thanks for giving us a great starting point.

A few questions that come to mind right away:

1) What do you think a reasonable group-change-timeout would be? I am thinking 
on the order of minutes (5 minutes?)

2) Will the nodes that are still alive continue to make progress during a 
static membership rebalance? I believe during a rebalance today all consumers 
wait for the SyncGroupResponse before continuing to read data from the brokers. 
If that is the case, I think it'd be ideal for all nodes that are still alive 
during a static group membership change to continue to make progress as if 
nothing happened, such that there is no impact to the majority of the group 
when one node is bounced (quick restart).

3) Do you think an explicit method for forcing a rebalance would be needed? I 
am thinking of a scenario such as a disk failure on a node, and that node will 
definitely not come back. Rather than waiting up to the group-change-timeout, I 
think it'd be good for an admin to be able to force a rebalance rather than wait the full 
group-change-timeout. Maybe this is an over optimization, but I think certain 
use cases would benefit from static group membership with the ability to force 
a rebalance at any time. 

Best,

Mike

On 7/30/18, 6:57 PM, "Guozhang Wang"  wrote:

Hello Boyang / Jason / Mike,

Thanks for your thoughtful inputs! Regarding the fencing issue, I've
thought about leveraging the epoch notion from PID of transactional
messaging before, but since in this proposal we do not always require
member ids from clients, and hence could have a mixed of user-specified
member ids with coordinator-generated member ids, the epoch idea may not be
very well suited for this scenario. Of course, we can tighten the screws a
bit by requiring that for a given consumer group, all consumers must either
be giving their member ids or leveraging on consumer coordinator to give
member ids, which does not sound a very strict requirement in practice, and
all we need to do is the add a new field in the join group request (we are
proposing to bump up its version anyways). And hence I've also thought
about another simple fencing approach, aka "first comer wins", that is to
pass in the host / port information from KafkaApis to GroupCoordinator to
validate if it matches the existing member id's cached host / post. It does
not always guarantee that we fence the right zombies because of "first
comer wins" (think of a scenario where the right client gets kicked out,
and then before it re-joins the actual zombie with the same member id gets
joined), but as I mentioned previously it will poke some leaks into the
code hierarchy a bit so I'm also hesitant to do it. If people think it is
indeed a must-have than good-to-have, I'd suggest we leverage on host-port
than using the epoch mechanism then.

--

As for the more general idea of having a static membership protocol to
better integrated with Cloud environment like k8s, I think the first idea
may actually be better fit with it.

Just a quick summary of what rebalance issues we face today:

1. Application start: when multi-instance application is started, multiple
rebalances are triggered to migrate states to newly started instances since
not all instances are joining at the same time. NOTE that KIP-134 is
targeted for this issue, but as an after-thought it may not be the optimal
solution.
2. Application shutdown: similarly to 1), when multi-instance application
is shutting down, multiple rebalances are triggered.
3. Application scale out: when a new instance is started, one rebalance is
executed to shuffle all assignment, rather than just a "partial" shuffling
of some of the members.
4. Application scale in: similarly to 3), when an existing instance
gracefully shutdown, once rebalance is executed to shuffle all assignment.
5. Application instance bounce (upgrade, config change etc): one instance
shut down and then restart, it will trigger two rebalances. NOTE that
disabling leave-group is targeted for this issue.
6. Application instance failure: one instance failed, and probably a new
instance start to take its assignment (e.g. k8s), it will trigger two
rebalances. The different with 3) above is that new instance would not have
local cached tasks.


Among them, I think 1/2/5/6 could potentially be grouped together as
"static membership"; 4/5 could be grouped as another category, of allowing
"incremental rebalance" or "partial rebalance" than full-rebalance. Of
course, having incremental rebalances can help on 1/2/5/6 as well to reduce
the cost of each unnecessary rebalance, but ideally we want NO reb

Re: [Vote] KIP-321: Update TopologyDescription to better represent Source and Sink Nodes

2018-07-30 Thread Nishanth Pradeep
We need one more binding vote.

Binding Votes:

   - Matthias J. Sax
   - Guozhang Wang

Community Votes:

   - Bill Bejeck
   - Ted Yu

Best,
Nishanth Pradeep

On Fri, Jul 27, 2018 at 10:02 AM Bill Bejeck  wrote:

> Thanks for the KIP!
>
> +1
>
> -Bill
>
> On Thu, Jul 26, 2018 at 2:39 AM Guozhang Wang  wrote:
>
> > +1
> >
> > On Wed, Jul 25, 2018 at 11:13 PM, Matthias J. Sax  >
> > wrote:
> >
> > > +1 (binding)
> > >
> > > -Matthias
> > >
> > > On 7/25/18 7:47 PM, Ted Yu wrote:
> > > > +1
> > > >
> > > > On Wed, Jul 25, 2018 at 7:24 PM Nishanth Pradeep <
> > nishanth...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I'm calling a vote for KIP-321:
> > > >>
> > > >>
> > > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-321%3A+Update+
> > > TopologyDescription+to+better+represent+Source+and+Sink+Nodes
> > > >>
> > > >> Best,
> > > >> Nishanth Pradeep
> > > >>
> > > >
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-30 Thread Jason Gustafson
Hey Guozhang,

Thanks for the detailed response. Really quick about the fencing issue, I
think host/port will not be sufficient because it cannot handle
disconnects. For example, if the coordinator moves to another broker, then
there is no way we'd be able to guarantee the same host/port. Currently we
try to avoid rebalancing when the coordinator moves. That said, I agree in
principle with the "first comer wins" approach you've suggested. Basically
a member is only removed from the group if its session expires or it leaves
the group explicitly.
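
For context, the session Jason refers to is driven by the consumer's heartbeat
settings; a minimal configuration sketch (values are examples, not
recommendations):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    public class SessionTimeoutExample {
        public static Properties consumerProps() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
            // The coordinator removes a member whose heartbeats stop for this long.
            props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");
            props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
            return props;
        }
    }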

-Jason

On Mon, Jul 30, 2018 at 4:24 PM, Mike Freyberger 
wrote:

> Guozhang,
>
> Thanks for giving us a great starting point.
>
> A few questions that come to mind right away:
>
> 1) What do you think a reasonable group-change-timeout would be? I am
> thinking on the order of minutes (5 minutes?)
>
> 2) Will the nodes that are still alive continue to make progress during a
> static membership rebalance? I believe during a rebalance today all
> consumers wait for the SyncGroupResponse before continuing to read data
> from the brokers. If that is the case, I think it'd be ideal for all nodes that
> are still alive during a static group membership change to continue to make
> progress as if nothing happened, such that there is no impact to the
> majority of the group when one node is bounced (quick restart).
>
> 3) Do you think an explicit method for forcing a rebalance would be
> needed? I am thinking of a scenario such as a disk failure on a node, and
> that node will definitely not come back. Rather than waiting up to the
> group-change-timeout, I think it'd be good for an admin to force a rebalance
> rather than wait the full group-change-timeout. Maybe this is an over
> optimization, but I think certain use cases would benefit from static group
> membership with the ability to force a rebalance at any time.
>
> Best,
>
> Mike
>
> On 7/30/18, 6:57 PM, "Guozhang Wang"  wrote:
>
> Hello Boyang / Jason / Mike,
>
> Thanks for your thoughtful inputs! Regarding the fencing issue, I've
> thought about leveraging the epoch notion from PID of transactional
> messaging before, but since in this proposal we do not always require
> member ids from clients, and hence could have a mixed of user-specified
> member ids with coordinator-generated member ids, the epoch idea may
> not be
> very well suited for this scenario. Of course, we can tighten the
> screws a
> bit by requiring that for a given consumer group, all consumers must
> either
> be giving their member ids or leveraging on consumer coordinator to
> give
> member ids, which does not sound a very strict requirement in
> practice, and
> all we need to do is the add a new field in the join group request (we
> are
> proposing to bump up its version anyways). And hence I've also thought
> about another simple fencing approach, aka "first comer wins", that is
> to
> pass in the host / port information from KafkaApis to GroupCoordinator
> to
> validate if it matches the existing member id's cached host / post. It
> does
> not always guarantee that we fence the right zombies because of "first
> comer wins" (think of a scenario where the right client gets kicked
> out,
> and then before it re-joins the actual zombie with the same member id
> gets
> joined), but as I mentioned previously it will poke some leaks into the
> code hierarchy a bit so I'm also hesitant to do it. If people think it
> is
> indeed a must-have than good-to-have, I'd suggest we leverage on
> host-port
> than using the epoch mechanism then.
>
> --
>
> As for the more general idea of having a static membership protocol to
> better integrated with Cloud environment like k8s, I think the first
> idea
> may actually be better fit with it.
>
> Just a quick summary of what rebalance issues we face today:
>
> 1. Application start: when multi-instance application is started,
> multiple
> rebalances are triggered to migrate states to newly started instances
> since
> not all instances are joining at the same time. NOTE that KIP-134
>  134%3A+Delay+initial+consumer+group+rebalance>
> is
> targeted for this issue, but as an after-thought it may not be the
> optimal
> solution.
> 2. Application shutdown: similarly to 1), when multi-instance
> application
> is shutting down, multiple rebalances are triggered.
> 3. Application scale out: when a new instance is started, one
> rebalance is
> executed to shuffle all assignment, rather than just a "partial"
> shuffling
> of some of the members.
> 4. Application scale in: similarly to 3), when an existing instance
> gracefully shutdown, once rebalance is executed to shuffle all
> assignment.
> 5. Application instance bounce (upgrade, config change etc): one
> instance

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-30 Thread Guozhang Wang
@Mike:

1) This should be configurable per consumer group id, and the actual value
will depend on how quickly users expect their instances to start, shut down
concurrently, and fail over. I think a default value of 30 seconds should be
good. Note that on starting / shutting down the coordinator will wait for that
amount of time before starting the rebalance, so this value should not be too
long, and it should definitely be smaller than the request timeout (see the
configuration sketch after point 3 below).

2) I think you are referring to the fact that having all nodes revoke their
assigned partitions and reassign them may take a long time. This is targeted
at 3/4 in my list and should be handled by incremental rebalancing, which to
me is a somewhat orthogonal feature to add, though with static membership the
incremental rebalance implementation may be simpler.

3) Today the coordinator has no way to differentiate a member that leaves the
group but will come back from one that has permanently failed and will not
come back, in which case a rebalance should be triggered immediately rather
than waiting for the group-change-timeout.
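
As a configuration sketch of the constraint in point 1 above:
"group.change.timeout.ms" is used here purely as a placeholder name from this
thread, not an existing Kafka setting, and the values are examples only.

    import java.util.Properties;

    public class GroupChangeTimeoutSketch {
        public static Properties sketch() {
            Properties props = new Properties();
            // Placeholder key borrowed from this discussion, not a real Kafka config.
            props.put("group.change.timeout.ms", "30000");
            // The delayed rebalance must complete before the client's request times
            // out, so the placeholder value above should stay well below this one.
            props.put("request.timeout.ms", "40000");
            return props;
        }
    }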


Guozhang




On Mon, Jul 30, 2018 at 4:24 PM, Mike Freyberger 
wrote:

> Guozhang,
>
> Thanks for giving us a great starting point.
>
> A few questions that come to mind right away:
>
> 1) What do you think a reasonable group-change-timeout would be? I am
> thinking on the order of minutes (5 minutes?)
>
> 2) Will the nodes that are still alive continue to make progress during a
> static membership rebalance? I believe during a rebalance today all
> consumers wait for the SyncGroupResponse before continuing to read data
> from the brokers. If that is the case, I think it'd be ideal for all nodes that
> are still alive during a static group membership change to continue to make
> progress as if nothing happened, such that there is no impact to the
> majority of the group when one node is bounced (quick restart).
>
> 3) Do you think an explicit method for forcing a rebalance would be
> needed? I am thinking of a scenario such as a disk failure on a node, and
> that node will definitely not come back. Rather than waiting up to the
> group-change-timeout, I think it'd be good for an admin to force a rebalance
> rather than wait the full group-change-timeout. Maybe this is an over
> optimization, but I think certain use cases would benefit from static group
> membership with the ability to force a rebalance at any time.
>
> Best,
>
> Mike
>
> On 7/30/18, 6:57 PM, "Guozhang Wang"  wrote:
>
> Hello Boyang / Jason / Mike,
>
> Thanks for your thoughtful inputs! Regarding the fencing issue, I've
> thought about leveraging the epoch notion from PID of transactional
> messaging before, but since in this proposal we do not always require
> member ids from clients, and hence could have a mixed of user-specified
> member ids with coordinator-generated member ids, the epoch idea may
> not be
> very well suited for this scenario. Of course, we can tighten the
> screws a
> bit by requiring that for a given consumer group, all consumers must
> either
> be giving their member ids or leveraging on consumer coordinator to
> give
> member ids, which does not sound a very strict requirement in
> practice, and
> all we need to do is the add a new field in the join group request (we
> are
> proposing to bump up its version anyways). And hence I've also thought
> about another simple fencing approach, aka "first comer wins", that is
> to
> pass in the host / port information from KafkaApis to GroupCoordinator
> to
> validate if it matches the existing member id's cached host / post. It
> does
> not always guarantee that we fence the right zombies because of "first
> comer wins" (think of a scenario where the right client gets kicked
> out,
> and then before it re-joins the actual zombie with the same member id
> gets
> joined), but as I mentioned previously it will poke some leaks into the
> code hierarchy a bit so I'm also hesitant to do it. If people think it
> is
> indeed a must-have than good-to-have, I'd suggest we leverage on
> host-port
> than using the epoch mechanism then.
>
> --
>
> As for the more general idea of having a static membership protocol to
> better integrated with Cloud environment like k8s, I think the first
> idea
> may actually be better fit with it.
>
> Just a quick summary of what rebalance issues we face today:
>
> 1. Application start: when multi-instance application is started,
> multiple
> rebalances are triggered to migrate states to newly started instances
> since
> not all instances are joining at the same time. NOTE that KIP-134
>  134%3A+Delay+initial+consumer+group+rebalance>
> is
> targeted for this issue, but as an after-thought it may not be the
> optimal
> solution.
> 2. Application shutdown: similarly to 

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-30 Thread Guozhang Wang
@Jason,

Good point about disconnects. With that, I think I agree that a registry id
may be a better way to enable fencing than validating on host / port.
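
A toy sketch of fencing on a registry id instead of host / port, as agreed
above; the names are illustrative assumptions rather than the eventual KIP-345
design.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class RegistryIdFencing {
        // member id -> registry id presented when the member first joined
        private final Map<String, String> memberRegistry = new ConcurrentHashMap<>();

        // A join is accepted if the member id is new, or if the caller presents the
        // registry id recorded at first join. Unlike host/port fencing, a legitimate
        // reconnect through a different address still passes this check.
        boolean tryJoin(String memberId, String registryId) {
            String existing = memberRegistry.putIfAbsent(memberId, registryId);
            return existing == null || existing.equals(registryId);
        }

        void remove(String memberId) {   // on session expiry or explicit leave
            memberRegistry.remove(memberId);
        }
    }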


Guozhang


On Mon, Jul 30, 2018 at 5:40 PM, Jason Gustafson  wrote:

> Hey Guozhang,
>
> Thanks for the detailed response. Really quick about the fencing issue, I
> think host/port will not be sufficient because it cannot handle
> disconnects. For example, if the coordinator moves to another broker, then
> there is no way we'd be able to guarantee the same host/port. Currently we
> try to avoid rebalancing when the coordinator moves. That said, I agree in
> principle with the "first comer wins" approach you've suggested. Basically
> a member is only removed from the group if its session expires or it leaves
> the group explicitly.
>
> -Jason
>
> On Mon, Jul 30, 2018 at 4:24 PM, Mike Freyberger  >
> wrote:
>
> > Guozhang,
> >
> > Thanks for giving us a great starting point.
> >
> > A few questions that come to mind right away:
> >
> > 1) What do you think a reasonable group-change-timeout would be? I am
> > thinking on the order of minutes (5 minutes?)
> >
> > 2) Will the nodes that are still alive continue to make progress during a
> > static membership rebalance? I believe during a rebalance today all
> > consumers wait for the SyncGroupResponse before continuing to read data
> > from the brokers. If that is the case, I think it'd be ideal for all nodes
> > that are still alive during a static group membership change to continue to
> > make progress as if nothing happened, such that there is no impact to the
> > majority of the group when one node is bounced (quick restart).
> >
> > 3) Do you think an explicit method for forcing a rebalance would be
> > needed? I am thinking of a scenario such as a disk failure on a node, and
> > that node will definitely not come back. Rather than waiting up to the
> > group-change-timeout, I think it'd be good for an admin to force a rebalance
> > rather than wait the full group-change-timeout. Maybe this is an over
> > optimization, but I think certain use cases would benefit from static
> group
> > membership with the ability to force a rebalance at any time.
> >
> > Best,
> >
> > Mike
> >
> > On 7/30/18, 6:57 PM, "Guozhang Wang"  wrote:
> >
> > Hello Boyang / Jason / Mike,
> >
> > Thanks for your thoughtful inputs! Regarding the fencing issue, I've
> > thought about leveraging the epoch notion from PID of transactional
> > messaging before, but since in this proposal we do not always require
> > member ids from clients, and hence could have a mixed of
> user-specified
> > member ids with coordinator-generated member ids, the epoch idea may
> > not be
> > very well suited for this scenario. Of course, we can tighten the
> > screws a
> > bit by requiring that for a given consumer group, all consumers must
> > either
> > be giving their member ids or leveraging on consumer coordinator to
> > give
> > member ids, which does not sound a very strict requirement in
> > practice, and
> > all we need to do is the add a new field in the join group request
> (we
> > are
> > proposing to bump up its version anyways). And hence I've also
> thought
> > about another simple fencing approach, aka "first comer wins", that
> is
> > to
> > pass in the host / port information from KafkaApis to
> GroupCoordinator
> > to
> > validate if it matches the existing member id's cached host / post.
> It
> > does
> > not always guarantee that we fence the right zombies because of
> "first
> > comer wins" (think of a scenario where the right client gets kicked
> > out,
> > and then before it re-joins the actual zombie with the same member id
> > gets
> > joined), but as I mentioned previously it will poke some leaks into
> the
> > code hierarchy a bit so I'm also hesitant to do it. If people think
> it
> > is
> > indeed a must-have than good-to-have, I'd suggest we leverage on
> > host-port
> > than using the epoch mechanism then.
> >
> > --
> >
> > As for the more general idea of having a static membership protocol
> to
> > better integrated with Cloud environment like k8s, I think the first
> > idea
> > may actually be better fit with it.
> >
> > Just a quick summary of what rebalance issues we face today:
> >
> > 1. Application start: when multi-instance application is started,
> > multiple
> > rebalances are triggered to migrate states to newly started instances
> > since
> > not all instances are joining at the same time. NOTE that KIP-134
> >  > 134%3A+Delay+initial+consumer+group+rebalance>
> > is
> > targeted for this issue, but as an after-thought it may not be the
> > optimal
> > solution.
> > 2. Application shutdown: similarly to 1), when multi-instance
> > application
> > is shutting down, mu

Re: [DISCUSS] KIP-332: Update AclCommand to use AdminClient API

2018-07-30 Thread Manikumar
Hi Colin,

Yes,  "--authorizer-properties" option is not required with
"--bootstrap-server" option. Updated the KIP.


Thanks,

On Tue, Jul 31, 2018 at 1:30 AM Ted Yu  wrote:

> Look good to me.
>
> On Mon, Jul 23, 2018 at 7:30 AM Manikumar 
> wrote:
>
> > Hi all,
> >
> > I have created a KIP to use AdminClient API in AclCommand (kafka-acls.sh)
> >
> > *
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API*
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-332%3A+Update+AclCommand+to+use+AdminClient+API
> > >
> >
> > Please take a look.
> >
> > Thanks,
> >
>


Permission to create KIP

2018-07-30 Thread Ratish Ravindran
Hi,

I am trying to create a KIP, but I don't have access to do so. Can anyone
grant me access?

Thanks,
Ratish


Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread James Cheng
Congrats and great job, everyone! Thanks Rajini for driving the release!

-James

Sent from my iPhone

> On Jul 30, 2018, at 3:25 AM, Rajini Sivaram  wrote:
> 
> The Apache Kafka community is pleased to announce the release for
> 
> Apache Kafka 2.0.0.
> 
> 
> 
> 
> 
> This is a major release and includes significant new features from
> 
> 40 KIPs. It contains fixes and improvements from 246 JIRAs, including
> 
> a few critical bugs. Here is a summary of some notable changes:
> 
> ** KIP-290 adds support for prefixed ACLs, simplifying access control
> management in large secure deployments. Bulk access to topics,
> consumer groups or transactional ids with a prefix can now be granted
> using a single rule. Access control for topic creation has also been
> improved to enable access to be granted to create specific topics or
> topics with a prefix.
> 
> ** KIP-255 adds a framework for authenticating to Kafka brokers using
> OAuth2 bearer tokens. The SASL/OAUTHBEARER implementation is
> customizable using callbacks for token retrieval and validation.
> 
> **Host name verification is now enabled by default for SSL connections
> to ensure that the default SSL configuration is not susceptible to
> man-in-the middle attacks. You can disable this verification for
> deployments where validation is performed using other mechanisms.
> 
> ** You can now dynamically update SSL trust stores without broker restart.
> You can also configure security for broker listeners in ZooKeeper before
> starting brokers, including SSL key store and trust store passwords and
> JAAS configuration for SASL. With this new feature, you can store sensitive
> password configs in encrypted form in ZooKeeper rather than in cleartext
> in the broker properties file.
> 
> ** The replication protocol has been improved to avoid log divergence
> between leader and follower during fast leader failover. We have also
> improved resilience of brokers by reducing the memory footprint of
> message down-conversions. By using message chunking, both memory
> usage and memory reference time have been reduced to avoid
> OutOfMemory errors in brokers.
> 
> ** Kafka clients are now notified of throttling before any throttling is
> applied
> when quotas are enabled. This enables clients to distinguish between
> network errors and large throttle times when quotas are exceeded.
> 
> ** We have added a configuration option for Kafka consumer to avoid
> indefinite blocking in the consumer.
> 
> ** We have dropped support for Java 7 and removed the previously
> deprecated Scala producer and consumer.
> 
> ** Kafka Connect includes a number of improvements and features.
> KIP-298 enables you to control how errors in connectors, transformations
> and converters are handled by enabling automatic retries and controlling the
> number of errors that are tolerated before the connector is stopped. More
> contextual information can be included in the logs to help diagnose problems
> and problematic messages consumed by sink connectors can be sent to a
> dead letter queue rather than forcing the connector to stop.
> 
> ** KIP-297 adds a new extension point to move secrets out of connector
> configurations and integrate with any external key management system.
> The placeholders in connector configurations are only resolved before
> sending the configuration to the connector, ensuring that secrets are stored
> and managed securely in your preferred key management system and
> not exposed over the REST APIs or in log files.
> 
> ** We have added a thin Scala wrapper API for our Kafka Streams DSL,
> which provides better type inference and better type safety during compile
> time. Scala users can have less boilerplate in their code, notably regarding
> Serdes with new implicit Serdes.
> 
> ** Message headers are now supported in the Kafka Streams Processor API,
> allowing users to add and manipulate headers read from the source topics
> and propagate them to the sink topics.
> 
> ** Windowed aggregations performance in Kafka Streams has been largely
> improved (sometimes by an order of magnitude) thanks to the new
> single-key-fetch API.
> 
> ** We have further improved unit testability of Kafka Streams with the
> kafka-streams-testutil artifact.
> 
> 
> 
> 
> 
> All of the changes in this release can be found in the release notes:
> 
> https://www.apache.org/dist/kafka/2.0.0/RELEASE_NOTES.html
> 
> 
> 
> 
> 
> You can download the source and binary release (Scala 2.11 and Scala 2.12)
> from:
> 
> https://kafka.apache.org/downloads#2.0.0
> 
> 
> 
> 
> ---
> 
> 
> 
> 
> 
> Apache Kafka is a distributed streaming platform with four core APIs:
> 
> 
> 
> ** The Producer API allows an application to publish a stream of records to
> 
> one or more Kafka topics.
> 
> 
> 
> ** The Consumer API allows an application to subscribe to one or more
> 
> topic

Re: [Vote] KIP-321: Update TopologyDescription to better represent Source and Sink Nodes

2018-07-30 Thread Damian Guy
Hi Nishanth,

I have one nit on the KIP. I think the topicNameExtractor method should
return Optional rather than null.
Sorry I'm late here.
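
For illustration, a minimal sketch of the shape Damian is suggesting; the
interface and method names are assumptions, not the final KIP-321 API.

    import java.util.Optional;
    import org.apache.kafka.streams.processor.TopicNameExtractor;

    public interface SinkDescription {
        // Fixed output topic, if the sink writes to a single named topic.
        String topic();

        // Dynamic-routing extractor; empty rather than null when none is used.
        Optional<TopicNameExtractor<?, ?>> topicNameExtractor();
    }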

Thanks,
Damian

On Tue, 31 Jul 2018 at 01:14 Nishanth Pradeep  wrote:

> We need one more binding vote.
>
> Binding Votes:
>
>- Matthias J. Sax
>- Guozhang Wong
>
> Community Votes:
>
>- Bill Bejeck
>- Ted Yu
>
> Best,
> Nishanth Pradeep
>
> On Fri, Jul 27, 2018 at 10:02 AM Bill Bejeck  wrote:
>
> > Thanks for the KIP!
> >
> > +1
> >
> > -Bill
> >
> > On Thu, Jul 26, 2018 at 2:39 AM Guozhang Wang 
> wrote:
> >
> > > +1
> > >
> > > On Wed, Jul 25, 2018 at 11:13 PM, Matthias J. Sax <
> matth...@confluent.io
> > >
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > -Matthias
> > > >
> > > > On 7/25/18 7:47 PM, Ted Yu wrote:
> > > > > +1
> > > > >
> > > > > On Wed, Jul 25, 2018 at 7:24 PM Nishanth Pradeep <
> > > nishanth...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> I'm calling a vote for KIP-321:
> > > > >>
> > > > >>
> > > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-321%3A+Update+
> > > > TopologyDescription+to+better+represent+Source+and+Sink+Nodes
> > > > >>
> > > > >> Best,
> > > > >> Nishanth Pradeep
> > > > >>
> > > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


[jira] [Created] (KAFKA-7218) Log thread names & ids when KafkaConsumer throws ConcurrentModificationException

2018-07-30 Thread Kevin Lu (JIRA)
Kevin Lu created KAFKA-7218:
---

 Summary: Log thread names & ids when KafkaConsumer throws 
ConcurrentModificationException
 Key: KAFKA-7218
 URL: https://issues.apache.org/jira/browse/KAFKA-7218
 Project: Kafka
  Issue Type: Improvement
  Components: consumer, log
Reporter: Kevin Lu
Assignee: Kevin Lu


KafkaConsumer does not support multi-threaded usage and any access by a thread 
that does not have the lock will cause ConcurrentModificationException to be 
thrown.

For some users leveraging frameworks on top of Kafka that abstract the actual 
KafkaConsumer class/calls, it can be hard to identify user and/or framework 
bugs when this exception is thrown.

It would be incredibly helpful if the exception message logged both the thread 
name and thread ID of the thread that has acquired the lock, as well as those of 
the current thread that is attempting to obtain it.

 

KafkaConsumer currently only tracks the id of the thread that has acquired the 
lock. Additionally, we should also keep track of the thread name.

An example of the exception message: "KafkaConsumer is not safe for 
multi-threaded access: acquiredThreadName=acquiredThread, acquiredThreadId=1, 
currentThreadName=rejectedThread, currentThreadId=2"
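
A rough, standalone sketch of the bookkeeping the ticket asks for: remember both 
the id and the name of the owning thread so the exception can name both sides. It 
mirrors the spirit of KafkaConsumer's acquire/release guard but is not the actual 
implementation.

    import java.util.ConcurrentModificationException;
    import java.util.concurrent.atomic.AtomicReference;

    class SingleThreadGuard {
        private final AtomicReference<Thread> owner = new AtomicReference<>();

        void acquire() {
            Thread current = Thread.currentThread();
            if (owner.get() == current) {
                return;                       // re-entrant call from the owning thread
            }
            if (!owner.compareAndSet(null, current)) {
                Thread existing = owner.get();
                String name = existing == null ? "unknown" : existing.getName();
                long id = existing == null ? -1L : existing.getId();
                throw new ConcurrentModificationException(
                    "KafkaConsumer is not safe for multi-threaded access:"
                        + " acquiredThreadName=" + name + ", acquiredThreadId=" + id
                        + ", currentThreadName=" + current.getName()
                        + ", currentThreadId=" + current.getId());
            }
        }

        void release() {
            owner.compareAndSet(Thread.currentThread(), null);
        }
    }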



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7219) Add topic/partition level metrics.

2018-07-30 Thread Satish Duggana (JIRA)
Satish Duggana created KAFKA-7219:
-

 Summary: Add topic/partition level metrics.
 Key: KAFKA-7219
 URL: https://issues.apache.org/jira/browse/KAFKA-7219
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Reporter: Satish Duggana
Assignee: Satish Duggana



Currently, Kafka generates different metrics for topics on a broker.

  - MessagesInPerSec
  - BytesInPerSec
  - BytesOutPerSec
  - BytesRejectedPerSec
  - ReplicationBytesInPerSec
  - ReplicationBytesOutPerSec
  - FailedProduceRequestsPerSec
  - FailedFetchRequestsPerSec
  - TotalProduceRequestsPerSec
  - TotalFetchRequestsPerSec
  - FetchMessageConversionsPerSec
  - ProduceMessageConversionsPerSec

Add metrics for individual partitions instead of having them only at the topic 
level. Some of these partition-level metrics would be useful for monitoring 
applications that track individual topic partitions.
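
Purely as an illustration of what a per-partition variant could look like if the 
existing BrokerTopicMetrics naming were extended with a "partition" tag; the tag 
name and mBean layout here are assumptions, not something decided in this ticket.

    import java.util.concurrent.TimeUnit;
    import com.yammer.metrics.Metrics;
    import com.yammer.metrics.core.Meter;
    import com.yammer.metrics.core.MetricName;

    public class PartitionMetricSketch {
        public static Meter messagesInPerSec(String topic, int partition) {
            // Assumed mBean name; today the broker's naming stops at the topic tag.
            String mbean = "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic="
                + topic + ",partition=" + partition;
            MetricName name = new MetricName("kafka.server", "BrokerTopicMetrics",
                "MessagesInPerSec", null, mbean);
            return Metrics.newMeter(name, "messages", TimeUnit.SECONDS);
        }
    }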

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)