Jenkins build is back to normal : Kafka » kafka-trunk-jdk11 #318

2020-12-16 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-10849) Remove useless ApiKeys#parseResponse and ApiKeys#parseRequest

2020-12-16 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-10849.

Resolution: Duplicate

Duplicate of https://github.com/apache/kafka/pull/9748

> Remove useless ApiKeys#parseResponse and ApiKeys#parseRequest
> -
>
> Key: KAFKA-10849
> URL: https://issues.apache.org/jira/browse/KAFKA-10849
> Project: Kafka
>  Issue Type: Improvement
>Reporter: dengziming
>Assignee: dengziming
>Priority: Minor
>
> KAFKA-10818 removed the conversion to Struct when parsing responses.
> {{ApiVersionsResponse.parse}} will be used instead of
> {{API_VERSIONS.parseResponse}}, so the override in
> {{ApiKeys.API_VERSIONS}} is useless; in the same way,
> {{ApiKeys#parseResponse}} and {{ApiKeys#parseRequest}} can also be removed.
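
As a rough sketch of the call-site change described above (illustrative only; the exact method signatures in trunk at the time may differ):

{code:java}
import java.nio.ByteBuffer;
import org.apache.kafka.common.requests.ApiVersionsResponse;

public class ApiVersionsParseSketch {
    // Before (roughly): Struct struct = ApiKeys.API_VERSIONS.parseResponse(version, payload);
    // After: parse the generated response object directly, so the per-key
    // override in ApiKeys.API_VERSIONS is no longer needed.
    static ApiVersionsResponse parse(ByteBuffer payload, short version) {
        return ApiVersionsResponse.parse(payload, version);
    }
}
{code}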



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10292) fix flaky streams/streams_broker_bounce_test.py

2020-12-16 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-10292.

Fix Version/s: (was: 2.8.0)
   2.7.0
   Resolution: Fixed

> fix flaky streams/streams_broker_bounce_test.py
> ---
>
> Key: KAFKA-10292
> URL: https://issues.apache.org/jira/browse/KAFKA-10292
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, system tests
>Reporter: Chia-Ping Tsai
>Assignee: Bruno Cadonna
>Priority: Blocker
> Fix For: 2.7.0
>
>
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "leader",
>   "failure_mode": "clean_bounce",
>   "num_threads": 1,
>   "sleep_time_secs": 120
> \}
> {quote}
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "controller",
>   "failure_mode": "hard_shutdown",
>   "num_threads": 3,
>   "sleep_time_secs": 120
> \}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10858) Convert connect protocol header schemas to use generated protocol

2020-12-16 Thread dengziming (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dengziming resolved KAFKA-10858.

Resolution: Duplicate

> Convert connect protocol header schemas to use generated protocol
> -
>
> Key: KAFKA-10858
> URL: https://issues.apache.org/jira/browse/KAFKA-10858
> Project: Kafka
>  Issue Type: Improvement
>  Components: protocol
>Reporter: dengziming
>Assignee: dengziming
>Priority: Major
>
> Manually managed schema code in ConnectProtocol and
> IncrementalCooperativeConnectProtocol should be replaced by the auto-generated
> protocol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10859) add @Test annotation to FileStreamSourceTaskTest.testInvalidFile and reduce the loop count to speed up the test

2020-12-16 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-10859:
--

 Summary: add @Test annotation to 
FileStreamSourceTaskTest.testInvalidFile and reduce the loop count to speed up 
the test
 Key: KAFKA-10859
 URL: https://issues.apache.org/jira/browse/KAFKA-10859
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai


FileStreamSourceTaskTest.testInvalidFile is missing a `@Test` annotation. Also, it 
loops 100 times, which takes about 2 minutes for a single unit test.
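
As a rough illustration of the kind of change being suggested (this is not the actual test code; the class body and iteration count below are placeholders, and JUnit 4 is assumed since the Connect tests used it at the time):

{code:java}
import org.junit.Test;

public class FileStreamSourceTaskTestSketch {

    private static final int ITERATIONS = 3; // down from the 100 iterations mentioned above

    @Test // the missing annotation; without it JUnit 4 silently skips the method
    public void testInvalidFile() {
        for (int i = 0; i < ITERATIONS; i++) {
            // poll the task against a non-existent file and assert that no
            // records are returned (assertions omitted in this sketch)
        }
    }
}
{code}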



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-10292) fix flaky streams/streams_broker_bounce_test.py

2020-12-16 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai reopened KAFKA-10292:


Thanks for the information. Reopening it.

> fix flaky streams/streams_broker_bounce_test.py
> ---
>
> Key: KAFKA-10292
> URL: https://issues.apache.org/jira/browse/KAFKA-10292
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, system tests
>Reporter: Chia-Ping Tsai
>Assignee: Bruno Cadonna
>Priority: Blocker
> Fix For: 2.7.0
>
>
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "leader",
>   "failure_mode": "clean_bounce",
>   "num_threads": 1,
>   "sleep_time_secs": 120
> \}
> {quote}
> {quote}
> Module: kafkatest.tests.streams.streams_broker_bounce_test
> Class:  StreamsBrokerBounceTest
> Method: test_broker_type_bounce
> Arguments:
> \{
>   "broker_type": "controller",
>   "failure_mode": "hard_shutdown",
>   "num_threads": 3,
>   "sleep_time_secs": 120
> \}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


New Contributor

2020-12-16 Thread 山崎健史
Dear team, could you please add us as contributors for Apache Kafka?

・GitHub: zacky9664, JIRA username: zacky
・GitHub: moja0316, JIRA username: moja0316
・GitHub: runom, JIRA username: mintomio


[GitHub] [kafka-site] acraske opened a new pull request #315: Adding La Redoute as powered by Kafka

2020-12-16 Thread GitBox


acraske opened a new pull request #315:
URL: https://github.com/apache/kafka-site/pull/315


   From sharing with Anthony and Ale



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [VOTE] voting on KIP-631: the quorum-based Kafka controller

2020-12-16 Thread Tom Bentley
Thanks for the KIP Colin, it does a great job of clearly explaining some
pretty complex changes.

+1 (non-binding)

Tom



On Tue, Dec 15, 2020 at 7:13 PM Boyang Chen 
wrote:

> Thanks Colin for the great work to polish the KIP and reach this final
> stage. +1 (binding) from me
>
> On Tue, Dec 15, 2020 at 9:11 AM David Arthur  wrote:
>
> > Colin, thanks for driving this. I just read through the KIP again and I
> > think it is in good shape. Exciting stuff!
> >
> > +1 binding
> >
> > -David
> >
> > On Sat, Dec 12, 2020 at 7:46 AM Ron Dagostino  wrote:
> >
> > > Thanks for shepherding this KIP through the extended discussion, Colin.
> > I
> > > think we’ve ended up in a good place.  I’m sure there will be more
> tweaks
> > > along the way, but the fundamentals are in place.  +1 (non-binding)
> from
> > me.
> > >
> > > Ron
> > >
> > > > On Dec 11, 2020, at 4:39 PM, Colin McCabe 
> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I'd like to restart the vote on KIP-631: the quorum-based Kafka
> > > Controller.  The KIP is here:
> > > >
> > > > https://cwiki.apache.org/confluence/x/4RV4CQ
> > > >
> > > > The original DISCUSS thread is here:
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > >
> > > > There is also a second email DISCUSS thread, which is here:
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > >
> > > > Please take a look and vote if you can.
> > > >
> > > > best,
> > > > Colin
> > >
> >
> >
> > --
> > David Arthur
> >
>


Re: [VOTE] 2.6.1 RC3

2020-12-16 Thread Manikumar
Hi,

+1 (binding)
- verified signatures
- ran the tests on the source archive with Scala 2.13
- verified the core/connect/streams quickstart with Scala 2.13 binary
archive.
- verified the artifacts, javadoc

Thanks for running the release!


Thanks,
Manikumar

On Tue, Dec 15, 2020 at 9:01 PM Rajini Sivaram  wrote:

> +1 (binding)
>
> Verified signatures, ran tests from source build (one flaky test failed but
> passed on rerun), ran Kafka quick start with the binary with both Scala
> 2.12 and Scala 2.13.
>
> Thanks for running the release, Mickael!
>
> Regards,
>
> Rajini
>
> On Fri, Dec 11, 2020 at 3:23 PM Mickael Maison 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the fourth candidate for release of Apache Kafka 2.6.1.
> >
> > Since RC2, the following JIRAs have been fixed: KAFKA-10811, KAFKA-10802
> >
> > Release notes for the 2.6.1 release:
> > https://home.apache.org/~mimaison/kafka-2.6.1-rc3/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Friday, December 18, 12 PM ET ***
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~mimaison/kafka-2.6.1-rc3/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~mimaison/kafka-2.6.1-rc3/javadoc/
> >
> > * Tag to be voted upon (off 2.6 branch) is the 2.6.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.6.1-rc3
> >
> > * Documentation:
> > https://kafka.apache.org/26/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/26/protocol.html
> >
> > * Successful Jenkins builds for the 2.6 branch:
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/62/
> >
> > /**
> >
> > Thanks,
> > Mickael
> >
>


Re: [VOTE] KIP-698: Add Explicit User Initialization of Broker-side State to Kafka Streams

2020-12-16 Thread Bruno Cadonna

Hi Guozhang,

Thanks for the feedback!

Please find my answers inline.

Best,
Bruno


On 14.12.20 23:33, Guozhang Wang wrote:

Hello Bruno,

Just a few more questions about the KIP:

1) If the internal topics exist but the calculated num.partitions do not
match the existing topics, what would Streams do;


Good point! I missed explicitly considering misconfigurations in the KIP.

I propose to throw a fatal error in this case during manual and 
automatic initialization. For the fatal error, we have two options:
a) introduce a second exception besides MissingInternalTopicException, 
e.g. MisconfiguredInternalTopicException
b) rename MissingInternalTopicException to 
MissingOrMisconfiguredInternalTopicException and throw that in both cases.


Since the process to react to such an exception on the user side should be 
similar, I am fine with option b). However, IMO option a) is a bit 
cleaner. WDYT?



2) Since `init()` is a blocking call (we only return after all topics are
confirmed to be created), should we have a timeout for this call as well or
not;


I will add an overload with a timeout to the KIP.
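
Purely as an illustration of the shape such an overload could take (all names below are hypothetical and not taken from the KIP):

{code:java}
import java.time.Duration;

public interface InitExampleSketch {
    // Blocking call proposed by the KIP: returns once all internal topics
    // are confirmed to be created.
    void init();

    // Hypothetical overload being discussed above: give up after the
    // supplied client-side timeout.
    void init(Duration timeout);
}
{code}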


3) If the config is set to `MANUAL_SETUP`, then during rebalance do we
still check whether the number of partitions of the existing topic matches; if
not, do we throw the newly added exception or throw a fatal
StreamsException? Today we would throw the StreamsException from assign(),
which would then be thrown from consumer.poll() as a fatal error.



Yes, I think we should check whether the number of partitions matches. I 
propose to throw the newly added exception in the same way as we now 
throw the MissingSourceTopicException, i.e., throw it from 
consumer.poll(). WDYT?



Guozhang


On Mon, Dec 14, 2020 at 12:47 PM John Roesler  wrote:


Thanks, Bruno!

I'm +1 (binding)

-John

On Mon, 2020-12-14 at 09:57 -0600, Leah Thomas wrote:

Thanks for the KIP Bruno, LGTM. +1 (non-binding)

Cheers,
Leah

On Mon, Dec 14, 2020 at 4:29 AM Bruno Cadonna 

wrote:



Hi,

I'd like to start the voting on KIP-698 that proposes an explicit user
initialization of broker-side state for Kafka Streams instead of

letting

Kafka Streams setting up the broker-side state automatically during
rebalance. Such an explicit initialization avoids possible data loss
issues due to automatic initialization.

https://cwiki.apache.org/confluence/x/7CnZCQ

Best,
Bruno









[jira] [Resolved] (KAFKA-10656) NetworkClient.java: print out the feature flags received at DEBUG level, as well as the other version information

2020-12-16 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-10656.

Fix Version/s: 2.7.0
   Resolution: Fixed

> NetworkClient.java: print out the feature flags received at DEBUG level, as 
> well as the other version information
> -
>
> Key: KAFKA-10656
> URL: https://issues.apache.org/jira/browse/KAFKA-10656
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Tom Bentley
>Priority: Major
> Fix For: 2.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: Kafka » kafka-trunk-jdk15 #339

2020-12-16 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10656: Log the feature flags received by the client (#9552)


--
[...truncated 13.13 KB...]

org.apache.kafka.message.VersionConditionalTest > testAlwaysFalseConditional 
PASSED

org.apache.kafka.message.VersionConditionalTest > 
testLowerRangeCheckWithIfMember STARTED

org.apache.kafka.message.VersionConditionalTest > 
testLowerRangeCheckWithIfMember PASSED

org.apache.kafka.message.VersionConditionalTest > 
testUpperRangeCheckWithIfNotMember STARTED

org.apache.kafka.message.VersionConditionalTest > 
testUpperRangeCheckWithIfNotMember PASSED

org.apache.kafka.message.VersionConditionalTest > 
testUpperRangeCheckWithIfMember STARTED

org.apache.kafka.message.VersionConditionalTest > 
testUpperRangeCheckWithIfMember PASSED

org.apache.kafka.message.VersionConditionalTest > 
testAnotherAlwaysFalseConditional STARTED

org.apache.kafka.message.VersionConditionalTest > 
testAnotherAlwaysFalseConditional PASSED

org.apache.kafka.message.VersionConditionalTest > 
testAllowMembershipCheckAlwaysFalseFails STARTED

org.apache.kafka.message.VersionConditionalTest > 
testAllowMembershipCheckAlwaysFalseFails PASSED

org.apache.kafka.message.VersionConditionalTest > 
testLowerRangeCheckWithIfNotMember STARTED

org.apache.kafka.message.VersionConditionalTest > 
testLowerRangeCheckWithIfNotMember PASSED

org.apache.kafka.message.VersionConditionalTest > testUpperRangeCheckWithElse 
STARTED

org.apache.kafka.message.VersionConditionalTest > testUpperRangeCheckWithElse 
PASSED

org.apache.kafka.message.StructRegistryTest > testDuplicateCommonStructError 
STARTED

org.apache.kafka.message.MessageGeneratorTest > testToSnakeCase PASSED

org.apache.kafka.message.MessageGeneratorTest > testCapitalizeFirst STARTED

org.apache.kafka.message.MessageGeneratorTest > testCapitalizeFirst PASSED

org.apache.kafka.message.MessageGeneratorTest > testLowerCaseFirst STARTED

org.apache.kafka.message.MessageGeneratorTest > testLowerCaseFirst PASSED

org.apache.kafka.message.MessageGeneratorTest > testFirstIsCapitalized STARTED

org.apache.kafka.message.MessageGeneratorTest > testFirstIsCapitalized PASSED

org.apache.kafka.message.MessageGeneratorTest > stripSuffixTest STARTED

org.apache.kafka.message.MessageGeneratorTest > stripSuffixTest PASSED

org.apache.kafka.message.VersionsTest > testIntersections STARTED

org.apache.kafka.message.VersionsTest > testIntersections PASSED

org.apache.kafka.message.VersionsTest > testSubtract STARTED

org.apache.kafka.message.VersionsTest > testSubtract PASSED

org.apache.kafka.message.VersionsTest > testRoundTrips STARTED

org.apache.kafka.message.VersionsTest > testRoundTrips PASSED

org.apache.kafka.message.VersionsTest > testVersionsParse STARTED

org.apache.kafka.message.VersionsTest > testVersionsParse PASSED

org.apache.kafka.message.VersionsTest > testContains STARTED

org.apache.kafka.message.VersionsTest > testContains PASSED

org.apache.kafka.message.StructRegistryTest > testDuplicateCommonStructError 
PASSED

org.apache.kafka.message.StructRegistryTest > testSingleStruct STARTED

org.apache.kafka.message.StructRegistryTest > testSingleStruct PASSED

org.apache.kafka.message.StructRegistryTest > testReSpecifiedCommonStructError 
STARTED

org.apache.kafka.message.StructRegistryTest > testReSpecifiedCommonStructError 
PASSED

org.apache.kafka.message.StructRegistryTest > testCommonStructs STARTED

org.apache.kafka.message.StructRegistryTest > testCommonStructs PASSED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidTaggedVersionsNotASubetOfVersions STARTED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidTaggedVersionsNotASubetOfVersions PASSED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidNullDefaultForPotentiallyNonNullableArray STARTED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidNullDefaultForPotentiallyNonNullableArray PASSED

org.apache.kafka.message.MessageDataGeneratorTest > testInvalidFieldName STARTED

org.apache.kafka.message.MessageDataGeneratorTest > testInvalidFieldName PASSED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidTaggedVersionsWithoutTag STARTED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidTaggedVersionsWithoutTag PASSED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidFlexibleVersionsRange STARTED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidFlexibleVersionsRange PASSED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidNullDefaultForInt STARTED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInvalidNullDefaultForInt PASSED

org.apache.kafka.message.MessageDataGeneratorTest > testDuplicateTags STARTED

org.apache.kafka.message.MessageDataGeneratorTest > testDuplicateTags PASSED

org.apache.kafka.message.MessageDataGeneratorTest > 
testInva

[jira] [Resolved] (KAFKA-10417) suppress() with cogroup() throws ClassCastException

2020-12-16 Thread Leah Thomas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leah Thomas resolved KAFKA-10417.
-
Resolution: Fixed

> suppress() with cogroup() throws ClassCastException
> ---
>
> Key: KAFKA-10417
> URL: https://issues.apache.org/jira/browse/KAFKA-10417
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Wardha Perinkada Kattu
>Assignee: Leah Thomas
>Priority: Critical
>  Labels: kafka-streams
> Fix For: 2.8.0, 2.7.1
>
>
> Streams operation - `cogroup()`, `aggregate()` followed by `suppress()` 
> throws `ClassCastException`
> Works fine without the `suppress()`
> Code block tested -
> {code:java}
> val stream1 = requestStreams.merge(successStreams).merge(errorStreams)
> .groupByKey(Grouped.with(Serdes.String(), 
> serdesConfig.notificationSerde()))
> val streams2 = confirmationStreams
> .groupByKey(Grouped.with(Serdes.String(), 
> serdesConfig.confirmationsSerde()))
> val cogrouped = 
> stream1.cogroup(notificationAggregator).cogroup(streams2, 
> confirmationsAggregator)
> 
> .windowedBy(TimeWindows.of(Duration.ofMinutes(notificationStreamsConfig.joinWindowMinutes.toLong())).grace(Duration.ofMinutes(notificationStreamsConfig.graceDurationMinutes.toLong())))
> .aggregate({ null }, Materialized.`as`<String, NotificationMetric, WindowStore<Bytes, ByteArray>>("time-windowed-aggregated-stream-store")
> 
> .withValueSerde(serdesConfig.notificationMetricSerde()))
>  .suppress(Suppressed.untilWindowCloses(unbounded()))
> .toStream()
> {code}
> Exception thrown is:
> {code:java}
> Caused by: java.lang.ClassCastException: class 
> org.apache.kafka.streams.kstream.internals.PassThrough cannot be cast to 
> class org.apache.kafka.streams.kstream.internals.KTableProcessorSupplier 
> (org.apache.kafka.streams.kstream.internals.PassThrough and 
> org.apache.kafka.streams.kstream.internals.KTableProcessorSupplier are in 
> unnamed module of loader 'app')
> {code}
> [https://stackoverflow.com/questions/63459685/kgroupedstream-with-cogroup-aggregate-suppress]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] 2.7.0 RC6

2020-12-16 Thread Bill Bejeck
Hello Kafka users, developers and client-developers,

This is the seventh candidate for release of Apache Kafka 2.7.0.

* Configurable TCP connection timeout and improve the initial metadata fetch
* Enforce broker-wide and per-listener connection creation rate (KIP-612,
part 1)
* Throttle Create Topic, Create Partition and Delete Topic Operations
* Add TRACE-level end-to-end latency metrics to Streams
* Add Broker-side SCRAM Config API
* Support PEM format for SSL certificates and private key
* Add RocksDB Memory Consumption to RocksDB Metrics
* Add Sliding-Window support for Aggregations

This release also includes a few other features, 53 improvements, and 91
bug fixes.

*** Please download, test and vote by Monday, December 21, 12 PM ET

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~bbejeck/kafka-2.7.0-rc6/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~bbejeck/kafka-2.7.0-rc6/javadoc/

* Tag to be voted upon (off 2.7 branch) is the 2.7.0 tag:
https://github.com/apache/kafka/releases/tag/2.7.0-rc6

* Documentation:
https://kafka.apache.org/27/documentation.html

* Protocol:
https://kafka.apache.org/27/protocol.html

* Successful Jenkins builds for the 2.7 branch:
Unit/integration tests:
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-2.7-jdk8/detail/kafka-2.7-jdk8/81/

Thanks,
Bill


Re: [VOTE] voting on KIP-631: the quorum-based Kafka controller

2020-12-16 Thread Unmesh Joshi
Went through the changes since the last discussion thread, and it's looking
in good shape. Thanks!
+ 1 (non-binding)

On Wed, Dec 16, 2020 at 4:34 PM Tom Bentley  wrote:

> Thanks for the KIP Colin, it does a great job of clearly explaining some
> pretty complex changes.
>
> +1 (non-binding)
>
> Tom
>
>
>
> On Tue, Dec 15, 2020 at 7:13 PM Boyang Chen 
> wrote:
>
> > Thanks Colin for the great work to polish the KIP and reach this final
> > stage. +1 (binding) from me
> >
> > On Tue, Dec 15, 2020 at 9:11 AM David Arthur  wrote:
> >
> > > Colin, thanks for driving this. I just read through the KIP again and I
> > > think it is in good shape. Exciting stuff!
> > >
> > > +1 binding
> > >
> > > -David
> > >
> > > On Sat, Dec 12, 2020 at 7:46 AM Ron Dagostino 
> wrote:
> > >
> > > > Thanks for shepherding this KIP through the extended discussion,
> Colin.
> > > I
> > > > think we’ve ended up in a good place.  I’m sure there will be more
> > tweaks
> > > > along the way, but the fundamentals are in place.  +1 (non-binding)
> > from
> > > me.
> > > >
> > > > Ron
> > > >
> > > > > On Dec 11, 2020, at 4:39 PM, Colin McCabe 
> > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to restart the vote on KIP-631: the quorum-based Kafka
> > > > Controller.  The KIP is here:
> > > > >
> > > > > https://cwiki.apache.org/confluence/x/4RV4CQ
> > > > >
> > > > > The original DISCUSS thread is here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > > >
> > > > > There is also a second email DISCUSS thread, which is here:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > > >
> > > > > Please take a look and vote if you can.
> > > > >
> > > > > best,
> > > > > Colin
> > > >
> > >
> > >
> > > --
> > > David Arthur
> > >
> >
>


Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-12-16 Thread Colin McCabe
On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> Hi, Colin,
> 
> Thanks for the reply. A few more follow up comments.
> 
> 210. initial.broker.registration.timeout.ms: The default value is 90sec,
> which seems long. If a broker fails the registration because of incorrect
> configs, we want to fail faster. In comparison, the defaults for
> zookeeper.connection.timeout.ms is 18 secs.
> 

Hi Jun,

I agree that the initial connection timeout here is longer than what we had 
with ZK.  The main reason I selected a slightly longer timeout here is to 
handle the case where the controllers and the brokers are co-located.  For 
example, if you have a 3 node cluster and all three nodes are 
controllers+brokers, when you first bring up the cluster, we will have to stand 
up the controller quorum and then handle broker registrations.  Although we 
believe Raft will start up relatively quickly, it's good to leave some extra 
margin here.

I don't think there's a big disadvantage to having a slightly longer timeout 
here.  After all, starting up the brokers with no controllers is an admin 
mistake, which we don't expect to see very often.  Maybe let's set it to 60 
seconds for now and maybe see if we can tweak it in the future if that turns 
out to be too long or short?

> 211. "Once they are all moved, the controller responds to the heartbeat
> with a nextState of SHUTDOWN." It seems that nextState is no longer
> relevant. Also, could you add a bit more explanation on when ShouldFence is
> set to true in BrokerHeartbeatRequest?
> 

Good catch.  I removed the obsolete section referring to nextState and added a 
reference to the new boolean.  I also added some more information about 
ShouldFence and about the rationale for separating fencing from registration.

> 212. Related to the above, what does the broker do if IsFenced is true in
> the BrokerHeartbeatResponse? Will the broker do the same thing on receiving
> a FenceBrokerRecord from the metadata log?
> 

The broker only checks this boolean during the startup process.  After the 
startup process is finished, it ignores this boolean.

The broker uses fence / unfence records in the metadata log to determine which 
brokers should appear in its MetadataResponse.

best,
Colin

> Jun
> 
> On Tue, Dec 15, 2020 at 8:51 AM Colin McCabe  wrote:
> 
> > On Tue, Dec 15, 2020, at 04:13, Tom Bentley wrote:
> > > Hi Colin,
> > >
> > > The KIP says that "brokers which are fenced will not appear in
> > > MetadataResponses.  So clients that have up-to-date metadata will not try
> > > to contact fenced brokers.", which is fine, but it doesn't seem to
> > > elaborate on what happens for clients which try to contact a broker
> > (using
> > > stale metadata, for example) and find it in the registered-but-fenced
> > > state.
> >
> > Hi Tom,
> >
> > I have dropped the broker-side fencing from this proposal.  So now the
> > fencing is basically the same as today's fencing: it's controller-side
> > only.  That means that clients using stale metadata can contact fenced
> > brokers and communicate with them.  The only case where the broker will be
> > inaccessible is when it hasn't finished starting yet (i.e., has not yet
> > attained RUNNING state.)
> >
> > Just like today, we expect the safeguards built into the replication
> > protocol to prevent the worst corner cases that could result from this.  I
> > do think we will probably take up this issue later and find a better
> > solution for client-side metadata, but this KIP is big enough as-is.
> >
> > best,
> > Colin
> >
> >
> > > You said previously that the broker will respond with a retriable
> > > error, which is again fine, but you didn't answer my question about which
> > > error code you've chosen for this. I realise that this doesn't really
> > > affect the correctness of the behaviour, but I'm not aware of any
> > existing
> > > error code which is a good fit. So it would be good to understand about
> > how
> > > you're making this backward compatible for clients.
> > >
> > > Many thanks,
> > >
> > > Tom
> > >
> > > On Tue, Dec 15, 2020 at 1:42 AM Colin McCabe  wrote:
> > >
> > > > On Fri, Dec 11, 2020, at 17:07, Jun Rao wrote:
> > > > > Hi, Colin,
> > > > >
> > > > > Thanks for the reply. Just a couple of more comments below.
> > > > >
> > > > > 210. Since we are deprecating zookeeper.connection.timeout.ms,
> > should we
> > > > > add a new config to bound the time for a broker to connect to the
> > > > > controller during starting up?
> > > > >
> > > >
> > > > Good idea.  I added initial.broker.registration.timeout.ms for this.
> > > >
> > > > > 211. BrokerHeartbeat no longer has the state field in the
> > > > request/response.
> > > > > However, (a) the controller shutdown section still has "In its
> > periodic
> > > > > heartbeats, the broker asks the controller if it can transition into
> > the
> > > > > SHUTDOWN state.  This motivates the controller to move all of the
> > leaders
> > > > > off of that broker.  Once they are all

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-12-16 Thread Jun Rao
Hi, Colin,

Thanks for the reply. A few more comments below.

206. "RemoveTopic is the last step, that scrubs all metadata about the
topic.  In order to get to that last step, the topic data needs to be removed
from all brokers (after each broker notices that the topic is being
deleted)." Currently, this is done based on the response of
StopReplicaRequest. Since the controller no longer sends this request, how
does the controller know that the data for the deleted topic has
been removed in the brokers?

210. Thanks for the explanation. Sounds good to me.

211. I still don't see an example when ShouldFence is set to true in
BrokerHeartbeatRequest. Could we add one?

213. The KIP now allows replicas to be assigned on brokers that are fenced,
which is an improvement. How do we permanently remove a broker (e.g.
cluster shrinking) to prevent it from being used for future replica
assignments?

214. "Currently, when a node is down, all of its ZK registration
information is gone.  But  we need this information in order to understand
things like whether the replicas of a particular partition are
well-balanced across racks." This is not quite true. Currently, even when
ZK registration is gone, the existing replica assignment is still available
in the metadata response.

Jun

On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe  wrote:

> On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the reply. A few more follow up comments.
> >
> > 210. initial.broker.registration.timeout.ms: The default value is 90sec,
> > which seems long. If a broker fails the registration because of incorrect
> > configs, we want to fail faster. In comparison, the defaults for
> > zookeeper.connection.timeout.ms is 18 secs.
> >
>
> Hi Jun,
>
> I agree that the initial connection timeout here is longer than what we
> had with ZK.  The main reason I selected a slightly longer timeout here is
> to handle the case where the controllers and the brokers are co-located.
> For example, if you have a 3 node cluster and all three nodes are
> controllers+brokers, when you first bring up the cluster, we will have to
> stand up the controller quorum and then handle broker registrations.
> Although we believe Raft will start up relatively quickly, it's good to
> leave some extra margin here.
>
> I don't think there's a big disadvantage to having a slightly longer
> timeout here.  After all, starting up the brokers with no controllers is an
> admin mistake, which we don't expect to see very often.  Maybe let's set it
> to 60 seconds for now and maybe see if we can tweak it in the future if
> that turns out to be too long or short?
>
> > 211. "Once they are all moved, the controller responds to the heartbeat
> > with a nextState of SHUTDOWN." It seems that nextState is no longer
> > relevant. Also, could you add a bit more explanation on when ShouldFence
> is
> > set to true in BrokerHeartbeatRequest?
> >
>
> Good catch.  I removed the obsolete section referring to nextState and
> added a reference to the new boolean.  I also added some more information
> about ShouldFence and about the rationale for separating fencing from
> registration.
>
> > 212. Related to the above, what does the broker do if IsFenced is true in
> > the BrokerHeartbeatResponse? Will the broker do the same thing on
> receiving
> > a FenceBrokerRecord from the metadata log?
> >
>
> The broker only checks this boolean during the startup process.  After the
> startup process is finished, it ignores this boolean.
>
> The broker uses fence / unfence records in the metadata log to determine
> which brokers should appear in its MetadataResponse.
>
> best,
> Colin
>
> > Jun
> >
> > On Tue, Dec 15, 2020 at 8:51 AM Colin McCabe  wrote:
> >
> > > On Tue, Dec 15, 2020, at 04:13, Tom Bentley wrote:
> > > > Hi Colin,
> > > >
> > > > The KIP says that "brokers which are fenced will not appear in
> > > > MetadataResponses.  So clients that have up-to-date metadata will
> not try
> > > > to contact fenced brokers.", which is fine, but it doesn't seem to
> > > > elaborate on what happens for clients which try to contact a broker
> > > (using
> > > > stale metadata, for example) and find it in the registered-but-fenced
> > > > state.
> > >
> > > Hi Tom,
> > >
> > > I have dropped the broker-side fencing from this proposal.  So now the
> > > fencing is basically the same as today's fencing: it's controller-side
> > > only.  That means that clients using stale metadata can contact fenced
> > > brokers and communicate with them.  The only case where the broker
> will be
> > > inaccessible is when it hasn't finished starting yet (i.e., has not yet
> > > attained RUNNING state.)
> > >
> > > Just like today, we expect the safeguards built into the replication
> > > protocol to prevent the worst corner cases that could result from
> this.  I
> > > do think we will probably take up this issue later and find a better
> > > solution for client-side metadata, but this KIP is big enough as-is.
> >

Re: [VOTE] voting on KIP-631: the quorum-based Kafka controller

2020-12-16 Thread Jason Gustafson
+1 Thanks Colin for all the iterations. My only request is to change
"controller.connect" to "controller.quorum.voters." I think it's important
to emphasize that this must be the full set of voters unlike
"zookeeper.connect." In the future, I think we can consider supporting an
additional config like "controller.connect" for brokers which can discover
the voters more dynamically.
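
For reference, KIP-595/KIP-631 describe the voter list as a comma-separated set
of {id}@{host}:{port} entries naming every voter, for example (hostnames and
ports here are illustrative only):

controller.quorum.voters=1@controller1.example.com:9093,2@controller2.example.com:9093,3@controller3.example.com:9093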

Best,
Jason

On Wed, Dec 16, 2020 at 7:36 AM Unmesh Joshi  wrote:

> Went through the changes since the last discussion thread, and it's looking
> in good shape. Thanks!
> + 1 (non-binding)
>
> On Wed, Dec 16, 2020 at 4:34 PM Tom Bentley  wrote:
>
> > Thanks for the KIP Colin, it does a great job of clearly explaining some
> > pretty complex changes.
> >
> > +1 (non-binding)
> >
> > Tom
> >
> >
> >
> > On Tue, Dec 15, 2020 at 7:13 PM Boyang Chen 
> > wrote:
> >
> > > Thanks Colin for the great work to polish the KIP and reach this final
> > > stage. +1 (binding) from me
> > >
> > > On Tue, Dec 15, 2020 at 9:11 AM David Arthur  wrote:
> > >
> > > > Colin, thanks for driving this. I just read through the KIP again
> and I
> > > > think it is in good shape. Exciting stuff!
> > > >
> > > > +1 binding
> > > >
> > > > -David
> > > >
> > > > On Sat, Dec 12, 2020 at 7:46 AM Ron Dagostino 
> > wrote:
> > > >
> > > > > Thanks for shepherding this KIP through the extended discussion,
> > Colin.
> > > > I
> > > > > think we’ve ended up in a good place.  I’m sure there will be more
> > > tweaks
> > > > > along the way, but the fundamentals are in place.  +1 (non-binding)
> > > from
> > > > me.
> > > > >
> > > > > Ron
> > > > >
> > > > > > On Dec 11, 2020, at 4:39 PM, Colin McCabe 
> > > wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to restart the vote on KIP-631: the quorum-based Kafka
> > > > > Controller.  The KIP is here:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/x/4RV4CQ
> > > > > >
> > > > > > The original DISCUSS thread is here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > > > >
> > > > > > There is also a second email DISCUSS thread, which is here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > > > >
> > > > > > Please take a look and vote if you can.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > >
> > > >
> > > >
> > > > --
> > > > David Arthur
> > > >
> > >
> >
>


Build failed in Jenkins: Kafka » kafka-2.6-jdk8 #64

2020-12-16 Thread Apache Jenkins Server
See 


Changes:

[Rajini Sivaram] KAFKA-10798; Ensure response is delayed for failed SASL 
authentication with connection close delay (#9678)


--
[...truncated 3.16 MB...]

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = false] PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
PASSED

org.apache.kafka.stream

Re: [DISCUSS] KIP-405: Kafka Tiered Storage

2020-12-16 Thread Jun Rao
Hi, Satish,

Thanks for the reply. A few more followup comments.

6022. For packages used for server plugins, the convention is to
use org.apache.kafka.server. See java-based Authorizer as an example.

9100. Do we need DeletePartitionStateRecord in flat_file_format? The flat
file captures the state of the remote segments. After a partition is
deleted, it seems that we just need to remove the partition's remote
segments from the flat file.

9101. Upgrade: It will be useful to allow direct upgrade from an old
version. It seems that's doable. One can just do the normal upgrade first
and wait enough time (for producer snapshots to be built), and then enable
remote storage.

9102. RemotePartitionRemover(RPM) process: Is it true that RPM starts
tracking the remote segments when RLMM.onPartitionLeadershipChanges() is
called with the broker being the leader for __remote_log_metadata
partition? If so, could we document it?

Jun

On Tue, Dec 15, 2020 at 8:47 AM Kowshik Prakasam 
wrote:

> Hi Satish,
>
> Thanks for the updates! A few more comments below.
>
> 9001. Under the "Upgrade" section, there is a line mentioning: "Upgrade the
> existing Kafka cluster to 2.7 version and allow this to run for the log
> retention of user topics that you want to enable tiered storage. This will
> allow all the topics to have the producer snapshots generated for each log
> segment." -- Which associated change in AK were you referring to here? Is
> it: https://github.com/apache/kafka/pull/7929 ? It seems like I don't see
> it in the 2.7 release branch yet, here is the link:
> https://github.com/apache/kafka/commits/2.7.
>
> 9002. Under the "Upgrade" section, the configuration mentioned is
> 'remote.log.storage.system.enable'. However, under "Public Interfaces"
> section the corresponding configuration is 'remote.storage.system.enable'.
> Could we use the same one in both, maybe
> 'remote.log.storage.system.enable'?
>
> 9003. Under "Per Topic Configuration", the KIP recommends setting
> 'remote.log.storage.enable' to true at a per-topic level. It will be useful
> to add a line that if the user wants to enable it for all topics, then they
> should be able to set the cluster-wide default to true. Also, it will be
> useful to mention that the KIP currently does not support setting it to
> false (after it is set to true), and add that to the future work section.
>
> 9004. Under "Committed offsets file format", the sample provided shows
> partition number and offset. Is the topic name required for identifying
> which topic the partitions belong to?
>
> 9005. Under "Internal flat-file store format of remote log metadata", it
> seems useful to specify both topic name and topic ID for debugging
> purposes.
>
> 9006. Under "Internal flat-file store format of remote log metadata", the
> description of "metadata-topic-offset" currently says "offset of the remote
> log metadata topic from which this topic partition's remote log metadata is
> fetched." Just for the wording, perhaps you meant to refer to the offset
> up to which the file has been committed? i.e. "offset of the remote log
> metadata topic up to which this topic partition's remote log metadata has
> been committed into this file."
>
> 9007. Under "Internal flat-file store format of remote log metadata", the
> schema of the payload (i.e. beyond the header) seems to contain the events
> from the metadata topic. It seems useful to instead persist the
> representation of the materialized state of the events, so that for the
> same segment only the latest state is stored. Besides reducing storage
> footprint, this also is likely to relate directly with the in-memory
> representation of the RLMM cache (which probably is some kind of a Map with
> key being segment ID and value being the segment state), so recovery from
> disk will be straightforward.
>
> 9008. Under "Topic deletion lifecycle", step (1), it will be useful to
> mention when in the deletion flow does the controller publish the
> delete_partition_marked event to say that the partition is marked for
> deletion?
>
> 9009. There are ~4 TODOs in the KIP. Could you please address these or
> remove them?
>
> 9010. There is a reference to a Google doc on the KIP which was used
> earlier for discussions. Please could you remove the reference, since the
> KIP is the source of the truth?
>
> 9011. This feedback is from an earlier comment. In the RemoteStorageManager
> interface, there is an API defined for each file type. For example,
> fetchOffsetIndex, fetchTimestampIndex etc. To avoid the duplication, I'd
> suggest we can instead have a FileType enum and a common get API based on
> the FileType. What do you think?
>
>
> Cheers,
> Kowshik
>
>
> On Mon, Dec 14, 2020 at 11:07 AM Satish Duggana 
> wrote:
>
> > Hi Jun,
> > Thanks for your comments. Please go through the inline replies.
> >
> >
> > 5102.2: It seems that both positions can just be int. Another option is
> to
> > have two methods. Would it be clearer?
> >
> > InputStream 

Re: [VOTE] KIP-695: Improve Streams Time Synchronization

2020-12-16 Thread Jason Gustafson
Hi John,

Just one question. It wasn't very clear to me exactly when the metadata
would be returned in `ConsumerRecords`. Would we /always/ include the
metadata for all partitions that are assigned, or would it be based on the
latest fetches?

Thanks,
Jason

On Fri, Dec 11, 2020 at 4:07 PM John Roesler  wrote:

> Thanks, Guozhang!
>
> All of your feedback sounds good to me. I’ll update the KIP when I am able.
>
> 3) I believe it is the position after the fetch, but I will confirm. I
> think omitting position may render beginning and end offsets useless as
> well, which leaves only lag. That would be fine with me, but it also seems
> nice to supply this extra metadata since it is well defined and probably
> handy for others. Therefore, I’d go the route of specifying the exact
> semantics and keeping it.
>
> Thanks for the review,
> John
>
> On Fri, Dec 11, 2020, at 17:36, Guozhang Wang wrote:
> > Hello John,
> >
> > Thanks for the updates! I've made a pass on the KIP and also the POC PR,
> > here are some minor comments:
> >
> > 1) nit: "receivedTimestamp" -> it seems the metadata keeps getting
> updated,
> > and we do not create a new object but just update the values in-place, so
> > maybe calling it `lastUpdateTimestamp` is better?
> >
> > 2) It will be great to verify in javadocs that the new API
> > "ConsumerRecords#metadata(): Map" may return a
> > superset of TopicPartitions than the existing API that returns the data
> by
> > partitions, in case users assume their map key-entries would always be
> the
> > same.
> >
> > 3) The "position()" API of the call needs better clarification: is it the
> > current position AFTER the records are returned, or is it BEFORE the
> > records are returned? Personally I'd suggest we do not include it if it
> is
> > not used anywhere yet just to avoid possible misuse, but I'm fine if you
> > like to keep it still; in that case just clarify its semantics.
> >
> >
> > Other than that, I'm +1 on the KIP as well!
> >
> >
> > Guozhang
> >
> >
> > On Fri, Dec 11, 2020 at 8:15 AM Walker Carlson 
> > wrote:
> >
> > > Thanks for the KIP!
> > >
> > > +1 (non-binding)
> > >
> > > walker
> > >
> > > On Wed, Dec 9, 2020 at 11:40 AM Bruno Cadonna 
> wrote:
> > >
> > > > Thanks for the KIP, John!
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > > On 08.12.20 18:03, John Roesler wrote:
> > > > > Hello all,
> > > > >
> > > > > There hasn't been much discussion on KIP-695 so far, so I'd
> > > > > like to go ahead and call for a vote.
> > > > >
> > > > > As a reminder, the purpose of KIP-695 is to improve on the
> > > > > "task idling" feature we introduced in KIP-353. This KIP
> > > > > will allow Streams to offer deterministic time semantics in
> > > > > join-type topologies. For example, it makes sure that
> > > > > when you join two topics, that we collate the topics by
> > > > > timestamp. That was always the intent with task idling (KIP-
> > > > > 353), but it turns out the previous mechanism couldn't
> > > > > provide the desired semantics.
> > > > >
> > > > > The details are here:
> > > > > https://cwiki.apache.org/confluence/x/JSXZCQ
> > > > >
> > > > > Thanks,
> > > > > -John
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


[jira] [Created] (KAFKA-10860) JmxTool fails with NPE when object-name contains a wildcard

2020-12-16 Thread Bob Barrett (Jira)
Bob Barrett created KAFKA-10860:
---

 Summary: JmxTool fails with NPE when object-name contains a 
wildcard
 Key: KAFKA-10860
 URL: https://issues.apache.org/jira/browse/KAFKA-10860
 Project: Kafka
  Issue Type: Bug
Reporter: Bob Barrett


When running JmxTool with a wildcard in the object name, the tool fails with a 
NullPointerException:
{code:java}
bin/kafka-run-class kafka.tools.JmxTool --jmx-url 
service:jmx:rmi:///jndi/rmi://localhost:/jmxrmi --object-name 
kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*
Trying to connect to JMX url: 
service:jmx:rmi:///jndi/rmi://localhost:/jmxrmi.
Exception in thread "main" java.lang.NullPointerException at 
kafka.tools.JmxTool$.main(JmxTool.scala:194) at 
kafka.tools.JmxTool.main(JmxTool.scala)
{code}
It seems that we never populate the `names` variable when the object name 
includes a pattern:
{code:java}
var names: Iterable[ObjectName] = null
def namesSet = Option(names).toSet.flatten
def foundAllObjects = queries.toSet == namesSet
val waitTimeoutMs = 1
if (!hasPatternQueries) {
  val start = System.currentTimeMillis
  do {
if (names != null) {
  System.err.println("Could not find all object names, retrying")
  Thread.sleep(100)
}
names = queries.flatMap((name: ObjectName) => mbsc.queryNames(name, 
null).asScala)
  } while (wait && System.currentTimeMillis - start < waitTimeoutMs && 
!foundAllObjects)
}

if (wait && !foundAllObjects) {
  val missing = (queries.toSet - namesSet).mkString(", ")
  System.err.println(s"Could not find all requested object names after 
$waitTimeoutMs ms. Missing $missing")
  System.err.println("Exiting.")
  sys.exit(1)
}

val numExpectedAttributes: Map[ObjectName, Int] =
  if (!attributesWhitelistExists)
names.map{name: ObjectName =>
  val mbean = mbsc.getMBeanInfo(name)
  (name, mbsc.getAttributes(name, 
mbean.getAttributes.map(_.getName)).size)}.toMap
  else {
if (!hasPatternQueries)
  names.map{name: ObjectName =>
val mbean = mbsc.getMBeanInfo(name)
val attributes = mbsc.getAttributes(name, 
mbean.getAttributes.map(_.getName))
val expectedAttributes = 
attributes.asScala.asInstanceOf[mutable.Buffer[Attribute]]
  .filter(attr => attributesWhitelist.get.contains(attr.getName))
(name, expectedAttributes.size)}.toMap.filter(_._2 > 0)
else
  queries.map((_, attributesWhitelist.get.length)).toMap
  }
{code}
We need to add logic to query the object names that match the pattern when a 
pattern is part of the input.
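
As a minimal, self-contained illustration of the missing pattern expansion (this
is not the actual JmxTool patch; it queries the local platform MBean server with
a java.lang pattern purely so the sketch can run anywhere):

{code:java}
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxPatternQuerySketch {
    public static void main(String[] args) throws Exception {
        // JmxTool would use the remote MBeanServerConnection it already holds;
        // the local platform server is used here only for illustration.
        MBeanServerConnection mbsc = ManagementFactory.getPlatformMBeanServer();

        ObjectName query = new ObjectName("java.lang:type=MemoryPool,name=*");
        if (query.isPattern()) {
            // queryNames expands a pattern into the concrete ObjectNames it
            // matches, which is the step the current code skips when
            // hasPatternQueries is true, leaving `names` null.
            Set<ObjectName> names = mbsc.queryNames(query, null);
            names.forEach(System.out::println);
        }
    }
}
{code}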



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : Kafka » kafka-trunk-jdk15 #340

2020-12-16 Thread Apache Jenkins Server
See 




Re: New Contributor

2020-12-16 Thread Jun Rao
Thanks for your interest. Added all 3 names to the JIRA contributors list.

Jun

On Wed, Dec 16, 2020 at 2:25 AM 山崎健史  wrote:

> Dear team. Could you please add us as a contributor for Apache Kafka?
>
> ・GitHub: zacky9664, JIRA username: zacky
> ・GitHub: moja0316, JIRA username: moja0316
> ・GitHub: runom, JIRA username: mintomio
>


Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-12-16 Thread Colin McCabe
On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> Hi, Colin,
> 
> Thanks for the reply. A few more comments below.
> 

Hi Jun,

Thanks for the comments.

> 206. "RemoveTopic is the last step, that scrubs all metadata about the
> topic.  In order to get to that last step, the topic data needs to removed
> from all brokers (after each broker notices that the topic is being
> deleted)." Currently, this is done based on the response of
> StopReplicaRequest. Since the controller no longer sends this request, how
> does the controller know that the data for the deleted topic has
> been removed in the brokers?
> 

That's a good point... definitely an oversight.

It seems complex to force the controller to track when log directories have 
been deleted.  Let's just assume that KIP-516 has been implemented, and track 
them by UUID.  Then we can just have a single topic deletion record.

I added a section on "topic identifiers" describing this.

> 210. Thanks for the explanation. Sounds good to me.
> 
> 211. I still don't see an example when ShouldFence is set to true in
> BrokerHeartbeatRequest. Could we add one?
> 

It is set to true when the broker is first starting up and doesn't yet want to 
be unfenced.  I added a longer explanation of this in the "Broker Leases" 
section.

> 213. The KIP now allows replicas to be assigned on brokers that are fenced,
> which is an improvement. How do we permanently remove a broker (e.g.
> cluster shrinking) to prevent it from being used for future replica
> assignments?
> 

This is a fair point.  I will create a kafka-cluster.sh script which can do 
this, plus a DecommissionBrokerRequest.

As a bonus the kafka-cluster.sh script can help people find the cluster ID of 
brokers, something that people have wanted a tool for in the past.

> 214. "Currently, when a node is down, all of its ZK registration
> information is gone.  But  we need this information in order to understand
> things like whether the replicas of a particular partition are
> well-balanced across racks." This is not quite true. Currently, even when
> ZK registration is gone, the existing replica assignment is still available
> in the metadata response.
> 

I agree that the existing replica assignment is still available.  But that just 
tells you partition X is on nodes A, B, and C.  If you don't have the ZK 
registration for one or more of A, B, or C then you don't know whether we are 
following the policy of "two replicas on one rack, one replica on another."  Or 
any other more complex rack placement policy that you might have.

best,
Colin

> Jun
> 
> On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe  wrote:
> 
> > On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply. A few more follow up comments.
> > >
> > > 210. initial.broker.registration.timeout.ms: The default value is 90sec,
> > > which seems long. If a broker fails the registration because of incorrect
> > > configs, we want to fail faster. In comparison, the defaults for
> > > zookeeper.connection.timeout.ms is 18 secs.
> > >
> >
> > Hi Jun,
> >
> > I agree that the initial connection timeout here is longer than what we
> > had with ZK.  The main reason I selected a slightly longer timeout here is
> > to handle the case where the controllers and the brokers are co-located.
> > For example, if you have a 3 node cluster and all three nodes are
> > controllers+brokers, when you first bring up the cluster, we will have to
> > stand up the controller quorum and then handle broker registrations.
> > Although we believe Raft will start up relatively quickly, it's good to
> > leave some extra margin here.
> >
> > I don't think there's a big disadvantage to having a slightly longer
> > timeout here.  After all, starting up the brokers with no controllers is an
> > admin mistake, which we don't expect to see very often.  Maybe let's set it
> > to 60 seconds for now and maybe see if we can tweak it in the future if
> > that turns out to be too long or short?
> >
> > > 211. "Once they are all moved, the controller responds to the heartbeat
> > > with a nextState of SHUTDOWN." It seems that nextState is no longer
> > > relevant. Also, could you add a bit more explanation on when ShouldFence
> > is
> > > set to true in BrokerHeartbeatRequest?
> > >
> >
> > Good catch.  I removed the obsolete section referring to nextState and
> > added a reference to the new boolean.  I also added some more information
> > about ShouldFence and about the rationale for separating fencing from
> > registration.
> >
> > > 212. Related to the above, what does the broker do if IsFenced is true in
> > > the BrokerHeartbeatResponse? Will the broker do the same thing on
> > receiving
> > > a FenceBrokerRecord from the metadata log?
> > >
> >
> > The broker only checks this boolean during the startup process.  After the
> > startup process is finished, it ignores this boolean.
> >
> > The broker uses fence / unfence records in the metadata log to determ

Re: [VOTE] voting on KIP-631: the quorum-based Kafka controller

2020-12-16 Thread Colin McCabe
On Wed, Dec 16, 2020, at 10:10, Jason Gustafson wrote:
> +1 Thanks Colin for all the iterations. My only request is to change
> "controller.connect" to "controller.quorum.voters." I think it's important
> to emphasize that this must be the full set of voters unlike
> "zookeeper.connect." In the future, I think we can consider supporting an
> additional config like "controller.connect" for brokers which can discover
> the voters more dynamically.
> 

Thanks, Jason.  I have changed controller.connect to controller.quorum.voters.

cheers,
Colin

> Best,
> Jason
> 
> On Wed, Dec 16, 2020 at 7:36 AM Unmesh Joshi  wrote:
> 
> > Went through the changes since the last discussion thread, and it's looking
> > in good shape. Thanks!
> > + 1 (non-binding)
> >
> > On Wed, Dec 16, 2020 at 4:34 PM Tom Bentley  wrote:
> >
> > > Thanks for the KIP Colin, it does a great job of clearly explaining some
> > > pretty complex changes.
> > >
> > > +1 (non-binding)
> > >
> > > Tom
> > >
> > >
> > >
> > > On Tue, Dec 15, 2020 at 7:13 PM Boyang Chen 
> > > wrote:
> > >
> > > > Thanks Colin for the great work to polish the KIP and reach this final
> > > > stage. +1 (binding) from me
> > > >
> > > > On Tue, Dec 15, 2020 at 9:11 AM David Arthur  wrote:
> > > >
> > > > > Colin, thanks for driving this. I just read through the KIP again
> > and I
> > > > > think it is in good shape. Exciting stuff!
> > > > >
> > > > > +1 binding
> > > > >
> > > > > -David
> > > > >
> > > > > On Sat, Dec 12, 2020 at 7:46 AM Ron Dagostino 
> > > wrote:
> > > > >
> > > > > > Thanks for shepherding this KIP through the extended discussion,
> > > Colin.
> > > > > I
> > > > > > think we’ve ended up in a good place.  I’m sure there will be more
> > > > tweaks
> > > > > > along the way, but the fundamentals are in place.  +1 (non-binding)
> > > > from
> > > > > me.
> > > > > >
> > > > > > Ron
> > > > > >
> > > > > > > On Dec 11, 2020, at 4:39 PM, Colin McCabe 
> > > > wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'd like to restart the vote on KIP-631: the quorum-based Kafka
> > > > > > Controller.  The KIP is here:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/x/4RV4CQ
> > > > > > >
> > > > > > > The original DISCUSS thread is here:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > > > > >
> > > > > > > There is also a second email DISCUSS thread, which is here:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
> > > > > > >
> > > > > > > Please take a look and vote if you can.
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > David Arthur
> > > > >
> > > >
> > >
> >
>


RE: New Contributor

2020-12-16 Thread tyamasak91
Thanks a lot.

-Original Message-
From: Jun Rao 
Sent: Thursday, December 17, 2020 4:33 AM
To: dev 
Subject: Re: New Contributor

Thanks for your interest. Added all 3 names to the jira contributors list.

Jun

On Wed, Dec 16, 2020 at 2:25 AM 山崎健史  wrote:

> Dear team. Could you please add us as contributors for Apache Kafka?
>
> ・GitHub: zacky9664, JIRA username: zacky
> ・GitHub: moja0316, JIRA username: moja0316
> ・GitHub: runom, JIRA username: mintomio
>


--
This email has been scanned for viruses by Avast antivirus.
https://www.avast.com/antivirus



Re: [VOTE] voting on KIP-631: the quorum-based Kafka controller

2020-12-16 Thread Ismael Juma
Thanks for all the work on the KIP. Given the magnitude of the KIP, I
expect that some tweaks will be made as the code is implemented, reviewed
and tested. I'm overall +1 (binding).

A few comments below:
1. It's a bit weird for kafka-storage to output a random uuid. Would it be
better to have a dedicated command for that? Also, since we use base64
encoded uuids nearly everywhere (including cluster and topic ids), it would
be good to follow that pattern instead of the less compact
"51380268-1036-410d-a8fc-fb3b55f48033".
2. This is a nit, but I think it would be better to talk about built-in
quorum mode instead of KIP-500 mode. It's more self descriptive than a KIP
reference.
3. Did we consider using `session` (like the group coordinator) instead of
> `registration` in `broker.registration.timeout.ms`?
4. The flat id space for the controller and broker while requiring a
different id in embedded mode seems a bit unintuitive. Are there any other
systems that do this? I know we covered some of the reasons in the "Shared
IDs between Multiple Nodes" rejected alternatives section, but it didn't
seem totally convincing to me.
5. With regards to the controller process listening on a separate port, it
may be worth adding a sentence about the forwarding KIP as that is a main
reason why the controller port doesn't need to be accessible.
6. The internal topic seems to be called @metadata. I'm personally not
convinced about the usage of @ in this way. I think I would go with the
same convention we have used for other existing internal topics.
7. We talk about the metadata.format feature flag. Is this intended to
allow for single roll upgrades?
8. Could the incarnation id be called registration id? Or is there a reason
why this would be a bad name?
9. Could `CurMetadataOffset` be called `CurrentMetadataOffset` for
`BrokerRegistrationRequest`? The abbreviation here doesn't seem to help
much and makes things slightly less readable. It would also make it
consistent with `BrokerHeartbeatRequest`.
10. Should `UnregisterBrokerRecord` be `DeregisterBrokerRecord`?
11. Broker metrics typically have a PerSec suffix, should we stick with
that for the `MetadataCommitRate`?
12. For the lag metrics, would it be clearer if we included "Offset" in the
name? In theory, we could have time based lag metrics too. Having said
that, existing offset lag metrics do seem to just have `Lag` in their name
without further qualification.

Ismael

On Fri, Dec 11, 2020 at 1:41 PM Colin McCabe  wrote:

> Hi all,
>
> I'd like to restart the vote on KIP-631: the quorum-based Kafka
> Controller.  The KIP is here:
>
> https://cwiki.apache.org/confluence/x/4RV4CQ
>
> The original DISCUSS thread is here:
>
>
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
>
> There is also a second email DISCUSS thread, which is here:
>
>
> https://lists.apache.org/thread.html/r1ed098a88c489780016d963b065e8cb450a9080a4736457cd25f323c%40%3Cdev.kafka.apache.org%3E
>
> Please take a look and vote if you can.
>
> best,
> Colin
>


[GitHub] [kafka-site] vvcephei commented on a change in pull request #314: [minor] Adding many more companies to powered by page

2020-12-16 Thread GitBox


vvcephei commented on a change in pull request #314:
URL: https://github.com/apache/kafka-site/pull/314#discussion_r544622598



##
File path: powered-by.html
##
@@ -237,6 +292,26 @@
 "logo": "ipinyou.png",
 "logoBgColor": "#ff",
 "description": "The largest DSP in China which has its HQ in Beijing 
and offices in Shanghai, Guangzhou, Silicon Valley and Seattle. Kafka clusters 
are the central data hub in iPinYou. All kinds of Internet display advertising 
data, such as bid/no-bid, impression, click, advertiser, conversion and etc., 
are collected as primary data streams into Kafka brokers in real time, by 
LogAggregator (a substitute for Apache Flume, which is implemented in C/C++ by 
iPinYou, has customized functionality, better performance, lower 
resource-consuming)."
+}, {
+"link": "https://www.ironsrc.com/";,
+"logo": "ironsource.png",
+"logoBgColor": "#ff",
+"description": "ironSource powers the growth of the world's top games, 
using Apache Kafka as the backbone infrastructure for the async messaging of 
millions of events per second that run through their industry-leading game 
growth platform. In addition ironSource uses the Kafka Stream API to handle 
multiple real-time use cases, such as budget management, monitoring and 
alerting."

Review comment:
   ```suggestion
   "description": "ironSource powers the growth of the world's top 
games, using Apache Kafka as the backbone infrastructure for the async 
messaging of millions of events per second that run through their 
industry-leading game growth platform. In addition ironSource uses the Kafka 
Streams API to handle multiple real-time use cases, such as budget management, 
monitoring and alerting."
   ```

##
File path: powered-by.html
##
@@ -62,6 +77,11 @@
 "logo": "ants.png",
 "logoBgColor": "#ff",
 "description": "Ants.vn use Kafka in production for stream processing 
and log transfer (over 5B messages/month and growing)"
+}, {
+"link": "http://appsflyer.com/";,
+"logo": "ants.png",
+"logoBgColor": "#ff",
+"description": "Ants.vn use Kafka in production for stream processing 
and log transfer (over 5B messages/month and growing)"

Review comment:
   ```suggestion
   "description": "Ants.vn uses Kafka in production for stream 
processing and log transfer (over 5B messages/month and growing)"
   ```

##
File path: powered-by.html
##
@@ -237,6 +292,26 @@
 "logo": "ipinyou.png",
 "logoBgColor": "#ff",
 "description": "The largest DSP in China which has its HQ in Beijing 
and offices in Shanghai, Guangzhou, Silicon Valley and Seattle. Kafka clusters 
are the central data hub in iPinYou. All kinds of Internet display advertising 
data, such as bid/no-bid, impression, click, advertiser, conversion and etc., 
are collected as primary data streams into Kafka brokers in real time, by 
LogAggregator (a substitute for Apache Flume, which is implemented in C/C++ by 
iPinYou, has customized functionality, better performance, lower 
resource-consuming)."
+}, {
+"link": "https://www.ironsrc.com/";,
+"logo": "ironsource.png",
+"logoBgColor": "#ff",
+"description": "ironSource powers the growth of the world's top games, 
using Apache Kafka as the backbone infrastructure for the async messaging of 
millions of events per second that run through their industry-leading game 
growth platform. In addition ironSource uses the Kafka Stream API to handle 
multiple real-time use cases, such as budget management, monitoring and 
alerting."
+}, {
+"link": "http://banno.com";,
+"logo": "JHD_Logo.jpg",
+"logoBgColor": "#ff",
+"description": "The Banno Digital Platform from Jack Henry enables 
community financial institutions to provide world-class service in today’s 
digital landscape. The Banno team integrates various streams of data through 
Apache Kafka, reacting to events as they occur, to provide innovative banking 
solutions."
+}, {
+"link": "https://www.kuaishou.com/en";,
+"logo": "KuaishouLogo.png",
+"logoBgColor": "#FF",
+"description": "At kuaishou, Kafka is used as the backbone of realtime 
data stream, including online training, data integration, realtime data 
processing, service asynchronous interaction processing and cache data 
synchronization."
+}, {
+"link": "https://www.laredoute-corporate.com";,
+"logo": "LaRedoute.svg",
+"logoBgColor": "#FF",
+"description": "La Redoute, the digital platform for families, uses 
Kafka as a central nervous system to decouple its application through business 
events. Thus, enabling a decentralized event-driven architecture bringing 
near-real-time data reporting, analytics and emerging AI-pipelines combining 
Kafka Connect, Kafka

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-12-16 Thread Jun Rao
Hi, Colin,

Thanks for the reply. A few follow up comments.

211. When does the broker send the BrokerRegistration request to the
controller? Is it after the recovery phase? If so, at that point, the
broker has already caught up on the metadata (in order to clean up deleted
partitions). Then, it seems we don't need the ShouldFence field
in BrokerHeartbeatRequest?

213. kafka-cluster.sh
213.1 For the decommission example, should the command take a broker id?
213.2 Existing tools use the "--" command line option (e.g. kafka-topics
--list --topic test). Should we follow the same convention
for kafka-cluster.sh (and kafka-storage.sh)?
213.3 Should we add an admin api for broker decommission so that this can
be done programmatically?

220. DecommissionBroker: "NOT_CONTROLLER if the node that the request was
sent to is not the controller". I thought the clients never send a request
to the controller directly now and the broker will forward it to the
controller?

221. Could we add the required ACL for the new requests?

Jun

On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe  wrote:

> On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the reply. A few more comments below.
> >
>
> Hi Jun,
>
> Thanks for the comments.
>
> > 206. "RemoveTopic is the last step, that scrubs all metadata about the
> > topic.  In order to get to that last step, the topic data needs to
> be removed
> > from all brokers (after each broker notices that the topic is being
> > deleted)." Currently, this is done based on the response of
> > StopReplicaRequest. Since the controller no longer sends this request,
> how
> > does the controller know that the data for the deleted topic has
> > been removed in the brokers?
> >
>
> That's a good point... definitely an oversight.
>
> It seems complex to force the controller to track when log directories
> have been deleted.  Let's just assume that KIP-516 has been implemented,
> and track them by UUID.  Then we can just have a single topic deletion
> record.
>
> I added a section on "topic identifiers" describing this.
>
> > 210. Thanks for the explanation. Sounds good to me.
> >
> > 211. I still don't see an example when ShouldFence is set to true in
> > BrokerHeartbeatRequest. Could we add one?
> >
>
> It is set to true when the broker is first starting up and doesn't yet
> want to be unfenced.  I added a longer explanation of this in the "Broker
> Leases" section.
>
> > 213. The KIP now allows replicas to be assigned on brokers that are
> fenced,
> > which is an improvement. How do we permanently remove a broker (e.g.
> > cluster shrinking) to prevent it from being used for future replica
> > assignments?
> >
>
> This is a fair point.  I will create a kafka-cluster.sh script which can
> do this, plus a DecommissionBrokerRequest.
>
> As a bonus the kafka-cluster.sh script can help people find the cluster ID
> of brokers, something that people have wanted a tool for in the past.
>
> > 214. "Currently, when a node is down, all of its ZK registration
> > information is gone.  But  we need this information in order to
> understand
> > things like whether the replicas of a particular partition are
> > well-balanced across racks." This is not quite true. Currently, even when
> > ZK registration is gone, the existing replica assignment is still
> available
> > in the metadata response.
> >
>
> I agree that the existing replica assignment is still available.  But that
> just tells you partition X is on nodes A, B, and C.  If you don't have the
> ZK registration for one or more of A, B, or C then you don't know whether
> we are following the policy of "two replicas on one rack, one replica on
> another."  Or any other more complex rack placement policy that you might
> have.
>
> best,
> Colin
>
> > Jun
> >
> > On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe  wrote:
> >
> > > On Tue, Dec 15, 2020, at 13:08, Jun Rao wrote:
> > > > Hi, Colin,
> > > >
> > > > Thanks for the reply. A few more follow up comments.
> > > >
> > > > 210. initial.broker.registration.timeout.ms: The default value is
> 90sec,
> > > > which seems long. If a broker fails the registration because of
> incorrect
> > > > configs, we want to fail faster. In comparison, the defaults for
> > > > zookeeper.connection.timeout.ms is 18 secs.
> > > >
> > >
> > > Hi Jun,
> > >
> > > I agree that the initial connection timeout here is longer than what we
> > > had with ZK.  The main reason I selected a slightly longer timeout
> here is
> > > to handle the case where the controllers and the brokers are
> co-located.
> > > For example, if you have a 3 node cluster and all three nodes are
> > > controllers+brokers, when you first bring up the cluster, we will have
> to
> > > stand up the controller quorum and then handle broker registrations.
> > > Although we believe Raft will start up relatively quickly, it's good to
> > > leave some extra margin here.
> > >
> > > I don't think there's a big disadvantage to having a slightly longer
>

[GitHub] [kafka-site] scott-confluent commented on pull request #314: [minor] Adding many more companies to powered by page

2020-12-16 Thread GitBox


scott-confluent commented on pull request #314:
URL: https://github.com/apache/kafka-site/pull/314#issuecomment-747058796


   @vvcephei yep, I've rendered the page. happy to provide a screenshot of the 
updates







[GitHub] [kafka-site] vvcephei merged pull request #314: [minor] Adding many more companies to powered by page

2020-12-16 Thread GitBox


vvcephei merged pull request #314:
URL: https://github.com/apache/kafka-site/pull/314


   







[GitHub] [kafka-site] scott-confluent opened a new pull request #316: Powered by update

2020-12-16 Thread GitBox


scott-confluent opened a new pull request #316:
URL: https://github.com/apache/kafka-site/pull/316


   Something got weird when merging suggestions with Github's UI. Fixing the 
duplicate.







[GitHub] [kafka-site] vvcephei merged pull request #316: HOTFIX: Fix powered-by duplicate

2020-12-16 Thread GitBox


vvcephei merged pull request #316:
URL: https://github.com/apache/kafka-site/pull/316


   







[jira] [Reopened] (KAFKA-10140) Incremental config api excludes plugin config changes

2020-12-16 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reopened KAFKA-10140:
-

I resolved this by mistake, reopening now

> Incremental config api excludes plugin config changes
> -
>
> Key: KAFKA-10140
> URL: https://issues.apache.org/jira/browse/KAFKA-10140
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Critical
> Fix For: 2.7.0
>
>
> I was trying to alter the jmx metric filters using the incremental alter 
> config api and hit this error:
> ```
> java.util.NoSuchElementException: key not found: metrics.jmx.blacklist
>   at scala.collection.MapLike.default(MapLike.scala:235)
>   at scala.collection.MapLike.default$(MapLike.scala:234)
>   at scala.collection.AbstractMap.default(Map.scala:65)
>   at scala.collection.MapLike.apply(MapLike.scala:144)
>   at scala.collection.MapLike.apply$(MapLike.scala:143)
>   at scala.collection.AbstractMap.apply(Map.scala:65)
>   at kafka.server.AdminManager.listType$1(AdminManager.scala:681)
>   at 
> kafka.server.AdminManager.$anonfun$prepareIncrementalConfigs$1(AdminManager.scala:693)
>   at 
> kafka.server.AdminManager.prepareIncrementalConfigs(AdminManager.scala:687)
>   at 
> kafka.server.AdminManager.$anonfun$incrementalAlterConfigs$1(AdminManager.scala:618)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:154)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:273)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> kafka.server.AdminManager.incrementalAlterConfigs(AdminManager.scala:589)
>   at 
> kafka.server.KafkaApis.handleIncrementalAlterConfigsRequest(KafkaApis.scala:2698)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:188)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:78)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> ```
> It looks like we are only allowing changes to the keys defined in 
> `KafkaConfig` through this API. This excludes config changes to any plugin 
> components such as `JmxReporter`. 
> Note that I was able to use the regular `alterConfig` API to change this 
> config.
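
For context, a minimal sketch of the kind of call that hits this code path; the broker id and config value below are illustrative, only the `metrics.jmx.blacklist` key comes from the stack trace above:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

// Sketch: incrementally altering a broker config entry. With the bug described
// above, a key that is not part of KafkaConfig (e.g. a plugin config such as
// "metrics.jmx.blacklist") fails on the broker with NoSuchElementException.
public class IncrementalAlterExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            AlterConfigOp op = new AlterConfigOp(
                new ConfigEntry("metrics.jmx.blacklist", "kafka.consumer:*"),
                AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(broker, List.of(op))).all().get();
        }
    }
}
```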



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] voting on KIP-631: the quorum-based Kafka controller

2020-12-16 Thread Colin McCabe
On Wed, Dec 16, 2020, at 13:08, Ismael Juma wrote:
> Thanks for all the work on the KIP. Given the magnitude of the KIP, I
> expect that some tweaks will be made as the code is implemented, reviewed
> and tested. I'm overall +1 (binding).
> 

Thanks, Ismael.

> A few comments below:
> 1. It's a bit weird for kafka-storage to output a random uuid. Would it be
> better to have a dedicated command for that?

I'm not sure.  The nice thing about putting it in kafka-storage.sh is that it's 
there when you need it.  I also think that having subcommands, like we do here, 
really reduces the "clutter" that we have in some other command-line tools.  
When you get help about the "info" subcommand, you don't see flags for any 
other subcommand, for example.  I guess we can move this later if it seems more 
intuitive though.

> Also, since we use base64
> encoded uuids nearly everywhere (including cluster and topic ids), it would
> be good to follow that pattern instead of the less compact
> "51380268-1036-410d-a8fc-fb3b55f48033".

Good idea.  I have updated this to use base64 encoded UUIDs.
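
For reference, a dashed UUID like the example above can be rendered in that compact form with plain JDK classes; this is a sketch of the encoding style, not the actual Kafka utility:

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Sketch: encode a UUID's 16 bytes as URL-safe base64 without padding,
// the compact style used for Kafka cluster and topic ids.
public class CompactUuid {
    public static void main(String[] args) {
        UUID id = UUID.fromString("51380268-1036-410d-a8fc-fb3b55f48033");
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        String compact = Base64.getUrlEncoder().withoutPadding()
                               .encodeToString(buf.array());
        System.out.println(compact);   // 22 characters: UTgCaBA2QQ2o_Ps7VfSAMw
    }
}
```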

> 2. This is a nit, but I think it would be better to talk about built-in
> quorum mode instead of KIP-500 mode. It's more self descriptive than a KIP
> reference.

I do like the sound of "quorum mode."  I guess the main question is, if we 
later implement raft quorums for regular topics, would that nomenclature be 
confusing?  I guess we could talk about "metadata quorum mode" to avoid 
confusion.  Hmm.

> 3. Did we consider using `session` (like the group coordinator) instead of
> `regsitration` in `broker.registration.timeout.ms`?

Hmm, broker.session.timeout.ms does sound better.  I changed it to that.

> 4. The flat id space for the controller and broker while requiring a
> different id in embedded mode seems a bit unintuitive. Are there any other
> systems that do this? I know we covered some of the reasons in the "Shared
> IDs between Multiple Nodes" rejected alternatives section, but it didn't
> seem totally convincing to me.

One of my concerns here is that using separate ID spaces for controllers versus 
brokers would potentially lead to metrics or logging collisions.  We can take a 
look at that again once the implementation is further along, I guess, to see 
how often that is a problem in practice.

> 5. With regards to the controller process listening on a separate port, it
> may be worth adding a sentence about the forwarding KIP as that is a main
> reason why the controller port doesn't need to be accessible.

Good idea... I added a short reference to KIP-590 in the "Networking" section.

> 6. The internal topic seems to be called @metadata. I'm personally not
> convinced about the usage of @ in this way. I think I would go with the
> same convention we have used for other existing internal topics.

I knew this one would be controversial :)

I guess the main argument here is that using @ avoids collisions with any 
existing topic.  Leading underscores, even double underscores, can be used by 
users to create new topics, but an "at sign" cannot.  It would be nice to have a 
namespace for system topics that we knew nobody else could break into.

> 7. We talk about the metadata.format feature flag. Is this intended to
> allow for single roll upgrades?
> 8. Could the incarnation id be called registration id? Or is there a reason
> why this would be a bad name?

I liked "incarnation id" because it expresses the idea that each new 
incarnation of the broker gets a different one.  I think "registration id" 
might be confused with "the broker id is the ID we're registering."

> 9. Could `CurMetadataOffset` be called `CurrentMetadataOffset` for
> `BrokerRegistrationRequest`? The abbreviation here doesn't seem to help
> much and makes things slightly less readable. It would also make it
> consistent with `BrokerHeartbeatRequest`.

Yeah, the abbreviated name is inconsistent.  I will change it to 
CurrentMetadataOffset.

> 10. Should `UnregisterBrokerRecord` be `DeregisterBrokerRecord`?

Hmm, "Register/Unregister" is more consistent with "Fence/Unfence" which is why 
I went with Unregister.  It looks like they're both in the dictionary, so I'm 
not sure if "deregister" has an advantage...

> 11. Broker metrics typically have a PerSec suffix, should we stick with
> that for the `MetadataCommitRate`?

Added.

> 12. For the lag metrics, would it be clearer if we included "Offset" in the
> name? In theory, we could have time based lag metrics too. Having said
> that, existing offset lag metrics do seem to just have `Lag` in their name
> without further qualification.
>

Yeah, I think including Offset does make it a bit clearer.  Added.
 
best,
Colin


> Ismael
> 
> On Fri, Dec 11, 2020 at 1:41 PM Colin McCabe  wrote:
> 
> > Hi all,
> >
> > I'd like to restart the vote on KIP-631: the quorum-based Kafka
> > Controller.  The KIP is here:
> >
> > https://cwiki.apache.org/confluence/x/4RV4CQ
> >
> > The original DISCUSS thread is here:

Build failed in Jenkins: Kafka » kafka-trunk-jdk11 #320

2020-12-16 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: Simplify ApiKeys by relying on ApiMessageType (#9748)


--
[...truncated 6.99 MB...]
org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@2a72cadb,
 timestamped = true, caching = true, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@2a72cadb,
 timestamped = true, caching = true, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@79df5bcd,
 timestamped = true, caching = true, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@79df5bcd,
 timestamped = true, caching = true, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@4e46af2d,
 timestamped = true, caching = true, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@4e46af2d,
 timestamped = true, caching = true, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@23d3ebeb,
 timestamped = true, caching = true, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@23d3ebeb,
 timestamped = true, caching = true, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@1fcbc958,
 timestamped = true, caching = true, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@1fcbc958,
 timestamped = true, caching = true, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@1cc9e85f,
 timestamped = true, caching = false, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@1cc9e85f,
 timestamped = true, caching = false, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@670d032c,
 timestamped = true, caching = false, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@670d032c,
 timestamped = true, caching = false, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@5bbd3095,
 timestamped = true, caching = false, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@5bbd3095,
 timestamped = true, caching = false, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@1afdb3b3,
 timestamped = true, caching = false, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@1afdb3b3,
 timestamped = true, caching = false, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.TimestampedWindowStoreBuilder@49a53ba3,
 timestamped = true, caching = false, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTe

[VOTE] KIP-687: Automatic Reloading of Security Store

2020-12-16 Thread Boyang Chen
Hey there,

I would like to start the voting for KIP-687:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
to make the security store reloading automated.

Best,
Boyang


[jira] [Created] (KAFKA-10861) Flaky test `TransactionsTest.testFencingOnSendOffsets`

2020-12-16 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-10861:
---

 Summary: Flaky test `TransactionsTest.testFencingOnSendOffsets`
 Key: KAFKA-10861
 URL: https://issues.apache.org/jira/browse/KAFKA-10861
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


{code}
org.scalatest.exceptions.TestFailedException: Got an unexpected exception from 
a fenced producer.
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
at org.scalatest.Assertions.fail(Assertions.scala:1107)
at org.scalatest.Assertions.fail$(Assertions.scala:1103)
at org.scalatest.Assertions$.fail(Assertions.scala:1389)
at 
kafka.api.TransactionsTest.testFencingOnSendOffsets(TransactionsTest.scala:373)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at Caused by: 
org.apache.kafka.common.errors.InvalidProducerEpochException: Producer 
attempted to produce with an old epoch (producerId=0, epoch=0)
{code}
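
For background, a minimal sketch of the fencing pattern this test exercises; the topic, group, and transactional id are placeholders, and the expected failure is ProducerFencedException rather than the InvalidProducerEpochException seen above:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: the older of two producers sharing a transactional.id gets fenced;
// sendOffsetsToTransaction (or the commit) should fail with ProducerFencedException.
public class FencingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-txn-id");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        producer.beginTransaction();
        // ... a second producer with the same transactional.id calls initTransactions()
        // here, bumping the epoch and fencing this one ...
        try {
            producer.sendOffsetsToTransaction(
                Map.of(new TopicPartition("input", 0), new OffsetAndMetadata(42L)),
                "my-group");
            producer.commitTransaction();
        } catch (ProducerFencedException e) {
            // This instance has been superseded; it must not retry, only close.
        } finally {
            producer.close();
        }
    }
}
```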



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-12-16 Thread Colin McCabe
On Wed, Dec 16, 2020, at 13:40, Jun Rao wrote:
> Hi, Colin,
> 
> Thanks for the reply. A few follow up comments.
> 
> 211. When does the broker send the BrokerRegistration request to the
> controller? Is it after the recovery phase? If so, at that point, the
> broker has already caught up on the metadata (in order to clean up deleted
> partitions). Then, it seems we don't need the ShouldFence field
> in BrokerHeartbeatRequest?

Hi Jun,

Thanks again for the reviews.

The broker sends the registration request as soon as it starts up.  It cannot 
wait until the recovery phase is over since sometimes log recovery can take 
quite a long time.

> 
> 213. kafka-cluster.sh
> > 213.1 For the decommission example, should the command take a broker id?

Yes, the example should have a broker id.  Fixed.

> 213.2 Existing tools use the "--" command line option (e.g. kafka-topics
> --list --topic test). Should we follow the same convention
> for kafka-cluster.sh (and kafka-storage.sh)?

Hmm.  I don't think argparse4j supports using double dashes to identify 
subcommands.  I think it might be confusing as well, since the subcommand must 
come first, unlike a plain old argument which can be anywhere on the command 
line.
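
For illustration, a minimal argparse4j sketch of the subcommand style being described; the tool name and arguments are placeholders, not the final kafka-cluster.sh interface:

```java
import net.sourceforge.argparse4j.ArgumentParsers;
import net.sourceforge.argparse4j.inf.ArgumentParser;
import net.sourceforge.argparse4j.inf.ArgumentParserException;
import net.sourceforge.argparse4j.inf.Namespace;
import net.sourceforge.argparse4j.inf.Subparsers;

// Sketch of argparse4j subcommands: the subcommand ("decommission") comes first,
// and its own flags follow it; other subcommands' flags never appear in its help.
public class ClusterToolCli {
    public static void main(String[] args) throws ArgumentParserException {
        ArgumentParser parser = ArgumentParsers.newArgumentParser("kafka-cluster");
        Subparsers subparsers = parser.addSubparsers().dest("command");
        subparsers.addParser("cluster-id").help("print the cluster id");
        subparsers.addParser("decommission").help("decommission a broker")
                  .addArgument("--id").type(Integer.class).required(true);

        // e.g. args = {"decommission", "--id", "2"}
        Namespace ns = parser.parseArgs(args);
        System.out.println(ns.getString("command") + " -> " + ns.getInt("id"));
    }
}
```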

> 213.3 Should we add an admin api for broker decommission so that this can
> be done programmatically?
> 

Yes, there is an admin client API for decommissioning as well.

> 220. DecommissionBroker: "NOT_CONTROLLER if the node that the request was
> sent to is not the controller". I thought the clients never send a request
> to the controller directly now and the broker will forward it to the
> controller?
> 

If the controller moved recently, it's possible that the broker could send to a 
controller that has just recently become inactive.  In that case NOT_CONTROLLER 
would be returned.  (A standby controller returns NOT_CONTROLLER for most APIs).

> 221. Could we add the required ACL for the new requests?
> 

Good point.  I added the required ACL for each new RPC.

best,
Colin


> Jun
> 
> On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe  wrote:
> 
> > On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply. A few more comments below.
> > >
> >
> > Hi Jun,
> >
> > Thanks for the comments.
> >
> > > 206. "RemoveTopic is the last step, that scrubs all metadata about the
> > > topic.  In order to get to that last step, the topic data needs to
> > be removed
> > > from all brokers (after each broker notices that the topic is being
> > > deleted)." Currently, this is done based on the response of
> > > StopReplicaRequest. Since the controller no longer sends this request,
> > how
> > > does the controller know that the data for the deleted topic has
> > > been removed in the brokers?
> > >
> >
> > That's a good point... definitely an oversight.
> >
> > It seems complex to force the controller to track when log directories
> > have been deleted.  Let's just assume that KIP-516 has been implemented,
> > and track them by UUID.  Then we can just have a single topic deletion
> > record.
> >
> > I added a section on "topic identifiers" describing this.
> >
> > > 210. Thanks for the explanation. Sounds good to me.
> > >
> > > 211. I still don't see an example when ShouldFence is set to true in
> > > BrokerHeartbeatRequest. Could we add one?
> > >
> >
> > It is set to true when the broker is first starting up and doesn't yet
> > want to be unfenced.  I added a longer explanation of this in the "Broker
> > Leases" section.
> >
> > > 213. The KIP now allows replicas to be assigned on brokers that are
> > fenced,
> > > which is an improvement. How do we permanently remove a broker (e.g.
> > > cluster shrinking) to prevent it from being used for future replica
> > > assignments?
> > >
> >
> > This is a fair point.  I will create a kafka-cluster.sh script which can
> > do this, plus a DecommissionBrokerRequest.
> >
> > As a bonus the kafka-cluster.sh script can help people find the cluster ID
> > of brokers, something that people have wanted a tool for in the past.
> >
> > > 214. "Currently, when a node is down, all of its ZK registration
> > > information is gone.  But  we need this information in order to
> > understand
> > > things like whether the replicas of a particular partition are
> > > well-balanced across racks." This is not quite true. Currently, even when
> > > ZK registration is gone, the existing replica assignment is still
> > available
> > > in the metadata response.
> > >
> >
> > I agree that the existing replica assignment is still available.  But that
> > just tells you partition X is on nodes A, B, and C.  If you don't have the
> > ZK registration for one or more of A, B, or C then you don't know whether
> > we are following the policy of "two replicas on one rack, one replica on
> > another."  Or any other more complex rack placement policy that you might
> > have.
> >
> > best,
> > Colin
> >
> > > Jun
> > >
> > > On Wed, Dec 16, 2020 at 8:48 AM Colin McCabe  wrote:

[jira] [Created] (KAFKA-10862) Kafka Streams consumes from the earliest offset by default

2020-12-16 Thread Yuexi Liu (Jira)
Yuexi Liu created KAFKA-10862:
-

 Summary: Kafka Streams consumes from the earliest offset by default
 Key: KAFKA-10862
 URL: https://issues.apache.org/jira/browse/KAFKA-10862
 Project: Kafka
  Issue Type: Bug
  Components: config, consumer
Affects Versions: 2.3.1
 Environment: MAC
Reporter: Yuexi Liu


According to [https://kafka.apache.org/documentation/#auto.offset.reset], auto.offset.reset defaults to latest, but the code says otherwise:
[https://github.com/apache/kafka/blob/72918a98161ba71ff4fa8116fdf8ed02b09a0580/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L884]
When I create a Kafka Streams application without specifying an offset reset, it consumes from the beginning.
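
For anyone hitting this, a minimal sketch of setting the reset policy explicitly in a Streams application so it does not depend on the default; the application id and topic names are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

// Sketch: set auto.offset.reset explicitly so the app does not rely on
// whatever default Kafka Streams applies to new consumer groups.
public class ResetConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Passed through to the embedded consumers; "latest" skips pre-existing records.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");
        new KafkaStreams(builder.build(), props).start();
    }
}
```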



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: Kafka » kafka-trunk-jdk15 #341

2020-12-16 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-9126: KIP-689: StreamJoined changelog configuration (#9708)


--
[...truncated 3.49 MB...]
org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@45e38b27, 
timestamped = false, caching = false, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@d75d54f, 
timestamped = false, caching = false, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@d75d54f, 
timestamped = false, caching = false, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@64fc7bc8, 
timestamped = false, caching = false, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@64fc7bc8, 
timestamped = false, caching = false, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@83be613, 
timestamped = false, caching = false, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@83be613, 
timestamped = false, caching = false, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@51d8ee80, 
timestamped = false, caching = false, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.WindowStoreBuilder@51d8ee80, 
timestamped = false, caching = false, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@72249659, 
timestamped = false, caching = true, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@72249659, 
timestamped = false, caching = true, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@758181ee, 
timestamped = false, caching = true, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@758181ee, 
timestamped = false, caching = true, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@225dea33, 
timestamped = false, caching = true, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@225dea33, 
timestamped = false, caching = true, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@3467ac16, 
timestamped = false, caching = true, logging = false] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@3467ac16, 
timestamped = false, caching = true, logging = false] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@62a7729d, 
timestamped = false, caching = false, logging = true] STARTED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@62a7729d, 
timestamped = false, caching = false, logging = true] PASSED

org.apache.kafka.streams.test.MockProcessorContextStateStoreTest > 
shouldEitherInitOrThrow[builder = 
org.apache.kafka.streams.state.internals.SessionStoreBuilder@1bd011d6, 
timestamped = false, caching = false, logging = true] STARTED

o

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-12-16 Thread Jun Rao
Hi, Colin,

Thanks for the reply. Just a couple of more comments.

211. Currently, the broker only registers itself in ZK after log recovery.
Is there any benefit to change that? As you mentioned, the broker can't do
much before completing log recovery.

230. Regarding MetadataResponse, there is a slight awkwardness. We return
rack for each node. However, if that node is for the controller, the rack
field is not really relevant. Should we clean it up here or in another KIP
like KIP-700?

Thanks,

Jun

On Wed, Dec 16, 2020 at 4:23 PM Colin McCabe  wrote:

> On Wed, Dec 16, 2020, at 13:40, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the reply. A few follow up comments.
> >
> > 211. When does the broker send the BrokerRegistration request to the
> > controller? Is it after the recovery phase? If so, at that point, the
> > broker has already caught up on the metadata (in order to clean up
> deleted
> > partitions). Then, it seems we don't need the ShouldFence field
> > in BrokerHeartbeatRequest?
>
> Hi Jun,
>
> Thanks again for the reviews.
>
> The broker sends the registration request as soon as it starts up.  It
> cannot wait until the recovery phase is over since sometimes log recovery
> can take quite a long time.
>
> >
> > 213. kafka-cluster.sh
> > > 213.1 For the decommission example, should the command take a broker id?
>
> Yes, the example should have a broker id.  Fixed.
>
> > 213.2 Existing tools use the "--" command line option (e.g. kafka-topics
> > --list --topic test). Should we follow the same convention
> > for kafka-cluster.sh (and kafka-storage.sh)?
>
> Hmm.  I don't think argparse4j supports using double dashes to identify
> subcommands.  I think it might be confusing as well, since the subcommand
> must come first, unlike a plain old argument which can be anywhere on the
> command line.
>
> > 213.3 Should we add an admin api for broker decommission so that this can
> > be done programmatically?
> >
>
> Yes, there is an admin client API for decommissioning as well.
>
> > 220. DecommissionBroker: "NOT_CONTROLLER if the node that the request was
> > sent to is not the controller". I thought the clients never send a
> request
> > to the controller directly now and the broker will forward it to the
> > controller?
> >
>
> If the controller moved recently, it's possible that the broker could send
> to a controller that has just recently become inactive.  In that case
> NOT_CONTROLLER would be returned.  (A standby controller returns
> NOT_CONTROLLER for most APIs).
>
> > 221. Could we add the required ACL for the new requests?
> >
>
> Good point.  I added the required ACL for each new RPC.
>
> best,
> Colin
>
>
> > Jun
> >
> > On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe 
> wrote:
> >
> > > On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > > > Hi, Colin,
> > > >
> > > > Thanks for the reply. A few more comments below.
> > > >
> > >
> > > Hi Jun,
> > >
> > > Thanks for the comments.
> > >
> > > > 206. "RemoveTopic is the last step, that scrubs all metadata about
> the
> > > > topic.  In order to get to that last step, the topic data needs to
> > > be removed
> > > > from all brokers (after each broker notices that the topic is being
> > > > deleted)." Currently, this is done based on the response of
> > > > StopReplicaRequest. Since the controller no longer sends this
> request,
> > > how
> > > > does the controller know that the data for the deleted topic has
> > > > been removed in the brokers?
> > > >
> > >
> > > That's a good point... definitely an oversight.
> > >
> > > It seems complex to force the controller to track when log directories
> > > have been deleted.  Let's just assume that KIP-516 has been
> implemented,
> > > and track them by UUID.  Then we can just have a single topic deletion
> > > record.
> > >
> > > I added a section on "topic identifiers" describing this.
> > >
> > > > 210. Thanks for the explanation. Sounds good to me.
> > > >
> > > > 211. I still don't see an example when ShouldFence is set to true in
> > > > BrokerHeartbeatRequest. Could we add one?
> > > >
> > >
> > > It is set to true when the broker is first starting up and doesn't yet
> > > want to be unfenced.  I added a longer explanation of this in the
> "Broker
> > > Leases" section.
> > >
> > > > 213. The KIP now allows replicas to be assigned on brokers that are
> > > fenced,
> > > > which is an improvement. How do we permanently remove a broker (e.g.
> > > > cluster shrinking) to prevent it from being used for future replica
> > > > assignments?
> > > >
> > >
> > > This is a fair point.  I will create a kafka-cluster.sh script which
> can
> > > do this, plus a DecommissionBrokerRequest.
> > >
> > > As a bonus the kafka-cluster.sh script can help people find the
> cluster ID
> > > of brokers, something that people have wanted a tool for in the past.
> > >
> > > > 214. "Currently, when a node is down, all of its ZK registration
> > > > information is gone.  But  we need this information in order to
> > > und

Re: [VOTE] KIP-665 Kafka Connect Hash SMT

2020-12-16 Thread Brandon Brown
I’d like to give this one another friendly bump. If there are no disagreements 
I can update my existing PR with the latest KIP changes. 

Thanks,
-Brandon 

Brandon Brown
> On Oct 26, 2020, at 8:29 PM, Brandon Brown  wrote:
> 
> I’ve update the KIP with suggestions from Gunnar. I’d like to bring this up 
> for a vote. 
> 
> Brandon Brown
>> On Oct 22, 2020, at 12:53 PM, Brandon Brown  wrote:
>> 
>> Hey Gunnar,
>> 
>> Those are great questions!
>> 
>> 1) I went with it only selecting top level fields since it seems like that’s 
>> the way most of the out-of-the-box SMTs work, however I could see a lot of 
>> value in it supporting nested fields. 
>> 2) I had not thought about adding salt but I think that would be a valid 
>> option as well. 
>> 
>> I think I’ll update the KIP to reflect those suggestions. One more, do you 
>> think this should allow a regex for fields or stick with the explicit naming 
>> of the fields?
>> 
>> Thanks for the great feedback
>> 
>> Brandon Brown
>> 
 On Oct 22, 2020, at 12:40 PM, Gunnar Morling 
  wrote:
>>> 
>>> Hey Brandon,
>>> 
>>> I think that's an interesting idea, we got something as a built-in
>>> connector feature in Debezium, too [1]. Two questions:
>>> 
>>> * Can "field" select nested fields, e.g. "after.email"?
>>> * Did you consider an option for specifying salt for the hash functions?
>>> 
>>> --Gunnar
>>> 
>>> [1]
>>> https://debezium.io/documentation/reference/connectors/mysql.html#mysql-property-column-mask-hash
>>> 
>>> 
>>> 
 On Thu, Oct 22, 2020 at 12:53 PM Brandon Brown <
 bran...@bbrownsound.com> wrote:
 
 Gonna give this another little bump. :)
 
 Brandon Brown
 
> On Oct 15, 2020, at 12:51 PM, Brandon Brown 
 wrote:
> 
> 
> As I mentioned in the KIP, this transformer is slightly different from
 the current MaskField SMT.
> 
>> Currently there exists a MaskField SMT but that would completely remove
 the value by setting it to an equivalent null value. One problem with this
 would be that you’d not be able to know in the case of say a password going
 through the mask transform it would become "" which could mean that no
 password was present in the message, or it was removed. However this hash
 transformer would remove this ambiguity if that makes sense. The proposed
 hash functions would be MD5, SHA1, SHA256, which are all supported via
 MessageDigest.
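
A minimal sketch of that idea, assuming a single string field and the standard MessageDigest algorithm names; the class and method names are placeholders, and the real transform is the one in the linked pull request:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch of the hashing idea: replace a field's value with a digest of it,
// so "value was absent" and "value was masked" are no longer ambiguous.
public class HashFieldSketch {
    static String hash(String value, String algorithm) throws Exception {
        MessageDigest digest = MessageDigest.getInstance(algorithm); // "MD5", "SHA-1", "SHA-256"
        byte[] bytes = digest.digest(value.getBytes(StandardCharsets.UTF_8));
        return String.format("%0" + (bytes.length * 2) + "x", new BigInteger(1, bytes));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash("s3cr3t-password", "SHA-256"));
    }
}
```
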
> 
> Given this take on things do you still think there would be value in
 this smt?
> 
> 
> Brandon Brown
>> On Oct 15, 2020, at 12:36 PM, Ning Zhang 
 wrote:
>> 
>> Hello, I think this SMT feature is parallel to
 https://docs.confluent.io/current/connect/transforms/index.html
>> 
 On 2020/10/15 15:24:51, Brandon Brown 
 wrote:
>>> Bumping this thread.
>>> Please take a look at the KIP and vote or let me know if you have any
 feedback.
>>> 
>>> KIP:
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-665%3A+Kafka+Connect+Hash+SMT
>>> 
>>> Proposed: https://github.com/apache/kafka/pull/9057
>>> 
>>> Thanks
>>> 
>>> Brandon Brown
>>> 
> On Oct 8, 2020, at 10:30 PM, Brandon Brown 
 wrote:
 
 Just wanted to give another bump on this and see if anyone had any
 comments.
 
 Thanks!
 
 Brandon Brown
 
> On Oct 1, 2020, at 9:11 AM, "bran...@bbrownsound.com" <
 bran...@bbrownsound.com> wrote:
> 
> Hey Kafka Developers,
> 
> I’ve created the following KIP and updated it based on feedback from
 Mickael. I was wondering if we could get a vote on my proposal and move
 forward with the proposed pr.
> 
> Thanks so much!
> -Brandon
>>> 
 


[jira] [Resolved] (KAFKA-10861) Flaky test `TransactionsTest.testFencingOnSendOffsets`

2020-12-16 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-10861.
-
Resolution: Fixed

> Flaky test `TransactionsTest.testFencingOnSendOffsets`
> --
>
> Key: KAFKA-10861
> URL: https://issues.apache.org/jira/browse/KAFKA-10861
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> {code}
> org.scalatest.exceptions.TestFailedException: Got an unexpected exception 
> from a fenced producer.
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>   at org.scalatest.Assertions.fail(Assertions.scala:1107)
>   at org.scalatest.Assertions.fail$(Assertions.scala:1103)
>   at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>   at 
> kafka.api.TransactionsTest.testFencingOnSendOffsets(TransactionsTest.scala:373)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at Caused by: 
> org.apache.kafka.common.errors.InvalidProducerEpochException: Producer 
> attempted to produce with an old epoch (producerId=0, epoch=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : Kafka » kafka-trunk-jdk11 #321

2020-12-16 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » kafka-trunk-jdk8 #295

2020-12-16 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-9126: KIP-689: StreamJoined changelog configuration (#9708)


--
[...truncated 6.94 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureInternalTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureInternalTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shou

Re: [DISCUSS] KIP-700: Add Describe Cluster API

2020-12-16 Thread Colin McCabe
Hi David,

This seems reasonable.  It would be nice to have an API specifically for 
describeCluster, so that we could extend this API without adding more fields to 
the already large MetadataRequest.

As you mention in the KIP, KIP-700 would allow us to deprecate 
MetadataRequest#ClusterAuthorizedOperations.  So it seems like this KIP should 
specify a new version of MetadataRequest where the related fields are absent, 
right?

best,
Colin


On Mon, Dec 14, 2020, at 08:10, David Jacot wrote:
> Hi all,
> 
> I'd like to propose a small KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-700%3A+Add+Describe+Cluster+API
> 
> Please let me know what you think.
> 
> Best,
> David
>
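
To make the client-side picture concrete: the AdminClient already exposes this
information through Admin#describeCluster, which today is served by
MetadataRequest; KIP-700 would back the same call with a dedicated RPC. A
minimal sketch, assuming a broker reachable at localhost:9092:

import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterOptions;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class DescribeClusterExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Today this call is answered via MetadataRequest; KIP-700 proposes
            // a dedicated DescribeCluster RPC behind the same client API.
            DescribeClusterResult result = admin.describeCluster(
                new DescribeClusterOptions().includeAuthorizedOperations(true));
            System.out.println("cluster id:     " + result.clusterId().get());
            System.out.println("controller:     " + result.controller().get());
            System.out.println("nodes:          " + result.nodes().get());
            System.out.println("authorized ops: " + result.authorizedOperations().get());
        }
    }
}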


Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-12-16 Thread Colin McCabe
On Wed, Dec 16, 2020, at 18:13, Jun Rao wrote:
> Hi, Colin,
> 
> Thanks for the reply. Just a couple of more comments.
> 
> 211. Currently, the broker only registers itself in ZK after log recovery.
> Is there any benefit to change that? As you mentioned, the broker can't do
> much before completing log recovery.
> 

Hi Jun,

Previously, it wasn't possible to register in ZK without immediately getting 
added to the MetadataResponse.  So I think that's the main reason why 
registration was delayed until after log recovery.  Since that constraint 
doesn't exist any more, there seems to be no reason to delay registration.

I think delaying registration would have some major downsides.  If log recovery 
takes a while, that's a longer window during which someone else could register 
a broker with the same ID.  Having parts of the cluster missing for a while 
gives up some of the benefit of separating registration from fencing.  For
example, if a broker somehow gets unregistered and we want to re-register it,
having to wait for a 10-minute log recovery first would open a window during
which topics that need replicas on that broker can't be created, and so forth.
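
To make the intended lifecycle concrete, here is a rough illustrative sketch,
not Kafka source: the broker registers immediately on startup, keeps asking to
stay fenced in every heartbeat until log recovery completes, and only then lets
the controller lift the fence. The class, state, and method names below are
assumptions made for illustration; only the BrokerRegistration/BrokerHeartbeat
RPCs and the ShouldFence idea come from the discussion itself.

// Illustrative sketch only -- not Kafka source.
enum BrokerState { STARTING, REGISTERED_BUT_FENCED, ACTIVE }

final class BrokerLifecycleSketch {
    private BrokerState state = BrokerState.STARTING;

    void startup() {
        // Register with the controller right away, before log recovery, so the
        // broker id is claimed and cannot be taken by another node meanwhile.
        sendBrokerRegistration();
        state = BrokerState.REGISTERED_BUT_FENCED;
    }

    void onLogRecoveryComplete() {
        // Only now is it safe to serve data, so stop asking to stay fenced.
        state = BrokerState.ACTIVE;
    }

    void sendPeriodicHeartbeat() {
        boolean shouldFence = state != BrokerState.ACTIVE;
        sendBrokerHeartbeat(shouldFence);
    }

    private void sendBrokerRegistration() { /* BrokerRegistration RPC */ }
    private void sendBrokerHeartbeat(boolean shouldFence) { /* BrokerHeartbeat RPC */ }
}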

> 230. Regarding MetadataResponse, there is a slight awkwardness. We return
> rack for each node. However, if that node is the controller, the rack
> field is not really relevant. Should we clean it up here or in another KIP
> like KIP-700?

Oh, controllers don't appear in the MetadataResponses returned to clients, 
since clients can't access them.  I should have been clearer about that in the
KIP -- I added a sentence to "Networking" describing this.

best,
Colin

> 
> Thanks,
> 
> Jun
> 
> On Wed, Dec 16, 2020 at 4:23 PM Colin McCabe  wrote:
> 
> > On Wed, Dec 16, 2020, at 13:40, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply. A few follow up comments.
> > >
> > > 211. When does the broker send the BrokerRegistration request to the
> > > controller? Is it after the recovery phase? If so, at that point, the
> > > broker has already caught up on the metadata (in order to clean up deleted
> > > partitions). Then, it seems we don't need the ShouldFence field
> > > in BrokerHeartbeatRequest?
> >
> > Hi Jun,
> >
> > Thanks again for the reviews.
> >
> > The broker sends the registration request as soon as it starts up.  It
> > cannot wait until the recovery phase is over since sometimes log recovery
> > can take quite a long time.
> >
> > >
> > > 213. kafka-cluster.sh
> > > 213.1 For the decommision example, should the command take a broker id?
> >
> > Yes, the example should have a broker id.  Fixed.
> >
> > > 213.2 Existing tools use the "--" command line option (e.g. kafka-topics
> > > --list --topic test). Should we follow the same convention
> > > for kafka-cluster.sh (and kafka-storage.sh)?
> >
> > Hmm.  I don't think argparse4j supports using double dashes to identify
> > subcommands.  I think it might be confusing as well, since the subcommand
> > must come first, unlike a plain old argument which can be anywhere on the
> > command line.
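
For reference, argparse4j subcommands are positional tokens rather than "--"
options. A rough sketch of how such a parser is typically assembled; the exact
subcommand and flag names here are placeholders, not necessarily what the KIP
will ship:

import net.sourceforge.argparse4j.ArgumentParsers;
import net.sourceforge.argparse4j.inf.ArgumentParser;
import net.sourceforge.argparse4j.inf.Namespace;
import net.sourceforge.argparse4j.inf.Subparsers;

public class ClusterToolParserSketch {
    public static void main(String[] args) throws Exception {
        ArgumentParser parser = ArgumentParsers.newArgumentParser("kafka-cluster")
            .description("Commands that act on the Kafka cluster as a whole.");
        Subparsers subparsers = parser.addSubparsers().dest("command");

        subparsers.addParser("cluster-id")
            .help("Print the id of the cluster.");

        subparsers.addParser("decommission")
            .help("Decommission a broker.")
            .addArgument("--id")
            .type(Integer.class)
            .required(true)
            .help("The id of the broker to decommission.");

        // Invoked as e.g.:  kafka-cluster decommission --id 1
        Namespace ns = parser.parseArgs(args);
        System.out.println("command = " + ns.getString("command"));
    }
}
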
> >
> > > 213.3 Should we add an admin api for broker decommission so that this can
> > > be done programmatically?
> > >
> >
> > Yes, there is an admin client API for decommissioning as well.
> >
> > > 220. DecommissionBroker: "NOT_CONTROLLER if the node that the request was
> > > sent to is not the controller". I thought the clients never send a request
> > > to the controller directly now and the broker will forward it to the
> > > controller?
> > >
> >
> > If the controller moved recently, it's possible that the broker could send
> > to a controller that has just recently become inactive.  In that case
> > NOT_CONTROLLER would be returned.  (A standby controller returns
> > NOT_CONTROLLER for most APIs).
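
A sketch of the retry loop this implies for whatever component ends up sending
controller-bound requests. The ControllerClient interface and its methods are
invented for illustration; only the NOT_CONTROLLER error code is real:

import org.apache.kafka.common.protocol.Errors;

// Illustrative sketch only -- not Kafka source.
final class ControllerRetrySketch {
    interface ControllerClient {
        Errors sendDecommission(int brokerId);   // hypothetical controller RPC
        void refreshControllerLocation();        // re-discover the active controller
    }

    static Errors decommissionWithRetry(ControllerClient client, int brokerId,
                                        int maxAttempts) throws InterruptedException {
        Errors error = Errors.NOT_CONTROLLER;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            error = client.sendDecommission(brokerId);
            if (error != Errors.NOT_CONTROLLER) {
                return error; // success, or a non-retriable error for the caller
            }
            // We reached a standby (or just-deposed) controller: refresh our view
            // of where the active controller lives and try again with backoff.
            client.refreshControllerLocation();
            Thread.sleep(100L * attempt);
        }
        return error;
    }
}
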
> >
> > > 221. Could we add the required ACL for the new requests?
> > >
> >
> > Good point.  I added the required ACL for each new RPC.
> >
> > best,
> > Colin
> >
> >
> > > Jun
> > >
> > > On Wed, Dec 16, 2020 at 11:38 AM Colin McCabe 
> > wrote:
> > >
> > > > On Wed, Dec 16, 2020, at 09:59, Jun Rao wrote:
> > > > > Hi, Colin,
> > > > >
> > > > > Thanks for the reply. A few more comments below.
> > > > >
> > > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > > 206. "RemoveTopic is the last step, that scrubs all metadata about the
> > > > > topic.  In order to get to that last step, the topic data needs to be
> > > > > removed from all brokers (after each broker notices that the topic is
> > > > > being deleted)."  Currently, this is done based on the response of
> > > > > StopReplicaRequest.  Since the controller no longer sends this request,
> > > > > how does the controller know that the data for the deleted topic has
> > > > > been removed from the brokers?
> > > > >
> > > >
> > > > That's a good point... definitely an oversight.
> > > >
> > > > It seems complex to force the controller to track when log directories
> 

Jenkins build is back to normal : Kafka » kafka-trunk-jdk15 #342

2020-12-16 Thread Apache Jenkins Server
See