Re: [DISCUSS] Website changes required for Apache projects

2022-07-28 Thread Luke Chen
Hi Divij,

Thanks for working on this, initiating the discussion and fixing all the
issues!
I'll go ahead and close KAFKA-13868.

Thank you.
Luke

On Wed, Jul 27, 2022 at 10:26 PM Divij Vaidya 
wrote:

> Hi all
>
> To conclude this thread, all required changes listed for adhering to ASF
> guidelines (documented at
> https://issues.apache.org/jira/browse/KAFKA-13868)
> have been merged to the website. If you find any other aspects where we are
> not adhering to the ASF privacy policy, please feel free to
> create a new ticket.
>
> Thanks everyone for chiming in on this discussion thread.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Jul 22, 2022 at 5:08 PM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > Don't get me wrong, the videos are great and it's definitely the
> > type of content we want on the website. We just have to be careful that
> > all content is vendor neutral. I'm not advocating for introducing new
> > policies or processes, I think the current PR process should be good
> > enough.
> >
> > As noted, in this case the main issue comes from Youtube automatically
> > adding the channel branding to the videos. Also on the quickstart and
> > intro videos Tim says he's from Confluent. The intro he uses in the
> > Streams videos [0] is in my opinion preferable. If it's possible to
> > address this without some major editing, I think it would be worth
> > doing.
> >
> > Thanks,
> > Mickael
> >
> > 0: https://kafka.apache.org/32/documentation/streams/
> >
> > On Fri, Jul 22, 2022 at 4:22 PM Bill Bejeck  wrote:
> > >
> > > Hi Divij,
> > >
> > > After thinking about the embedded videos some more I think it's probably
> > > best for now to go with option 1 you presented above (text links to the
> > > videos).
> > > I will do a follow-on PR for option #2 - creating an image placeholder that
> > > will trigger the video once clicked.
> > >
> > > Thanks again for driving this update effort.
> > >
> > > -Bill
> > >
> > > On Thu, Jul 21, 2022 at 5:25 PM Bill Bejeck  wrote:
> > >
> > > > Hi All,
> > > >
> > > > I've filed an issue with INFRA (
> > > > https://issues.apache.org/jira/browse/INFRA-23499) to ask about uploading
> > > > the videos to the ASF YouTube channel, which would resolve the branding
> > > > issue.
> > > >
> > > > Thanks,
> > > > Bill
> > > >
> > > > On Thu, Jul 21, 2022 at 1:43 PM Bill Bejeck wrote:
> > > >
> > > >> Hi Divij,
> > > >>
> > > >> First of all, let me say thanks for taking up this task.
> > > >>
> > > >> We seem to have two options:
> > > > >>> 1. Replace videos on the website with links to the videos OR
> > > > >>> 2. Take a placeholder image and use JS to trigger playback after the user
> > > > >>> clicks.
> > > > >>>
> > > > >>> I would suggest going with option#1 right now due to time constraints and
> > > > >>> create a ticket to do (more user friendly) option#2 in the future.
> > > > >>> *What do you think?*
> > > >>>
> > > >>
> > > >> I'm inclined to go with option #2.
> > > >>
> > > > >> But taking a look at the https://apache.org/ site, there's an embedded
> > > > >> video directly on the page, not an image or a link.
> > > > >>
> > > > >> So I'm wondering whether, since the video doesn't start playing right away
> > > > >> and requires a user to click to start it, the "click image to start"
> > > > >> requirement is already satisfied, as it aligns with what we see now on
> > > > >> the Apache® Software Foundation page.
> > > >>
> > > >>
> > > >> Regarding the branding, that's not in the video file itself but
> comes
> > > >> from YouTube and the video's channel.
> > > >>
> > > > >> I propose that we host the video on the Apache YouTube channel, and
> > > > >> that would take care of the branding issue.
> > > >>
> > > >>
> > > >> What do you think?
> > > >>
> > > >>
> > > > >> On Thu, Jul 21, 2022 at 4:19 AM Divij Vaidya <divijvaidy...@gmail.com>
> > > > >> wrote:
> > > >>
> > > >>> Thanks for chiming in with your opinions John/Mickael.
> > > >>>
> > > > >>> The current set of videos is very helpful and removing them might be a
> > > > >>> disservice to our users. The ideal solution would be to host the
> > > > >>> videos on Apache servers without any branding. Another, less than ideal,
> > > > >>> solution would be to host a repository of links to educational content on
> > > > >>> our website.
> > > > >>>
> > > > >>> As for the next steps, I am going to do the following, which would help us
> > > > >>> get answers on whether solution 1 or solution 2 is more feasible. Please
> > > > >>> let me know if you think we need to do something different here.
> > > > >>> 1. Reach out to ASF legal and ask what permissions/licence we would
> > > > >>> require from the video owners to host the videos ourselves.
> > > >>> 2. Reach out to ASF community mailing list
> > > >>> <
> > > >>>
> > https:

[jira] [Resolved] (KAFKA-13868) Website updates to satisfy Apache privacy policies

2022-07-28 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-13868.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

> Website updates to satisfy Apache privacy policies
> --
>
> Key: KAFKA-13868
> URL: https://issues.apache.org/jira/browse/KAFKA-13868
> Project: Kafka
>  Issue Type: Bug
>  Components: website
>Reporter: Mickael Maison
>Assignee: Divij Vaidya
>Priority: Critical
> Fix For: 3.3.0
>
>
> The ASF has updated its privacy policy and all websites must be compliant.
> The full guidelines can be found in 
> [https://privacy.apache.org/faq/committers.html]
> The Kafka website has a few issues, including:
> - It's missing a link to the privacy policy: 
> [https://privacy.apache.org/policies/privacy-policy-public.html]
> - It's using Google Analytics
> - It's using Google Fonts
> - It's using scripts hosted on Cloudflare CDN
> - Embedded videos don't have an image placeholder
> As per the email sent to the PMC, all updates have to be done by July 22.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14120) Produce Kafka Streams Skipped Records Metrics

2022-07-28 Thread Yusu Jwa (Jira)
Yusu Jwa created KAFKA-14120:


 Summary: Produce Kafka Streams Skipped Records Metrics
 Key: KAFKA-14120
 URL: https://issues.apache.org/jira/browse/KAFKA-14120
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.2.0
Reporter: Yusu Jwa


Hi, I want to monitor the "skipped records" metrics, and I found a page stating 
that the Skipped Records Metrics feature has been adopted:
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-274%3A+Kafka+Streams+Skipped+Records+Metrics]

However, there are no Skipped Records Metrics in Kafka 3.2.

I found [the 
metric|https://github.com/apache/kafka/blob/3.2/streams/src/main/java/org/apache/kafka/streams/processor/internals/metrics/ThreadMetrics.java#L48]
 in the source code, but it is used only in a test case:
[https://github.com/apache/kafka/blob/8464e366827d4c3a822beff32b8a0123767cbf0e/streams/src/main/java/org/apache/kafka/streams/processor/internals/metrics/ThreadMetrics.java#L126-L136]
[https://github.com/apache/kafka/blob/8464e366827d4c3a822beff32b8a0123767cbf0e/streams/src/test/java/org/apache/kafka/streams/processor/internals/metrics/ThreadMetricsTest.java#L52-L68]

Could you check this and expose the Skipped Records Metrics?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14121) AlterPartitionReassignments API should allow callers to specify the option of preserving the replication factor

2022-07-28 Thread Stanislav Kozlovski (Jira)
Stanislav Kozlovski created KAFKA-14121:
---

 Summary: AlterPartitionReassignments API should allow callers to 
specify the option of preserving the replication factor
 Key: KAFKA-14121
 URL: https://issues.apache.org/jira/browse/KAFKA-14121
 Project: Kafka
  Issue Type: New Feature
Reporter: Stanislav Kozlovski


Using Kafka's public APIs to get metadata regarding the non-reassigning 
replicas for a topic is unreliable and prone to race conditions.
If a person or a system relies on the provided metadata, it can end up 
unintentionally increasing the replication factor for a partition.
It would be useful to have some sort of guardrail against this happening 
inadvertently.
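
To make the requested guardrail concrete, here is a minimal sketch of the kind of check it implies. All names here (`ReassignmentGuard`, `validate`, `allowReplicationFactorChange`) are hypothetical and not part of Kafka's API. The sketch also assumes the check runs wherever the authoritative current replica set is known, since the point of the ticket is that client-side metadata can be stale.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from Kafka's public API.
final class ReassignmentGuard {

    /** True when the target replica list keeps the current replication factor. */
    static boolean preservesReplicationFactor(List<Integer> currentReplicas,
                                              List<Integer> targetReplicas) {
        return currentReplicas.size() == targetReplicas.size();
    }

    /**
     * Rejects a reassignment that would change the replication factor,
     * unless the caller explicitly allowed such a change.
     */
    static void validate(List<Integer> currentReplicas,
                         List<Integer> targetReplicas,
                         boolean allowReplicationFactorChange) {
        if (!allowReplicationFactorChange
                && !preservesReplicationFactor(currentReplicas, targetReplicas)) {
            throw new IllegalArgumentException(
                "Reassignment would change the replication factor from "
                    + currentReplicas.size() + " to " + targetReplicas.size());
        }
    }

    public static void main(String[] args) {
        // Same number of replicas: accepted.
        validate(List.of(1, 2, 3), List.of(2, 3, 4), false);
        // One extra replica with the guard enabled: rejected.
        try {
            validate(List.of(1, 2, 3), List.of(1, 2, 3, 4), false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Making the guard opt-out through a flag keeps existing callers that change the replication factor on purpose working, which is one possible design and not necessarily the one the ticket will settle on.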



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] KIP-860: Add client-provided option to guard against unintentional replication factor change during partition reassignments

2022-07-28 Thread Stanislav Kozlovski
Hey all,

I'd like to start a discussion on a proposal to help prevent API users from
inadvertently increasing the replication factor of a topic through
the alter partition reassignments API. The KIP describes two fairly
easy-to-hit race conditions in which this can happen.

The KIP itself is pretty simple, yet has a couple of alternatives that can
help solve the same problem. I would appreciate thoughts from the community
on how you think we should proceed, and whether the proposal makes sense in
the first place.

Thanks!

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-860%3A+Add+client-provided+option+to+guard+against+replication+factor+change+during+partition+reassignments
JIRA: https://issues.apache.org/jira/browse/KAFKA-14121

-- 
Best,
Stanislav


[jira] [Created] (KAFKA-14122) Flaky test DynamicBrokerReconfigurationTest.testKeyStoreAlter

2022-07-28 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14122:


 Summary: Flaky test 
DynamicBrokerReconfigurationTest.testKeyStoreAlter
 Key: KAFKA-14122
 URL: https://issues.apache.org/jira/browse/KAFKA-14122
 Project: Kafka
  Issue Type: Bug
  Components: consumer, core
Reporter: Divij Vaidya
Assignee: Divij Vaidya
 Fix For: 3.4.0


CI Build: 
[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12439/2/testReport/?cloudbees-analytics-link=scm-reporting%2Ftests%2Ffailed]
 

Failure log:


{code:java}
org.opentest4j.AssertionFailedError: Duplicates not expected ==> expected: <false> but was: <true>
    at app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
    at app//org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:40)
    at app//org.junit.jupiter.api.Assertions.assertFalse(Assertions.java:235)
    at app//kafka.server.DynamicBrokerReconfigurationTest.stopAndVerifyProduceConsume(DynamicBrokerReconfigurationTest.scala:1579)
    at app//kafka.server.DynamicBrokerReconfigurationTest.testKeyStoreAlter(DynamicBrokerReconfigurationTest.scala:399)
    at java.base@17.0.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base@17.0.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base@17.0.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base@17.0.1/java.lang.reflect.Method.invoke(Method.java:568)
    at app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
    at app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
    at app//org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at app//org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14117) Flaky Test DynamicBrokerReconfigurationTest.testKeyStoreAlter

2022-07-28 Thread Divij Vaidya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Divij Vaidya resolved KAFKA-14117.
--
Resolution: Duplicate

Resolving this one as a duplicate of 
https://issues.apache.org/jira/browse/KAFKA-14122 
A PR to fix this has been opened against the other ticket.

> Flaky Test DynamicBrokerReconfigurationTest.testKeyStoreAlter
> -
>
> Key: KAFKA-14117
> URL: https://issues.apache.org/jira/browse/KAFKA-14117
> Project: Kafka
>  Issue Type: Test
>  Components: unit tests
>Reporter: Hao Li
>Priority: Major
>  Labels: flaky-test
>
> This is a flaky test. Log:
>  
> {code:java}
> [2022-07-27T11:44:23.102Z] DynamicBrokerReconfigurationTest > testKeyStoreAlter() FAILED
> [2022-07-27T11:44:23.102Z] org.opentest4j.AssertionFailedError: Duplicates not expected ==> expected: <false> but was: <true>
> [2022-07-27T11:44:23.102Z] at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
> [2022-07-27T11:44:23.102Z] at org.junit.jupiter.api.AssertFalse.assertFalse(AssertFalse.java:40)
> [2022-07-27T11:44:23.103Z] at org.junit.jupiter.api.Assertions.assertFalse(Assertions.java:235)
> [2022-07-27T11:44:23.103Z] at kafka.server.DynamicBrokerReconfigurationTest.stopAndVerifyProduceConsume(DynamicBrokerReconfigurationTest.scala:1579)
> [2022-07-27T11:44:23.103Z] at kafka.server.DynamicBrokerReconfigurationTest.testKeyStoreAlter(DynamicBrokerReconfigurationTest.scala:399)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14123) Delete with null value not supported in Streams PersistantWindowsStore

2022-07-28 Thread Pawan Sharma (Jira)
Pawan Sharma created KAFKA-14123:


 Summary: Delete with null value not supported in Streams 
PersistantWindowsStore
 Key: KAFKA-14123
 URL: https://issues.apache.org/jira/browse/KAFKA-14123
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.0.0
Reporter: Pawan Sharma


Unable to delete a window entry from the persistent window store by passing a 
null value in the body.

The put method in this class does not check whether the value is null and, if 
so, invoke the remove method.

[https://github.com/apache/kafka/blob/3.0.0/streams/src/main/java/org/apache/kafka/streams/state/internals/AbstractRocksDBSegmentedBytesStore.java]

Whereas the same feature works in the in-memory window store, where null 
values are treated as a delete (line 126).

[https://github.com/apache/kafka/blob/3.0.0/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryWindowStore.java]

This behaviour is at odds with all other stores, including KV stores, 
where a null value is treated as a delete, which also matches the behaviour of 
a compacted Kafka topic.
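
The requested behaviour can be illustrated with a minimal, self-contained sketch. This is not the Kafka Streams implementation: the class `NullAsDeleteWindowStore`, its string-based entry key and its in-memory map are all hypothetical stand-ins, used only to show a `put` that treats a null value as a delete.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a window store; this is not Kafka Streams code.
final class NullAsDeleteWindowStore {
    // Entries keyed by "<key>@<windowStartMs>"; the key format is illustrative only.
    private final Map<String, byte[]> entries = new HashMap<>();

    private static String entryKey(String key, long windowStartMs) {
        return key + "@" + windowStartMs;
    }

    /** Stores the value, or removes the entry when the value is null. */
    void put(String key, byte[] value, long windowStartMs) {
        if (value == null) {
            entries.remove(entryKey(key, windowStartMs)); // null acts as a tombstone
        } else {
            entries.put(entryKey(key, windowStartMs), value);
        }
    }

    /** Returns the stored value, or null when no entry exists. */
    byte[] fetch(String key, long windowStartMs) {
        return entries.get(entryKey(key, windowStartMs));
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        NullAsDeleteWindowStore store = new NullAsDeleteWindowStore();
        store.put("user-1", new byte[] {42}, 1000L);
        store.put("user-1", null, 1000L); // deletes the entry written above
        System.out.println("entries after delete: " + store.size());
    }
}
```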



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Website changes required for Apache projects

2022-07-28 Thread John Roesler
Thank you, Divij!

-John


[jira] [Resolved] (KAFKA-14101) Flaky ExactlyOnceSourceIntegrationTest.testConnectorBoundary

2022-07-28 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-14101.
---
Resolution: Fixed

> Flaky ExactlyOnceSourceIntegrationTest.testConnectorBoundary
> 
>
> Key: KAFKA-14101
> URL: https://issues.apache.org/jira/browse/KAFKA-14101
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Mickael Maison
>Assignee: Chris Egerton
>Priority: Major
> Attachments: 
> org.apache.kafka.connect.integration.ExactlyOnceSourceIntegrationTest.testConnectorBoundary.test.stdout
>
>
> I hit this one while running the tests on your branch from 
> https://github.com/apache/kafka/pull/12429
> org.apache.kafka.connect.integration.ExactlyOnceSourceIntegrationTest > 
> testConnectorBoundary FAILED
> java.lang.AssertionError: Committed records should exclude 
> connector-aborted transactions expected:<[1, 3, 4, 5, 9, 10, 11, 12, 13, 14, 
> 15, 16, 17, 18, 19, 20, 21, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
> 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 
> 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 
> 85, 86, 87, 88, 89]> but was:<[4, 5, 10, 13, 16, 18, 20, 37, 39, 40, 46, 47, 
> 49, 54, 59, 64, 65, 68, 70, 71, 77, 83, 85, 89, 146, 148, 153, 154, 157, 158, 
> 159, 163, 165, 169, 175, 176, 178, 183, 184, 185, 187, 188, 191, 196, 199, 
> 211, 216, 217, 218, 222, 223, 229, 232, 238, 244, 251, 255, 259, 261, 269, 
> 272, 274, 275, 276, 277, 278, 279, 285, 291, 293, 296, 299]>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:120)
> at 
> org.apache.kafka.connect.integration.ExactlyOnceSourceIntegrationTest.testConnectorBoundary(ExactlyOnceSourceIntegrationTest.java:456)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14089) Flaky ExactlyOnceSourceIntegrationTest.testSeparateOffsetsTopic

2022-07-28 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-14089.

Fix Version/s: 3.3.0
   Resolution: Fixed

> Flaky ExactlyOnceSourceIntegrationTest.testSeparateOffsetsTopic
> ---
>
> Key: KAFKA-14089
> URL: https://issues.apache.org/jira/browse/KAFKA-14089
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Mickael Maison
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: failure.txt, 
> org.apache.kafka.connect.integration.ExactlyOnceSourceIntegrationTest.testSeparateOffsetsTopic.test.stdout
>
>
> It looks like the sequence got broken around "65535, 65537, 65536, 65539, 
> 65538, 65541, 65540, 65543"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS]: Including TLA+ in the repo

2022-07-28 Thread Jason Gustafson
Yeah, good idea. I'm happy to submit the specs I wrote for normal kafka
replication. It will make them more accessible and I have long been looking
for help reviewing them. Hopefully it will also provide a better chance to
keep them in sync with the codebase as we update protocols.

-Jason

On Wed, Jul 27, 2022 at 1:50 AM Jack Vanlightly
 wrote:

> +1 for me too. Once the KIP-853 is agreed I will make any necessary changes
> and submit a PR to the apache/kafka repo.
>
> Jack
>
> On Tue, Jul 26, 2022 at 10:10 PM Ismael Juma  wrote:
>
> > I'm +1 for inclusion in the main repo and I was going to suggest the same
> > in the KIP-853 discussion. The original authors of 3 and 4 are part of the
> > kafka community, so we can ask them to submit PRs.
> >
> > Ismael
> >
> > On Tue, Jul 26, 2022 at 7:58 AM Tom Bentley  wrote:
> >
> > > Hi,
> > >
> > > I noticed that TLA+ has featured in the Test Plans of a couple of recent
> > > KIPs [1,2]. This is a good thing in my opinion. I'm aware that TLA+ has
> > > been used in the past to prove properties of various parts of the Kafka
> > > protocol [3,4].
> > >
> > > The point I wanted to raise is that I think it would be beneficial to the
> > > community if these models could be part of the main Kafka repo. That way
> > > there are fewer hurdles to their discoverability and it makes it easier for
> > > people to compare the implementation with the spec. Spreading familiarity
> > > with TLA+ within the community is also a potential side-benefit.
> > >
> > > I notice that the specs in [4] are MIT-licensed, but according to the
> > > Apache 3rd party license policy [5] it should be OK to include.
> > >
> > > Thoughts?
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> > > [1]:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-TestPlan
> > > [2]:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Voter+Changes#KIP853:KRaftVoterChanges-TestPlan
> > > [3]: https://github.com/hachikuji/kafka-specification
> > > [4]:
> > > https://github.com/Vanlightly/raft-tlaplus/tree/main/specifications/pull-raft
> > > [5]: https://www.apache.org/legal/resolved.html


Re: [DISCUSS]: Including TLA+ in the repo

2022-07-28 Thread Tom Bentley
Thanks Jason and Jack!

I count myself as a beginner with TLA+, but would like to take this as an
opportunity to learn.

Tom



Re: [DISCUSS]: Including TLA+ in the repo

2022-07-28 Thread Matthew Benedict de Detrich
+1 from me as well. Having the formal TLA+ proofs in the main repo is hugely 
beneficial, not only for understanding the high-level protocol but also for 
awareness and for making sure the proofs do not become outdated.

--
Matthew de Detrich
Aiven Deutschland GmbH
Immanuelkirchstraße 26, 10405 Berlin
Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
m: +491603708037
w: aiven.io e: matthew.dedetr...@aiven.io


Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

2022-07-28 Thread Kirk True
Hi Justine,

Thanks for the KIP. I appreciated the background context and clarity you added.

On Wed, Jul 27, 2022, at 2:57 AM, Sagar wrote:
> Thanks Justine for the KIP. I think it might be better to document the
> correlation between the new config and delivery.timeout.ms in the Public
> Interfaces Description.

+1.

A bi-directional reference between the two configuration options would be great 
for clarity. This is especially true given that the value of 
`producer.id.expiration.ms`, when left at -1, comes from the value of 
`transactional.id.expiration.ms`.
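
As a rough illustration of that fallback as it is described in this thread, the sketch below resolves the effective expiry and checks it against the delivery timeout. The class and method names are hypothetical, and the values in `main` are example inputs only (7 days and 1 day are the figures mentioned in this discussion; 120000 ms is assumed here for `delivery.timeout.ms`).

```java
// Hypothetical sketch of the fallback discussed in this thread; not broker code.
final class ProducerIdExpiry {
    static final long UNSET = -1L;

    /** Uses the transactional ID expiry when the producer ID expiry is left at -1. */
    static long effectiveProducerIdExpirationMs(long producerIdExpirationMs,
                                                long transactionalIdExpirationMs) {
        return producerIdExpirationMs == UNSET
            ? transactionalIdExpirationMs
            : producerIdExpirationMs;
    }

    /** True when the producer ID outlives the longest possible retry window. */
    static boolean coversDeliveryTimeout(long effectiveExpirationMs,
                                         long deliveryTimeoutMs) {
        return effectiveExpirationMs >= deliveryTimeoutMs;
    }

    public static void main(String[] args) {
        long sevenDaysMs = 7L * 24 * 60 * 60 * 1000;
        long oneDayMs = 24L * 60 * 60 * 1000;
        long deliveryTimeoutMs = 120_000L; // assumed example value

        long inherited = effectiveProducerIdExpirationMs(UNSET, sevenDaysMs);
        long explicit = effectiveProducerIdExpirationMs(oneDayMs, sevenDaysMs);

        System.out.println("inherited=" + inherited + " explicit=" + explicit);
        System.out.println("covers delivery timeout: "
            + coversDeliveryTimeout(explicit, deliveryTimeoutMs));
    }
}
```

Documenting both directions matters because a reader who changes only `transactional.id.expiration.ms` would otherwise not expect the producer ID expiry to move with it.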

Thanks!
Kirk

> 
> Also, I agree with Luke that for now setting a default to -1 should be
> good. We can look to switch to 1 day with the next major release.
> 
> Thanks!
> Sagar.
> 
> On Wed, Jul 27, 2022 at 9:05 AM Luke Chen  wrote:
> 
> > Hi Justine,
> >
> > Thanks for the KIP.
> > I agree with you that we should try our best to keep backward
> > compatibility, although our intention is to have a lower producer ID
> > expiration timeout.
> > So, I think we should keep the default at -1.
> > Maybe we can change the default to 1 day in the next major release (4.0)?
> >
> > Thank you.
> > Luke
> >
> > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > 
> > wrote:
> >
> > > Thanks for taking a look Jason!
> > >
> > > I wondered if we wanted to have a smaller default but wasn't sure about the
> > > compatibility story -- especially since there is the chance for producer
> > > IDs to expire silently.
> > > I do think that 1 day is fairly reasonable. If I don't hear any conflicting
> > > opinions I can go ahead and update the default.
> > >
> > > Justine
> > >
> > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > 
> > > wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP. Although I hate seeing new configurations, I think this
> > > > is a good change. Combining these timeout behaviors into a single
> > > > configuration was definitely a mistake, but we didn't anticipate the
> > > > problem with the producer id cache. I do wonder if we can make the default
> > > > a bit lower to reduce the chances that users will hit the same memory
> > > > issues we have seen. After decoupling this configuration from
> > > > transactional.id.expiration.ms, the new timeout just needs to cover the
> > > > longest duration that a producer might be retrying the same Produce
> > > > request. 7 days seems too high. Although I think it could go a fair bit
> > > > lower, perhaps 1 day is a reasonable place to start?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan wrote:
> > > >
> > > > > Hey Bill,
> > > > > Thanks! I was just going to say that hopefully
> > > > > transactional.id.expiration.ms would also be over the delivery
> > > > > timeout. :)
> > > > > Thanks for the +1!
> > > > >
> > > > > Justine
> > > > >
> > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck wrote:
> > > > >
> > > > > > Hi Justine,
> > > > > >
> > > > > > I just took another look at the KIP, and I realize my
> > > > > > question/suggestion about default values has already been addressed
> > > > > > in the `Compatibility` section.
> > > > > >
> > > > > > I'm +1 on the KIP.
> > > > > >
> > > > > > -Bill
> > > > > >
> > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck wrote:
> > > > > >
> > > > > > > Hi Justine,
> > > > > > >
> > > > > > > Thanks for the well written KIP, this looks like it will be a
> > > > > > > useful addition.
> > > > > > >
> > > > > > > Overall the KIP looks good to me, I have one question/comment.
> > > > > > >
> > > > > > > You mentioned that setting the `producer.id.expiration.ms` less
> > > > > > > than the delivery timeout could lead to duplicates, which makes
> > > > > > > sense. To help avoid this situation, do we want to consider a
> > > > > > > default value that is the same as the delivery timeout?
> > > > > > >
> > > > > > > Thanks again for the KIP.
> > > > > > >
> > > > > > > Bill
> > > > > > >
> > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine Olshan wrote:
> > > > > > >
> > > > > > >> Hey all!
> > > > > > >>
> > > > > > >> I'd like to start a discussion on my proposal to separate
> > > > > > >> time-based producer ID expiration from transactional ID
> > > > > > >> expiration by introducing a new configuration.
> > > > > > >>
> > > > > > >> The KIP is pretty small and simple, but will be helpful in
> > > > > > >> controlling memory usage in brokers -- especially now that by
> > > > > > >> default producers are idempotent and create producer ID state.
> > > > > > >>
> > > > > > >> Please take a look and leave any comments you may have!
> > > > > > >>
> > > > > > >> KIP:
> > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+e

Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-07-28 Thread Chris Egerton
Hi José,

Would it be okay to backport https://github.com/apache/kafka/pull/12451 to
the current 3.3 branch? It's a strictly cosmetic change that updates a
misleading comment about exactly-once support for source connectors. I'm
hoping it'll make life easier for anyone who has to debug this feature by
saving some confusion.

Cheers,

Chris

On Thu, Jul 21, 2022 at 11:11 PM Luke Chen  wrote:

> Hi Jose,
>
> I just found the KIP-831 is not listed in the v3.3 planned KIPs.
> It is completed and merged.
> Please help add it.
>
> Thank you.
> Luke
>
> On Fri, Jul 22, 2022 at 4:56 AM Randall Hauch  wrote:
>
> > Hi, Jose.
> >
> > Thanks for driving the 3.3.0 release.
> >
> > Chris E and Chris S both mentioned
> > https://issues.apache.org/jira/browse/KAFKA-14079 in this thread a few
> > days ago. This was deemed a critical bug fix for 3.2, and the PR for the
> > `3.2` branch [1] has already been merged and is in the recently-cut 3.2.1
> > RC. Likewise, the PR for the `trunk` branch [2] has also been merged. Will
> > you approve me merging the fix to the `3.3` branch for inclusion in 3.3.0?
> >
> > Best regards,
> >
> > Randall
> >
> > [1] https://github.com/apache/kafka/pull/12412
> > [2] https://github.com/apache/kafka/pull/12415
> >
> > On Sun, Jul 17, 2022 at 5:46 PM Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> > > I submitted a tweaked PR for 3.3.0 based on the 3.2.1 fix here:
> > > https://github.com/apache/kafka/pull/12415
> > >
> > > On Sun, Jul 17, 2022 at 1:52 PM Chris Egerton wrote:
> > >
> > > > Hi all,
> > > >
> > > > Although KAFKA-14079 is not a regression, it'd be nice if we could get
> > > > it into 3.3.0. As I mentioned on the 3.2.1 release thread, the risk is
> > > > fairly low (the functional changes are just two lines long), and the
> > > > impact is high for users who have configured source connectors with
> > > > "errors.tolerance" set to "all".
> > > >
> > > > Given that the code freeze for the release is coming up soon (the
> > > > release doc currently has it at July 20th), it'd be nice if we could
> > > > get some eyes on the PR for 3.2.1 and, if that looks good, a PR for
> > > > trunk and 3.3.0.
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Sat, Jul 16, 2022 at 12:31 PM Christopher Shannon <
> > > > christopher.l.shan...@gmail.com> wrote:
> > > >
> > > > > There is a bug I found that I think is worthwhile fixing in 3.3.0 (I
> > > > > also sent a note to 3.2.1 thread):
> > > > > https://issues.apache.org/jira/browse/KAFKA-14079
> > > > >
> > > > > On Thu, Jul 14, 2022 at 7:56 PM Jason Gustafson wrote:
> > > > > > Hey Jose,
> > > > > >
> > > > > > Thanks for volunteering to manage the release! KIP-833 is
> > > > > > currently slotted for 3.3. We've been getting some help from Jack
> > > > > > Vanlightly to validate the raft implementation in TLA+ and with
> > > > > > frameworks like Jepsen. The specification is written here if anyone
> > > > > > is interested:
> > > > > > https://github.com/Vanlightly/raft-tlaplus/blob/main/specifications/pull-raft/KRaft.tla
> > > > > >
> > > > > > The main gap that this work uncovered in our implementation is
> > > > > > documented here: https://issues.apache.org/jira/browse/KAFKA-14077.
> > > > > > I do believe that KIP-833 depends on fixing this issue, so I wanted
> > > > > > to see how you feel about giving us a little more time to address
> > > > > > it?
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Wed, Jul 13, 2022 at 10:01 AM Sagar <sagarmeansoc...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Jose,
> > > > > > >
> > > > > > > Well actually I have 2 approved PRs from Kafka Connect:
> > > > > > >
> > > > > > > https://github.com/apache/kafka/pull/12321
> > > > > > > https://github.com/apache/kafka/pull/12309
> > > > > > >
> > > > > > > Not sure how to get these merged though but I think these can go
> > > > > > > into 3.3 release.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Sagar.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jul 13, 2022 at 5:03 PM Divij Vaidya <divijvaidy...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Jose
> > > > > > > >
> > > > > > > > A few of my PRs, which I was hoping to merge into 3.3, have been
> > > > > > > > pending review for quite some time. I have already marked them
> > > > > > > > with "Fix version=3.3.0" so that you can track them using the
> > > > > > > > JIRA filter you shared earlier
> > > > > > > > <
> > > > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.3.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC%

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

2022-07-28 Thread Justine Olshan
Thanks Jason, Luke, Sagar, and Kirk,

Seems like there is still some debate over the default value. I think there
is a general consensus that we can reduce the default at some point, but
exactly when is still not clear. I do think Jason made a good point about
applications taking 1 day to retry. I am interested if there are other use
cases we didn't consider though.

I've also updated the description to reference `delivery.timeout.ms`. I'm
not sure if we also need that config to reference this one (the
bi-directional reference Kirk mentioned). Let me know if something should
still be updated or if something is unclear.

Thanks again,
Justine
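
As a rough illustration of why the two descriptions should point at each other: this thread notes that a producer ID expiration shorter than the delivery timeout could lead to duplicates. The sketch below is a hypothetical operator-side check, not a Kafka API. The function name is made up, and treating "equal" as acceptable is an assumption based on the suggestion of a default equal to the delivery timeout.

```python
# Hypothetical operator-side check, not a Kafka API. The broker setting
# producer.id.expiration.ms and the producer setting delivery.timeout.ms live
# in different configs, so nothing below is enforced by Kafka itself.

def expiration_covers_retries(producer_id_expiration_ms: int,
                              delivery_timeout_ms: int) -> bool:
    """True if producer ID state is kept at least as long as the retry window."""
    return producer_id_expiration_ms >= delivery_timeout_ms


DELIVERY_TIMEOUT_MS = 120_000     # producer default for delivery.timeout.ms (2 minutes)
ONE_DAY_MS = 24 * 60 * 60 * 1000  # the 1-day value discussed in the thread

print(expiration_covers_retries(ONE_DAY_MS, DELIVERY_TIMEOUT_MS))  # True
print(expiration_covers_retries(60_000, DELIVERY_TIMEOUT_MS))      # False
```

Under these assumptions a 1-day expiration leaves a wide margin over the default delivery timeout, while a 1-minute expiration would not cover it.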


[jira] [Created] (KAFKA-14124) Improve QuorumController fault handling

2022-07-28 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14124:


 Summary: Improve QuorumController fault handling
 Key: KAFKA-14124
 URL: https://issues.apache.org/jira/browse/KAFKA-14124
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1105

2022-07-28 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 488660 lines...]
[2022-07-29T01:26:35.943Z] 
[2022-07-29T01:26:35.943Z] KafkaZkClientTest > testJuteMaxBufffer() STARTED
[2022-07-29T01:26:35.943Z] 
[2022-07-29T01:26:35.943Z] KafkaZkClientTest > testJuteMaxBufffer() PASSED
[2022-07-29T01:26:35.943Z] 
[2022-07-29T01:26:35.943Z] KafkaZkClientTest > testCreateTokenChangeNotification() STARTED
[2022-07-29T01:26:36.996Z] 
[2022-07-29T01:26:36.996Z] KafkaZkClientTest > testCreateTokenChangeNotification() PASSED
[2022-07-29T01:26:36.996Z] 
[2022-07-29T01:26:36.996Z] KafkaZkClientTest > testGetTopicsAndPartitions() STARTED
[2022-07-29T01:26:36.996Z] 
[2022-07-29T01:26:36.996Z] KafkaZkClientTest > testGetTopicsAndPartitions() PASSED
[2022-07-29T01:26:36.996Z] 
[2022-07-29T01:26:36.996Z] KafkaZkClientTest > testChroot(boolean) > kafka.zk.KafkaZkClientTest.testChroot(boolean)[1] STARTED
[2022-07-29T01:26:36.996Z] 
[2022-07-29T01:26:36.996Z] KafkaZkClientTest > testChroot(boolean) > kafka.zk.KafkaZkClientTest.testChroot(boolean)[1] PASSED
[2022-07-29T01:26:36.996Z] 
[2022-07-29T01:26:36.996Z] KafkaZkClientTest > testChroot(boolean) > kafka.zk.KafkaZkClientTest.testChroot(boolean)[2] STARTED
[2022-07-29T01:26:38.048Z] 
[2022-07-29T01:26:38.048Z] KafkaZkClientTest > testChroot(boolean) > kafka.zk.KafkaZkClientTest.testChroot(boolean)[2] PASSED
[2022-07-29T01:26:38.048Z] 
[2022-07-29T01:26:38.048Z] KafkaZkClientTest > testRegisterBrokerInfo() STARTED
[2022-07-29T01:26:38.048Z] 
[2022-07-29T01:26:38.048Z] KafkaZkClientTest > testRegisterBrokerInfo() PASSED
[2022-07-29T01:26:38.048Z] 
[2022-07-29T01:26:38.048Z] KafkaZkClientTest > testRetryRegisterBrokerInfo() STARTED
[2022-07-29T01:26:39.104Z] 
[2022-07-29T01:26:39.104Z] KafkaZkClientTest > testRetryRegisterBrokerInfo() PASSED
[2022-07-29T01:26:39.104Z] 
[2022-07-29T01:26:39.104Z] KafkaZkClientTest > testConsumerOffsetPath() STARTED
[2022-07-29T01:26:39.104Z] 
[2022-07-29T01:26:39.104Z] KafkaZkClientTest > testConsumerOffsetPath() PASSED
[2022-07-29T01:26:39.104Z] 
[2022-07-29T01:26:39.104Z] KafkaZkClientTest > testDeleteRecursiveWithControllerEpochVersionCheck() STARTED
[2022-07-29T01:26:39.104Z] 
[2022-07-29T01:26:39.104Z] KafkaZkClientTest > testDeleteRecursiveWithControllerEpochVersionCheck() PASSED
[2022-07-29T01:26:39.104Z] 
[2022-07-29T01:26:39.104Z] KafkaZkClientTest > testTopicAssignments() STARTED
[2022-07-29T01:26:40.157Z] 
[2022-07-29T01:26:40.157Z] KafkaZkClientTest > testTopicAssignments() PASSED
[2022-07-29T01:26:40.157Z] 
[2022-07-29T01:26:40.157Z] KafkaZkClientTest > testControllerManagementMethods() STARTED
[2022-07-29T01:26:40.157Z] 
[2022-07-29T01:26:40.157Z] KafkaZkClientTest > testControllerManagementMethods() PASSED
[2022-07-29T01:26:40.157Z] 
[2022-07-29T01:26:40.157Z] KafkaZkClientTest > testTopicAssignmentMethods() STARTED
[2022-07-29T01:26:41.385Z] 
[2022-07-29T01:26:41.385Z] KafkaZkClientTest > testTopicAssignmentMethods() PASSED
[2022-07-29T01:26:41.385Z] 
[2022-07-29T01:26:41.385Z] KafkaZkClientTest > testConnectionViaNettyClient() STARTED
[2022-07-29T01:26:41.385Z] 
[2022-07-29T01:26:41.385Z] KafkaZkClientTest > testConnectionViaNettyClient() PASSED
[2022-07-29T01:26:41.385Z] 
[2022-07-29T01:26:41.385Z] KafkaZkClientTest > testPropagateIsrChanges() STARTED
[2022-07-29T01:26:41.385Z] 
[2022-07-29T01:26:41.385Z] KafkaZkClientTest > testPropagateIsrChanges() PASSED
[2022-07-29T01:26:41.385Z] 
[2022-07-29T01:26:41.385Z] KafkaZkClientTest > testControllerEpochMethods() STARTED
[2022-07-29T01:26:42.438Z] 
[2022-07-29T01:26:42.438Z] KafkaZkClientTest > testControllerEpochMethods() PASSED
[2022-07-29T01:26:42.438Z] 
[2022-07-29T01:26:42.438Z] KafkaZkClientTest > testDeleteRecursive() STARTED
[2022-07-29T01:26:42.438Z] 
[2022-07-29T01:26:42.438Z] KafkaZkClientTest > testDeleteRecursive() PASSED
[2022-07-29T01:26:42.438Z] 
[2022-07-29T01:26:42.438Z] KafkaZkClientTest > testGetTopicPartitionStates() STARTED
[2022-07-29T01:26:42.438Z] 
[2022-07-29T01:26:42.438Z] KafkaZkClientTest > testGetTopicPartitionStates() PASSED
[2022-07-29T01:26:42.438Z] 
[2022-07-29T01:26:42.438Z] KafkaZkClientTest > testCreateConfigChangeNotification() STARTED
[2022-07-29T01:26:43.493Z] 
[2022-07-29T01:26:43.494Z] KafkaZkClientTest > testCreateConfigChangeNotification() PASSED
[2022-07-29T01:26:43.494Z] 
[2022-07-29T01:26:43.494Z] KafkaZkClientTest > testDelegationTokenMethods() STARTED
[2022-07-29T01:26:43.494Z] 
[2022-07-29T01:26:43.494Z] KafkaZkClientTest > testDelegationTokenMethods() PASSED
[2022-07-29T01:26:43.494Z] 
[2022-07-29T01:26:43.494Z] ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() STARTED
[2022-07-29T01:26:43.494Z] 
[2022-07-29T01:26:43.494Z] ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() PASSED
[2022-07-29T01:26:43.494Z] 
[2022-07-29T01:26:43.494Z] ZooKeeperClientTest > testZooKeeperSessionStateMetric()

Re: [DISCUSS] KIP-854 Separate configuration for producer ID expiry

2022-07-28 Thread Luke Chen
Hi Jason,

Thanks for the info. I don't strongly insist on keeping the default at -1
for backward compatibility.
If other people in the community also agree with the change, I'm good with
that.

Thank you.
Luke



