Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-04 Thread Josep Prat
Hi all,

We have had 2[1][2] runs of the system tests since the last blocker was
merged on 3.8. So far we have 19 tests that failed on both runs. I've
compiled them in this list[3].

There seem to be these different categories of failing tests:
- QuotaTest --> speaking with Bruno we suspect there is a problem with the
test setup, failed with "ValueError: max() arg is an empty sequence"
- Streams cooperative rebalance upgrade --> It fails on versions 2.3.1 or
older, failed with Timeout
- KRaft Upgrade --> from dev with Isolated and combined KRaft, failed with
RemoteCommandError
- Network degrade test -> failed with RemoteCommandError
- Replica verification tool test --> Timeout for KRaft, but ZK failed on
the first run but worked on the second

If someone has further ideas on what could be causing these failures,
please let me know. Given the holidays in the US, the possible test setup
problem might not get fixed today.

[1]:
https://confluent-open-source-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.8/2024-07-02--001.05d6b151-356a-47e5-b724-6fcd79493422--1719991984--confluentinc--3.8--49d2ee3db9/report.html
[2]:
https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-07-03--001.4803d99b-52df-4f6d-82c2-3f050a6207fa--1720038529--apache--3.8--2fbe32ecb9/report.html
[3]:
https://docs.google.com/document/d/1wbcyzO6GM2SYQaqTMITBTBjHgZgM7mmiAt7TUfh1xt8/edit

Best,

On Tue, Jul 2, 2024 at 7:29 PM Josep Prat  wrote:

> Hi all,
> Thanks for reviewing and merging the latest blockers for 3.8.0. Tomorrow,
> I will start with the process to get the first RC out.
>
> Best!
>
> On Sat, Jun 29, 2024 at 9:04 PM Josep Prat  wrote:
>
>> Hi Justine,
>>
>> Marking MV 3.8-IV0 as latest
>> production MV is done in this PR (I did both together)
>> https://github.com/apache/kafka/pull/16400
>>
>> Best,
>>
>> --
>> Josep Prat
>> Open Source Engineering Director, Aiven
>> josep.p...@aiven.io   |   +491715557497 | aiven.io
>> Aiven Deutschland GmbH
>> Alexanderufer 3-7, 10117 Berlin
>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>> Amtsgericht Charlottenburg, HRB 209739 B
>>
>> On Sat, Jun 29, 2024, 00:52 Justine Olshan 
>> wrote:
>>
>>> The PR is merged. I lowered the severity of the blocker ticket as we
>>> still
>>> have the change in trunk to merge. However, the 3.8 release is no longer
>>> blocked by KAFKA-17050.
>>> I think that was the remaining blocker. The other ones are either already
>>> fixed for 3.8 (KAFKA-17011) or diverted to 3.9 (KAFKA-16840)
>>>
>>> I think there was one more needed change to mark MV 3.8-IV0 as latest
>>> production MV. I will follow up with that.
>>>
>>> Justine
>>>
>>> On Thu, Jun 27, 2024 at 2:34 PM Justine Olshan 
>>> wrote:
>>>
>>> > Here is the PR: https://github.com/apache/kafka/pull/16478
>>> >
>>> > Justine
>>> >
>>> > On Thu, Jun 27, 2024 at 2:21 PM Justine Olshan 
>>> > wrote:
>>> >
>>> >> Hey all,
>>> >> Thanks for your patience. After some discussion, we decided to revert
>>> >> group version from 3.8 since there were too many complexities
>>> associated
>>> >> with getting it to work.
>>> >> I've downgraded the severity of KAFKA-17011 to not be a blocker and
>>> >> opened a ticket (https://issues.apache.org/jira/browse/KAFKA-17050)
>>> to
>>> >> revert from 3.8 (and 3.9) as a blocker instead. I hope to get the PR
>>> out
>>> >> shortly.
>>> >> This one should be less controversial and merged quickly.
>>> >>
>>> >> Thanks again,
>>> >> Justine
>>> >>
>>> >> On Thu, Jun 27, 2024 at 1:22 AM Josep Prat
>>> 
>>> >> wrote:
>>> >>
>>> >>> Hi all,
>>> >>>
>>> >>> I just wanted to ask again for your help in reviewing these 2 last
>>> >>> blockers
>>> >>> for the 3.8.0 release:
>>> >>> https://github.com/apache/kafka/pull/16400
>>> >>> https://github.com/apache/kafka/pull/16420
>>> >>>
>>> >>> Thanks!
>>> >>>
>>> >>>
>>> >>> On Mon, Jun 24, 2024 at 9:27 AM Josep Prat 
>>> wrote:
>>> >>>
>>> >>> > Hi all,
>>> >>> >
>>> >>> > We currently have a couple of blockers for the 3.8.0 release.
>>> These are
>>> >>> > the following:
>>> >>> > - Reverting commit KAFKA-16154 and mark latest production metadata
>>> as
>>> >>> > 3.8.0: https://github.com/apache/kafka/pull/16400
>>> >>> > - Fix some failing system tests:
>>> >>> > https://github.com/apache/kafka/pull/16420
>>> >>> > Can we get some eyes on these 2 PRs? Thanks!
>>> >>> >
>>> >>> > To easily track this in feature releases, I created a new label
>>> called
>>> >>> > "Blocker" the idea is to mark PRs that are solving an Issue marked
>>> as
>>> >>> > "blocker". This might increase visibility and help getting those
>>> >>> reviewed
>>> >>> > promptly. Here is the link to the PRs with this label:
>>> >>> > https://github.com/apache/kafka/labels/Blocker
>>> >>> >
>>> >>> > Best,
>>> >>> >
>>> >>> > On Thu, Jun 20, 2024 at 7:09 PM Josep Prat 
>>> >>> wrote:
>>> >>> >
>>> >>> >> Thanks for the heads up Justine!
>>> >>> >>
>>> >>> >> On Thu, Jun 20, 2024 at 5:54 PM Jus

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #3075

2024-07-04 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-04 Thread Luke Chen
Hi Josep,

For this
- QuotaTest --> speaking with Bruno we suspect there is a problem with the
test setup, failed with "ValueError: max() arg is an empty sequence"

It's a known issue: KAFKA-16138.
It should pass when the specific tests are run locally.
Do you want me to help verify it by running it in my environment?

Thanks.
Luke



On Thu, Jul 4, 2024 at 4:03 PM Josep Prat 
wrote:

> Hi all,
>
> We have had 2[1][2] runs of the system tests since the last blocker was
> merged on 3.8. So far we have 19 tests that failed on both runs. I've
> compiled them in this list[3].
>
> There seem to be these different categories of failing tests:
> - QuotaTest --> speaking with Bruno we suspect there is a problem with the
> test setup, failed with "ValueError: max() arg is an empty sequence"
> - Streams cooperative rebalance upgrade --> It fails on versions 2.3.1 or
> older, failed with Timeout
> - KRaft Upgrade --> from dev with Isolated and combined KRaft, failed with
> RemoteCommandError
> - Network degrade test -> failed with RemoteCommandError
> - Replica verification tool test --> Timeout for KRaft, but ZK failed on
> the first run but worked on the second
>
> If someone has further ideas on what could be causing these failures,
> please let me know. Given holidays in the US, the possible test setup
> problem might not be able to be fixed today.
>
> [1]:
>
> https://confluent-open-source-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.8/2024-07-02--001.05d6b151-356a-47e5-b724-6fcd79493422--1719991984--confluentinc--3.8--49d2ee3db9/report.html
> [2]:
>
> https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-07-03--001.4803d99b-52df-4f6d-82c2-3f050a6207fa--1720038529--apache--3.8--2fbe32ecb9/report.html
> [3]:
>
> https://docs.google.com/document/d/1wbcyzO6GM2SYQaqTMITBTBjHgZgM7mmiAt7TUfh1xt8/edit
>
> Best,
>
> On Tue, Jul 2, 2024 at 7:29 PM Josep Prat  wrote:
>
> > Hi all,
> > Thanks for reviewing and merging the latest blockers for 3.8.0. Tomorrow,
> > I will start with the process to get the first RC out.
> >
> > Best!
> >
> > On Sat, Jun 29, 2024 at 9:04 PM Josep Prat  wrote:
> >
> >> Hi Justine,
> >>
> >> Marking MV 3.8-IV0 as latest
> >> production MV is done in this PR (I did both together)
> >> https://github.com/apache/kafka/pull/16400
> >>
> >> Best,
> >>
> >> --
> >> Josep Prat
> >> Open Source Engineering Director, Aiven
> >> josep.p...@aiven.io   |   +491715557497 | aiven.io
> >> Aiven Deutschland GmbH
> >> Alexanderufer 3-7, 10117 Berlin
> >> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >> Amtsgericht Charlottenburg, HRB 209739 B
> >>
> >> On Sat, Jun 29, 2024, 00:52 Justine Olshan  >
> >> wrote:
> >>
> >>> The PR is merged. I lowered the severity of the blocker ticket as we
> >>> still
> >>> have the change in trunk to merge. However, the 3.8 release is no
> longer
> >>> blocked by KAFKA-17050.
> >>> I think that was the remaining blocker. The other ones are either
> already
> >>> fixed for 3.8 (KAFKA-17011) or diverted to 3.9 (KAFKA-16840)
> >>>
> >>> I think there was one more needed change to mark MV 3.8-IV0 as latest
> >>> production MV. I will follow up with that.
> >>>
> >>> Justine
> >>>
> >>> On Thu, Jun 27, 2024 at 2:34 PM Justine Olshan 
> >>> wrote:
> >>>
> >>> > Here is the PR: https://github.com/apache/kafka/pull/16478
> >>> >
> >>> > Justine
> >>> >
> >>> > On Thu, Jun 27, 2024 at 2:21 PM Justine Olshan  >
> >>> > wrote:
> >>> >
> >>> >> Hey all,
> >>> >> Thanks for your patience. After some discussion, we decided to
> revert
> >>> >> group version from 3.8 since there were too many complexities
> >>> associated
> >>> >> with getting it to work.
> >>> >> I've downgraded the severity of KAFKA-17011 to not be a blocker and
> >>> >> opened a ticket (https://issues.apache.org/jira/browse/KAFKA-17050)
> >>> to
> >>> >> revert from 3.8 (and 3.9) as a blocker instead. I hope to get the PR
> >>> out
> >>> >> shortly.
> >>> >> This one should be less controversial and merged quickly.
> >>> >>
> >>> >> Thanks again,
> >>> >> Justine
> >>> >>
> >>> >> On Thu, Jun 27, 2024 at 1:22 AM Josep Prat
> >>> 
> >>> >> wrote:
> >>> >>
> >>> >>> Hi all,
> >>> >>>
> >>> >>> I just wanted to ask again for your help in reviewing these 2 last
> >>> >>> blockers
> >>> >>> for the 3.8.0 release:
> >>> >>> https://github.com/apache/kafka/pull/16400
> >>> >>> https://github.com/apache/kafka/pull/16420
> >>> >>>
> >>> >>> Thanks!
> >>> >>>
> >>> >>>
> >>> >>> On Mon, Jun 24, 2024 at 9:27 AM Josep Prat 
> >>> wrote:
> >>> >>>
> >>> >>> > Hi all,
> >>> >>> >
> >>> >>> > We currently have a couple of blockers for the 3.8.0 release.
> >>> These are
> >>> >>> > the following:
> >>> >>> > - Reverting commit KAFKA-16154 and mark latest production
> metadata
> >>> as
> >>> >>> > 3.8.0: https://github.com/apache/kafka/pull/16400
> >>> >>> > - Fix some failing system tests:
> >>> >>> > https://git

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-04 Thread Josep Prat
Hi Luke,

Thanks for the pointer! If you have an environment where you can run the
tests, I would highly appreciate it!

I managed to run this test suite locally, and currently only this one fails
consistently; the rest pass:

Module: kafkatest.tests.client.quota_test
Class:  QuotaTest
Method: test_quota
Arguments:
{
  "old_client_throttling_behavior": true,
  "quota_type": "client-id"
}

Failure:
TimeoutError("Timed out waiting 600 seconds for service nodes to
finish. These nodes are still alive:
['ProducerPerformanceService-0-140496695824336 node 1 on worker3']")
Traceback (most recent call last):
  File 
"/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/tests/runner_client.py",
line 184, in _do_run
data = self.run_test()
  File 
"/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/tests/runner_client.py",
line 262, in run_test
return self.test_context.function(self.test)
  File 
"/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/mark/_mark.py",
line 433, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/jlprat/projects/kafka/tests/kafkatest/tests/client/quota_test.py",
line 157, in test_quota
producer.run()
  File 
"/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/services/service.py",
line 345, in run
self.wait()
  File 
"/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/services/background_thread.py",
line 72, in wait
super(BackgroundThreadService, self).wait(timeout_sec)
  File 
"/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/services/service.py",
line 293, in wait
raise TimeoutError("Timed out waiting %s seconds for service nodes
to finish. " % str(timeout_sec)
ducktape.errors.TimeoutError: Timed out waiting 600 seconds for
service nodes to finish. These nodes are still alive:
['ProducerPerformanceService-0-140496695824336 node 1 on worker3']


On Thu, Jul 4, 2024 at 11:57 AM Luke Chen  wrote:

> Hi Josep,
>
> For this
> - QuotaTest --> speaking with Bruno we suspect there is a problem with the
> test setup, failed with "ValueError: max() arg is an empty sequence"
>
> It's a known issue: KAFKA-16138.
> It should pass when the specific tests are run locally.
> Do you want me to help verify it by running it in my environment?
>
> Thanks.
> Luke
>
>
>
> On Thu, Jul 4, 2024 at 4:03 PM Josep Prat 
> wrote:
>
> > Hi all,
> >
> > We have had 2[1][2] runs of the system tests since the last blocker was
> > merged on 3.8. So far we have 19 tests that failed on both runs. I've
> > compiled them in this list[3].
> >
> > There seem to be these different categories of failing tests:
> > - QuotaTest --> speaking with Bruno we suspect there is a problem with
> the
> > test setup, failed with "ValueError: max() arg is an empty sequence"
> > - Streams cooperative rebalance upgrade --> It fails on versions 2.3.1 or
> > older, failed with Timeout
> > - KRaft Upgrade --> from dev with Isolated and combined KRaft, failed
> with
> > RemoteCommandError
> > - Network degrade test -> failed with RemoteCommandError
> > - Replica verification tool test --> Timeout for KRaft, but ZK failed on
> > the first run but worked on the second
> >
> > If someone has further ideas on what could be causing these failures,
> > please let me know. Given holidays in the US, the possible test setup
> > problem might not be able to be fixed today.
> >
> > [1]:
> >
> >
> https://confluent-open-source-kafka-system-test-results.s3-us-west-2.amazonaws.com/3.8/2024-07-02--001.05d6b151-356a-47e5-b724-6fcd79493422--1719991984--confluentinc--3.8--49d2ee3db9/report.html
> > [2]:
> >
> >
> https://confluent-open-source-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/trunk/2024-07-03--001.4803d99b-52df-4f6d-82c2-3f050a6207fa--1720038529--apache--3.8--2fbe32ecb9/report.html
> > [3]:
> >
> >
> https://docs.google.com/document/d/1wbcyzO6GM2SYQaqTMITBTBjHgZgM7mmiAt7TUfh1xt8/edit
> >
> > Best,
> >
> > On Tue, Jul 2, 2024 at 7:29 PM Josep Prat  wrote:
> >
> > > Hi all,
> > > Thanks for reviewing and merging the latest blockers for 3.8.0.
> Tomorrow,
> > > I will start with the process to get the first RC out.
> > >
> > > Best!
> > >
> > > On Sat, Jun 29, 2024 at 9:04 PM Josep Prat 
> wrote:
> > >
> > >> Hi Justine,
> > >>
> > >> Marking MV 3.8-IV0 as latest
> > >> production MV is done in this PR (I did both together)
> > >> https://github.com/apache/kafka/pull/16400
> > >>
> > >> Best,
> > >>
> > >> --
> > >> Josep Prat
> > >> Open Source Engineering Director, Aiven
> > >> josep.p...@aiven.io   |   +491715557497 | aiven.io
> > >> Aiven Deutschland GmbH
> > >> Alexanderufer 3-7, 101

[jira] [Created] (KAFKA-17080) bump metadata version for topic Record

2024-07-04 Thread Luke Chen (Jira)
Luke Chen created KAFKA-17080:
-

 Summary: bump metadata version for topic Record
 Key: KAFKA-17080
 URL: https://issues.apache.org/jira/browse/KAFKA-17080
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-752: Support --bootstrap-server in ReplicaVerificationTool

2024-07-04 Thread Chia-Ping Tsai
hi Dongjin

I have assigned https://issues.apache.org/jira/browse/KAFKA-17073 to you. 
Thanks for taking it over :)

On 2024/07/04 05:38:42 Dongjin Lee wrote:
> Okay, Ismael's opinion seems reasonable, so I will follow it.
> 
> Then, could you please assign me the deprecation issue & KIP? Since I
> opened this KIP in the past, I hope to wrap it up. If you are okay with it,
> I will close this issue & KIP and file the deprecation KIP instead.
> 
> Thanks,
> Dongjin
> 
> On Thu, Jul 4, 2024 at 1:42 AM Chia-Ping Tsai  wrote:
> 
> > see https://issues.apache.org/jira/browse/KAFKA-17073 for deprecation.
> >
> > On 2024/07/03 14:57:51 Chia-Ping Tsai wrote:
> > > Agree to Juma
> > >
> > > > Ismael Juma  於 2024年7月3日 晚上10:41 寫道:
> > > >
> > > > I think we should just do a KIP to remove it in 4.0 with deprecation
> > in 3.9.
> > > >
> > > > Ismael
> > > >
> > > >> On Wed, Jul 3, 2024 at 7:38 AM Chia-Ping Tsai 
> > wrote:
> > > >>
> > > >> hi Dongjin
> > > >>
> > > >> It will be removed in 4.0 if we are able to deprecate it in 3.9.
> > Hence, it
> > > >> seems to me enhancing it is a bit weird since the feature is active
> > only
> > > >> for one release …
> > > >>
> > > >>>
> > >  Dongjin Lee  於 2024年7月3日 晚上10:04 寫道:
> > > >>>
> > > >>> Hi Tsai,
> > > >>>
> > > >>> Sorry for being late. How about this way?
> > > >>>
> > > >>> 1. Amend mention on the deprecation plan to the original KIP.
> > > >>> 2. You cast +1 to this voting thread.
> > > >>> 3. Add a new KIP to remove this tool with the 4.0 release.
> > > >>>
> > > >>> Since this KIP already has +2 bindings with PR, this way would be
> > > >> slightly
> > > >>> more swift. What do you think?
> > > >>>
> > > >>> Thanks,
> > > >>> Dongjin
> > > >>>
> > >  On Mon, Jun 3, 2024 at 4:15 AM Chia-Ping Tsai 
> > > >> wrote:
> > > 
> > >  `replica_verification_test.py` is unstable in my jenkins, and then I
> > >  noticed this thread.
> > > 
> > >  Maybe kafka 4 is a good timing to remove this tool, but does it
> > need a
> > >  KIP? If so, I'd like to file a KIP for it.
> > > 
> > >  Best,
> > >  Chia-Ping
> > > 
> > > > On 2021/06/10 05:01:43 Ismael Juma wrote:
> > > > KAFKA-12600 was a general change, not related to this tool
> > > >> specifically.
> > >  I
> > > > am not convinced this tool is actually useful, I haven't seen
> > anyone
> > >  using
> > > > it in years.
> > > >
> > > > Ismael
> > > >
> > > >> On Wed, Jun 9, 2021 at 9:51 PM Dongjin Lee 
> > > >> wrote:
> > > >
> > > >> Hi Ismael,
> > > >>
> > > >> Before I submit this KIP, I reviewed some history. When KIP-499
> > > >> <
> > > >>
> > > 
> > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-499+-+Unify+connection+name+flag+for+command+line+tool
> > > >>>
> > > >> tried to resolve the inconsistencies between the command line
> > tools,
> > >  two
> > > >> tools were omitted, probably by mistake.
> > > >>
> > > >> - KAFKA-12878: Support --bootstrap-server
> > >  kafka-streams-application-reset
> > > >> 
> > > >> - KAFKA-12899: Support --bootstrap-server in
> > ReplicaVerificationTool
> > > >>  (this one)
> > > >>
> > > >> And it seems like this tool is still working. The last update was
> > > >> KAFKA-12600 
> > by
> > >  you,
> > > >> which will also be included in this 3.0.0 release. It is why I
> > >  determined
> > > >> that this tool is worth updating.
> > > >>
> > > >> Thanks,
> > > >> Dongjin
> > > >>
> > > >> On Thu, Jun 10, 2021 at 1:26 PM Ismael Juma 
> > > >> wrote:
> > > >>
> > > >>> Hi Dongjin,
> > > >>>
> > > >>> Does this tool still work? I recall that there were some doubts
> > >  about it
> > > >>> and that's why it wasn't updated previously.
> > > >>>
> > > >>> Ismael
> > > >>>
> > > >>> On Sat, Jun 5, 2021 at 2:38 PM Dongjin Lee 
> > >  wrote:
> > > >>>
> > >  Hi all,
> > > 
> > >  I'd like to call for a vote on KIP-752: Support
> > --bootstrap-server
> > >  in
> > >  ReplicaVerificationTool:
> > > 
> > > 
> > > 
> > > >>>
> > > >>
> > > 
> > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-752%3A+Support+--bootstrap-server+in+ReplicaVerificationTool
> > > 
> > >  Best,
> > >  Dongjin
> > > 
> > >  --
> > >  *Dongjin Lee*
> > > 
> > >  *A hitchhiker in the mathematical world.*
> > > 
> > > 
> > > 
> > >  *github:  github.com/dongjinleekr
> > >  keybase:
> > > >>> https://keybase.io/dongjinleekr
> > >  

[jira] [Resolved] (KAFKA-17058) Extend CoordinatorRuntime to support non-atomic writes

2024-07-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17058.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Extend CoordinatorRuntime to support non-atomic writes
> --
>
> Key: KAFKA-17058
> URL: https://issues.apache.org/jira/browse/KAFKA-17058
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17081) Tweak GroupCoordinatorConfig: re-introduce local attributes and validation

2024-07-04 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-17081:
--

 Summary: Tweak GroupCoordinatorConfig: re-introduce local 
attributes and validation
 Key: KAFKA-17081
 URL: https://issues.apache.org/jira/browse/KAFKA-17081
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Kuan Po Tseng


see discussion: 
https://github.com/apache/kafka/pull/16458#issuecomment-220683



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-04 Thread Luke Chen
Hi Josep,

I had run tests for tests/kafkatest/tests/client/quota_test.py based on 3.8
branch, and they all passed.

SESSION REPORT (ALL TESTS)
ducktape version: 0.11.4
session_id:       2024-07-04--001
run time:         12 minutes 39.940 seconds
tests run:        9
passed:           9
flaky:            0
failed:           0
ignored:          0
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.consumer_num=2
status:     PASS
run time:   3 minutes 51.280 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=.user.client-id.override_quota=True
status:     PASS
run time:   4 minutes 21.082 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=.user.client-id.override_quota=False
status:     PASS
run time:   5 minutes 14.854 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.old_broker_throttling_behavior=True
status:     PASS
run time:   3 minutes 0.505 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.old_client_throttling_behavior=True
status:     PASS
run time:   3 minutes 19.629 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.override_quota=False
status:     PASS
run time:   4 minutes 11.296 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.override_quota=True
status:     PASS
run time:   4 minutes 10.578 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=user.override_quota=False
status:     PASS
run time:   4 minutes 19.187 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=user.override_quota=True
status:     PASS
run time:   3 minutes 13.666 seconds



Thanks.
Luke

On Thu, Jul 4, 2024 at 6:01 PM Josep Prat 
wrote:

> Hi Luke,
>
> Thanks for the pointer! If you have an environment where you can run the
> tests, I would highly appreciate it!
>
> I managed to run this test suite locally, and currently only this one fails
> consistently; the rest pass:
>
> Module: kafkatest.tests.client.quota_test
> Class:  QuotaTest
> Method: test_quota
> Arguments:
> {
>   "old_client_throttling_behavior": true,
>   "quota_type": "client-id"
> }
>
> Failure:
> TimeoutError("Timed out waiting 600 seconds for service nodes to
> finish. These nodes are still alive:
> ['ProducerPerformanceService-0-140496695824336 node 1 on worker3']")
> Traceback (most recent call last):
>   File
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/tests/runner_client.py",
> line 184, in _do_run
> data = self.run_test()
>   File
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/tests/runner_client.py",
> line 262, in run_test
> return self.test_context.function(self.test)
>   File
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/mark/_mark.py",
> line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File
> "/home/jlprat/projects/kafka/tests/kafkatest/tests/client/quota_test.py",
> line 157, in test_quota
> producer.run()
>   File
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/services/service.py",
> line 345, in run
> self.wait()
>   File
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktap

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-07-04 Thread Josep Prat
Thanks Luke!

--
Josep Prat
Open Source Engineering Director, Aiven
josep.p...@aiven.io   |   +491715557497 | aiven.io
Aiven Deutschland GmbH
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B

On Thu, Jul 4, 2024, 14:04 Luke Chen  wrote:

> Hi Josep,
>
> I had run tests for tests/kafkatest/tests/client/quota_test.py based on 3.8
> branch, and they all passed.
>
> SESSION REPORT (ALL TESTS)
> ducktape version: 0.11.4
> session_id:       2024-07-04--001
> run time:         12 minutes 39.940 seconds
> tests run:        9
> passed:           9
> flaky:            0
> failed:           0
> ignored:          0
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.consumer_num=2
> status:     PASS
> run time:   3 minutes 51.280 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=.user.client-id.override_quota=True
> status:     PASS
> run time:   4 minutes 21.082 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=.user.client-id.override_quota=False
> status:     PASS
> run time:   5 minutes 14.854 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.old_broker_throttling_behavior=True
> status:     PASS
> run time:   3 minutes 0.505 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.old_client_throttling_behavior=True
> status:     PASS
> run time:   3 minutes 19.629 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.override_quota=False
> status:     PASS
> run time:   4 minutes 11.296 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=client-id.override_quota=True
> status:     PASS
> run time:   4 minutes 10.578 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=user.override_quota=False
> status:     PASS
> run time:   4 minutes 19.187 seconds
> --------------------------------------------------------------------------------
> test_id:    kafkatest.tests.client.quota_test.QuotaTest.test_quota.quota_type=user.override_quota=True
> status:     PASS
> run time:   3 minutes 13.666 seconds
>
>
> Thanks.
> Luke
>
> On Thu, Jul 4, 2024 at 6:01 PM Josep Prat 
> wrote:
>
> > Hi Luke,
> >
> > Thanks for the pointer! If you have an environment where you can run the
> > tests, I would highly appreciate it!
> >
> > I managed to run this test suite locally, and currently only this one
> > fails consistently; the rest pass:
> >
> > Module: kafkatest.tests.client.quota_test
> > Class:  QuotaTest
> > Method: test_quota
> > Arguments:
> > {
> >   "old_client_throttling_behavior": true,
> >   "quota_type": "client-id"
> > }
> >
> > Failure:
> > TimeoutError("Timed out waiting 600 seconds for service nodes to
> > finish. These nodes are still alive:
> > ['ProducerPerformanceService-0-140496695824336 node 1 on worker3']")
> > Traceback (most recent call last):
> >   File
> >
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/tests/runner_client.py",
> > line 184, in _do_run
> > data = self.run_test()
> >   File
> >
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/tests/runner_client.py",
> > line 262, in run_test
> > return self.test_context.function(self.test)
> >   File
> >
> "/home/jlprat/projects/kafka/tests/venv39/lib64/python3.9/site-packages/ducktape-0.8.14-py3.9.egg/ducktape/mark/_mark.py",
> > line 433,

[DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-04 Thread Mickael Maison
Hi,

I'd like to start a discussion on KIP-1066 that introduces a mechanism
to cordon log directories and brokers.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories

Thanks,
Mickael


Re: [DISCUSS] KIP-1051 Statically configured log replication throttling

2024-07-04 Thread Harry Fallows
Hi everyone,

Bumping this one last time before I call a vote. Please take a look if you're 
interested in replication throttling and/or static/dynamic config.

Kind regards,
Harry

On Thursday, 13 June 2024 at 19:39, Harry Fallows 
 wrote:

> Hi Hector,
> 
> I did see your colleague's KIP, and I actually mentioned it in the KIP that I 
> have written. As I see it, both of these KIPs move towards more easily 
> configurable replication throttling and both should be implemented. KIP-1009 
> makes it easier to enable throttling and KIP-1051 makes it easier to apply a 
> throttle rate. I did try to look at supporting KIP-1009 in the discussion 
> thread, however, I only subscribed to the mailing list after it was published 
> and I couldn't figure out how to respond to it in Pony mail. I would 
> definitely be interested in partnering up to get both changes across the 
> line, whether that be by combining them or supporting both individually (I'm 
> not sure which is best, this is my first contribution!).
> 
> I also see that KAFKA-10190 is mentioned in KIP-1009 as a related ticket. 
> Coincidentally, I raised a PR to address this bug a couple of days ago 
> (https://github.com/apache/kafka/pull/16280). I think this is also a change 
> that will move towards more easily configurable replication throttling as it 
> allows configuring the throttle rate across the whole cluster via a default 
> value. As far as I understand, this change does not need a KIP though because 
> it is a bugfix (the current behaviour of ignoring the default is 
> unintentional).
> 
> Let me know what you think.
> 
> Kind regards,
> Harry
> 
> 
>  Original Message 
> On 6/13/24 19:08, Hector Geraldino (BLOOMBERG/ 919 3RD A) 
> hgerald...@bloomberg.net wrote:
> 
> > Hi Harry,
> > 
> > A colleague of mine opened KIP-1009: Add Broker-level Throttle 
> > Configurations, which aims to achieve the same goal (although from a 
> > different angle).
> > 
> > Can you please take a look and see if this would work for the things you 
> > have in mind? Maybe we can partner and coalesce around either KIP and try 
> > to push it to the end line.
> > 
> > KIP: 
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1009%3A+Add+Broker-level+Throttle+Configurations
> > 
> > From: dev@kafka.apache.org At: 06/13/24 09:22:40 UTC-4:00To: 
> > dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-1051 Statically configured log replication 
> > throttling
> > 
> > Hi everyone,
> > 
> > Bumping this thread, as I haven't yet had any replies.
> > 
> > Kind regards,
> > Harry
> > 
> > On Thursday, 6 June 2024 at 17:59, Harry Fallows
> > harryfall...@protonmail.com.INVALID wrote:
> > 
> > > Hi everyone,
> > > 
> > > I would like to propose a change to allow the static configuration of 
> > > leader
> > > and follower replication throttling rates.
> > > 
> > > These configurations are very useful for preventing client traffic from
> > > getting throttled by replication traffic during events that cause a spike 
> > > in
> > > replication. Currently they are only configurable dynamically, which 
> > > means they
> > > are only really useful for throttling replication traffic during planned
> > > events. By allowing these configurations to be set statically, they can 
> > > be used
> > > to prevent client traffic throttling during unplanned events.
> > > 
> > > KIP:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1051%3A+Statically+configu
> > > red+log+replication+throttling
> > > 
> > > Best regards,
> > > Harry Fallows
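
For context on the configurations discussed in this thread:
leader.replication.throttled.rate and follower.replication.throttled.rate are
today dynamic, per-broker configs. A rough sketch of how they are applied via
the Admin client follows; the broker id and rate are placeholders, and the
static alternative described in the comment is what KIP-1051 proposes, not
something that exists yet.

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetReplicationThrottle {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Throttle rates are currently dynamic, per-broker configs;
            // broker "1" and the ~10 MB/s rate are placeholders.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            Map<ConfigResource, Collection<AlterConfigOp>> updates = Map.of(
                broker, List.of(
                    new AlterConfigOp(
                        new ConfigEntry("leader.replication.throttled.rate", "10485760"),
                        AlterConfigOp.OpType.SET),
                    new AlterConfigOp(
                        new ConfigEntry("follower.replication.throttled.rate", "10485760"),
                        AlterConfigOp.OpType.SET)));
            admin.incrementalAlterConfigs(updates).all().get();
            // KIP-1051 proposes allowing these same rates to be configured statically
            // (e.g. in server.properties) so a throttle is in force from broker startup,
            // before any dynamic change is applied.
        }
    }
}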


Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-07-04 Thread Alieh Saeedi
Salut from the KIP’s author


Clarifying two points:


1) broker side errors:

As far as I remember we are not going to cover the errors originating from
the broker!

A historical fact: One of the debate points in KIP-1038 was that by
defining a producer custom handler, the user may assume that broker-side
errors must be covered as well. They may define a handler for handling
`RecordTooLargeException` and still see such errors not being handled as
they wish.


2) Regarding irrecoverable/recoverable errors:

Before the fix for `KAFKA-9279`, errors such as `RecordTooLargeException`
or errors related to missing metadata (both originating from Producer
`send()`) were considered recoverable, but after that fix they became
irrecoverable without any Javadoc change or any KIP. All the effort made in
this KIP and the former one has been towards returning to the former state.


I am sure it is clear to you which sort of errors we are going to cover: a
single record may happen to NOT get added to the batch due to issues with
the record or its corresponding topic. The point was that if the record is
not added to the batch, let's not fail the whole batch because of that
non-existent record. We never intended to do anything on the broker side or
ignore more important errors. But I agree with you, Chris: if we are adding
a new API, we must have good documentation for it. The sentence `all
irrecoverable transactional errors will still be fatal` as you suggested is
good. What do you think? I am totally against enumerating errors in
Javadocs since these sorts of errors can change over time. Moreover, have
you ever seen any list of recoverable or irrecoverable errors anywhere so
far?
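
To make the scope being discussed concrete, below is a minimal sketch of the
current (post-KAFKA-9279) behavior with a transactional producer. It uses only
the standard producer API; the topic name, sizes, and error handling are
illustrative, and this is not the KIP's proposed API.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class OversizedRecordInTransaction {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            // A value larger than max.request.size (1 MB by default) is rejected
            // client-side inside send(); the record never reaches a broker.
            byte[] tooLarge = new byte[2 * 1024 * 1024];
            producer.send(new ProducerRecord<>("demo-topic", tooLarge),
                (metadata, exception) -> {
                    if (exception != null) {
                        // e.g. RecordTooLargeException reported through the callback
                        System.err.println("send failed: " + exception);
                    }
                });
            try {
                producer.commitTransaction();
            } catch (KafkaException e) {
                // Current behavior: the client-side send() error above has already
                // moved the transaction into an error state, so the commit of the
                // otherwise valid batch fails and the transaction must be aborted.
                // Relaxing this for such per-record errors is what the KIP is about.
                producer.abortTransaction();
            }
        }
    }
}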


Bests,

Alieh

On Wed, Jul 3, 2024 at 6:07 PM Chris Egerton 
wrote:

> Hi Justine,
>
> I agree that enumerating a list of errors that should be covered by the KIP
> is difficult; I was thinking it might be easier if we list the errors that
> should _not_ be covered by the KIP, and only if we can't define a
> reasonable heuristic that would cover them without having to explicitly
> list them. Could it be enough to say "all irrecoverable transactional
> errors will still be fatal", or even just "all transactional errors (as
> opposed to errors related to this specific record) will still be fatal"?
>
> Cheers,
>
> Chris
>
> On Wed, Jul 3, 2024 at 11:56 AM Justine Olshan
> 
> wrote:
>
> > Hey Chris,
> >
> > I think what you say makes sense. I agree that defining the behavior
> based
> > on code that can possibly change is not a good idea, and I was trying to
> > get a clearer definition from the KIP's author :)
> >
> > I think it can always be hard to ensure that only specific errors are
> > handled unless they are explicitly enumerated in code as the code can
> > change and can be changed by folks who are not aware of this KIP or
> > conversation.
> > I personally don't have the bandwidth to do this definition/enumeration
> of
> > errors, so hopefully Alieh can expand upon this.
> >
> > Justine
> >
> > On Wed, Jul 3, 2024 at 8:28 AM Chris Egerton 
> > wrote:
> >
> > > Hi Alieh,
> > >
> > > I don't love defining the changes for this KIP in terms of a catch
> clause
> > > in the KafkaProducer class, for two reasons. First, the set of errors
> > that
> > > are handled by that clause may shift over time as the code base is
> > > modified, and second, it would be fairly opaque to users who want to
> > > understand whether an error would be affected by using this API or not.
> > >
> > > It also seems strange that we'd handle some types of
> > > RecordTooLargeException (i.e., ones reported client-side) with this
> API,
> > > but not others (i.e., ones reported by a broker).
> > >
> > > I think this kind of API would be most powerful, most intuitive to
> users,
> > > and easiest to document if we expanded the scope to all
> > record-send-related
> > > errors, except anything indicating issues with exactly-once semantics.
> > That
> > > would include records that are too large (when caught both client- and
> > > server-side), records that can't be sent due to authorization failures,
> > > records sent to nonexistent topics/topic partitions, and keyless
> records
> > > sent to compacted topics. It would not include
> > > ProducerFencedException, InvalidProducerEpochException,
> > > UnsupportedVersionException,
> > > and possibly others.
> > >
> > > @Justine -- do you think it would be possible to develop either a
> better
> > > definition for the kinds of "excluded" errors that should not be
> covered
> > by
> > > this API, or, barring that, a comprehensive list of exact error types?
> > And
> > > do you think this would be acceptable in terms of risk and complexity?
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Jul 2, 2024 at 5:05 PM Alieh Saeedi
>  > >
> > > wrote:
> > >
> > > > Hey Justine,
> > > >
> > > > About the consequences: the consequences will be like when we did not
> > > have
> > > > the fix

[jira] [Resolved] (KAFKA-16944) Range assignor doesn't co-partition with stickiness

2024-07-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16944.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Range assignor doesn't co-partition with stickiness
> ---
>
> Key: KAFKA-16944
> URL: https://issues.apache.org/jira/browse/KAFKA-16944
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.9.0
>
>
> When stickiness is considered during range assignments, it is possible that 
> in certain cases where co-partitioning is guaranteed we fail. 
> An example would be:
> Consider two topics T1, T2 with 3 partitions each and three members A, B, C.
> Let's say the existing assignment (for whatever reason) is:
> A -> T1P0  ||  B -> T1P1, T2P0, T2P1, T2P2  ||  C -> T1P2
> Now we trigger a rebalance with the following subscriptions, where all members 
> are subscribed to both topics and everything else is the same:
> A -> T1, T2  ||  B -> T1, T2  ||  C -> T1, T2
> Since all the topics have an equal number of partitions and all the members 
> are subscribed to the same set of topics, we would expect co-partitioning, so 
> we would want the final assignment returned to be:
> A -> T1P0, T2P0  ||  B -> T1P1, T2P1  ||  C -> T1P2, T2P2
> Currently the client side assignor returns the following, but only because it 
> does not assign sticky partitions:
> C=[topic1-2, topic2-2], B=[topic1-1, topic2-1], A=[topic1-0, topic2-0]
> Our server side assignor returns (sticky partitions marked with *):
> A=MemberAssignment(targetPartitions={topic2=[1], *topic1=[0]*}),
> B=MemberAssignment(targetPartitions={*topic2=[0]*, *topic1=[1]*}),
> C=MemberAssignment(targetPartitions={topic2=[2], *topic1=[2]*})
> As seen above, co-partitioning is expected but not returned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] - KIP-1064: Upgrade slf4j to 2.x

2024-07-04 Thread Chia-Ping Tsai
hi Muralidhar

thanks for writing the KIP. Please take a look at following comments:

1. please update the Discussion thread. it has incorrect link

2. please complete the section "Compatibility, Deprecation, and Migration
Plan".  We had a good discussion in the PR (
https://github.com/apache/kafka/pull/16324#discussion_r1643359783).

3. how to keep the compatibility of updating slf4j version in the future?
According to slf4j compatibility (
https://www.slf4j.org/manual.html#compatibility), users need to update
their binding version after we update the slf4j-api version. That will be
troublesome, as we would have to file a KIP every time we update slf4j-api.
Including all binding jars in Kafka is a solution, as we can take control
over all binding jars. WDYT?
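
To illustrate the compatibility point, here is a small sketch (illustrative
only) of what stays the same and what changes between slf4j 1.7.x and 2.x:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Slf4jBindingDemo {
    // Code compiled against slf4j-api is the same under 1.7.x and 2.x.
    private static final Logger log = LoggerFactory.getLogger(Slf4jBindingDemo.class);

    public static void main(String[] args) {
        // What changes is how the backend is discovered at runtime:
        // - slf4j 1.7.x looks for org.slf4j.impl.StaticLoggerBinder on the classpath
        //   (provided by bindings such as slf4j-reload4j 1.7.x or logback-classic 1.2.x).
        // - slf4j 2.x uses java.util.ServiceLoader to find an
        //   org.slf4j.spi.SLF4JServiceProvider (slf4j-reload4j 2.x, logback-classic 1.3+, ...).
        // Pairing a 2.x slf4j-api with a 1.7.x-only binding results in the
        // "No SLF4J providers were found" warning and NOP logging, which is the
        // compatibility concern raised above.
        log.info("Logging via whichever provider/binding is on the classpath");
    }
}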

Best,
Chia-Ping

Muralidhar Basani  於 2024年7月2日 週二
上午3:30寫道:

> Hello,
>
> Regarding KIP-1064 [0], I would like to start a discussion on upgrading
> slf4j to 2.x, which is currently at 1.7.36, and with an option to
> provide slf4j provider in run class.
>
> This is also discussed in jira [1] and git pr [2], and thought we should
> have a kip.
>
> Please note this kip is intended from kafka 4.0.
>
> [0] -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1064%3A+Upgrade+slf4j+to+2.x
> [1] - https://issues.apache.org/jira/browse/KAFKA-16936
> [2] - https://github.com/apache/kafka/pull/16324#discussion_r1644295632
>
> Thanks,
> Murali
>


[jira] [Created] (KAFKA-17082) Rewrite `LogCaptureAppender` by java

2024-07-04 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-17082:
--

 Summary: Rewrite `LogCaptureAppender` by java
 Key: KAFKA-17082
 URL: https://issues.apache.org/jira/browse/KAFKA-17082
 Project: Kafka
  Issue Type: Improvement
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


`LogCaptureAppender` can be used to verify logger output, and it will be useful 
for testing Kafka tools in the future. Hence, we should rewrite it in Java to make 
the tools happy :)
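
A rough sketch of what such a Java rewrite could look like, assuming the
log4j 1.x / reload4j API that Kafka currently ships with; the class and
method names below are illustrative, not the final implementation:

import java.util.ArrayList;
import java.util.List;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Logger;
import org.apache.log4j.spi.LoggingEvent;

public class LogCaptureAppender extends AppenderSkeleton implements AutoCloseable {
    private final List<LoggingEvent> events = new ArrayList<>();
    private final Logger attachedTo;

    private LogCaptureAppender(Logger attachedTo) {
        this.attachedTo = attachedTo;
    }

    // Attach a capturing appender to the logger of the class under test.
    public static LogCaptureAppender createAndRegister(Class<?> clazz) {
        LogCaptureAppender appender = new LogCaptureAppender(Logger.getLogger(clazz));
        appender.attachedTo.addAppender(appender);
        return appender;
    }

    @Override
    protected synchronized void append(LoggingEvent event) {
        events.add(event);
    }

    // Rendered messages captured so far, for assertions in tests.
    public synchronized List<String> capturedMessages() {
        List<String> messages = new ArrayList<>();
        for (LoggingEvent event : events) {
            messages.add(event.getRenderedMessage());
        }
        return messages;
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }

    @Override
    public void close() {
        // Detach so later tests are not affected by this appender.
        attachedTo.removeAppender(this);
    }
}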



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] - KIP-1064: Upgrade slf4j to 2.x

2024-07-04 Thread Muralidhar Basani
Hi Chia,
Thank you for dropping by on this.

I have updated both the discussion thread and the Compatibility section
partly. But I would like to discuss a bit more about this and update.

3. how to keep the compatibility of updating slf4j version in the future?
According to slf4j compatibility (
https://www.slf4j.org/manual.html#compatibility), users need to update
their binding version after we update the slf4j-api version. That will be a
trouble as we have to file KIP every time in updating slf4j-api. Including
all binding jars in kafka is a solution, as we can take control over all
binding jars. WDYT?
Indeed, it is not ideal to always have to file a KIP just to ship upgraded
slf4j-api jars.
I like the approach of providing all binding jars within Kafka; however, I
see we have only the reload4j backend in our dependencies, or I could be wrong.
So would just providing a compatible reload4j with it be OK?
Or do we also provide other binding jars for logback, log4j, simple, etc.?

Thanks,
Murali

On Thu, Jul 4, 2024 at 8:04 PM Chia-Ping Tsai  wrote:

> hi Muralidhar
>
> thanks for writing the KIP. Please take a look at following comments:
>
> 1. please update the Discussion thread. it has incorrect link
>
> 2. please complete the section "Compatibility, Deprecation, and Migration
> Plan".  We had a good discussion in the PR (
> https://github.com/apache/kafka/pull/16324#discussion_r1643359783).
>
> 3. how to keep the compatibility of updating slf4j version in the future?
> According to slf4j compatibility (
> https://www.slf4j.org/manual.html#compatibility), users need to update
> their binding version after we update the slf4j-api version. That will be a
> trouble as we have to file KIP every time in updating slf4j-api. Including
> all binding jars in kafka is a solution, as we can take control over all
> binding jars. WDYT?
>
> Best,
> Chia-Ping
>
> Muralidhar Basani  於 2024年7月2日 週二
> 上午3:30寫道:
>
> > Hello,
> >
> > Regarding KIP-1064 [0], I would like to start a discussion on upgrading
> > slf4j to 2.x, which is currently at 1.7.36, and with an option to
> > provide slf4j provider in run class.
> >
> > This is also discussed in jira [1] and git pr [2], and thought we should
> > have a kip.
> >
> > Please note this kip is intended from kafka 4.0.
> >
> > [0] -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1064%3A+Upgrade+slf4j+to+2.x
> > [1] - https://issues.apache.org/jira/browse/KAFKA-16936
> > [2] - https://github.com/apache/kafka/pull/16324#discussion_r1644295632
> >
> > Thanks,
> > Murali
> >
>


Re: [DISCUSS] - KIP-1064: Upgrade slf4j to 2.x

2024-07-04 Thread Chia-Ping Tsai
>
> Or do we also provide other binding jars for logback, log4j, simple etc ?


yep, that is a solution in which Kafka would have strong dependencies on those,
but we need to reach consensus about "which" providers should be
included

Muralidhar Basani  於 2024年7月5日 週五
上午3:50寫道:

> Hi Chia,
> Thank you for dropping by on this.
>
> I have updated both the discussion thread and the Compatibility section
> partly. But I would like to discuss a bit more about this and update.
>
> 3. how to keep the compatibility of updating slf4j version in the future?
> According to slf4j compatibility (
> https://www.slf4j.org/manual.html#compatibility), users need to update
> their binding version after we update the slf4j-api version. That will be a
> trouble as we have to file KIP every time in updating slf4j-api. Including
> all binding jars in kafka is a solution, as we can take control over all
> binding jars. WDYT?
> Indeed, it is not ideal to file a kip always, and ship the upgraded
> slf4j-api jars.
> I like the approach of providing all binding jars within kafka, however I
> see we have only reload4j backend in our dependencies, or I could be wrong.
> So just providing a compatible reload4j with it should be ok ?
> Or do we also provide other binding jars for logback, log4j, simple etc ?
>
> Thanks,
> Murali
>
> On Thu, Jul 4, 2024 at 8:04 PM Chia-Ping Tsai  wrote:
>
> > hi Muralidhar
> >
> > thanks for writing the KIP. Please take a look at following comments:
> >
> > 1. please update the Discussion thread. it has incorrect link
> >
> > 2. please complete the section "Compatibility, Deprecation, and Migration
> > Plan".  We had a good discussion in the PR (
> > https://github.com/apache/kafka/pull/16324#discussion_r1643359783).
> >
> > 3. how to keep the compatibility of updating slf4j version in the future?
> > According to slf4j compatibility (
> > https://www.slf4j.org/manual.html#compatibility), users need to update
> > their binding version after we update the slf4j-api version. That will
> be a
> > trouble as we have to file KIP every time in updating slf4j-api.
> Including
> > all binding jars in kafka is a solution, as we can take control over all
> > binding jars. WDYT?
> >
> > Best,
> > Chia-Ping
> >
> > Muralidhar Basani  於 2024年7月2日 週二
> > 上午3:30寫道:
> >
> > > Hello,
> > >
> > > Regarding KIP-1064 [0], I would like to start a discussion on upgrading
> > > slf4j to 2.x, which is currently at 1.7.36, and with an option to
> > > provide slf4j provider in run class.
> > >
> > > This is also discussed in jira [1] and git pr [2], and thought we
> should
> > > have a kip.
> > >
> > > Please note this kip is intended from kafka 4.0.
> > >
> > > [0] -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1064%3A+Upgrade+slf4j+to+2.x
> > > [1] - https://issues.apache.org/jira/browse/KAFKA-16936
> > > [2] -
> https://github.com/apache/kafka/pull/16324#discussion_r1644295632
> > >
> > > Thanks,
> > > Murali
> > >
> >
>


[jira] [Resolved] (KAFKA-17059) Remove `dynamicConfigOverride` from KafkaConfig

2024-07-04 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-17059.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Remove `dynamicConfigOverride` from KafkaConfig
> ---
>
> Key: KAFKA-17059
> URL: https://issues.apache.org/jira/browse/KAFKA-17059
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: TengYao Chi
>Priority: Minor
> Fix For: 3.9.0
>
>
> It seems the field is never defined now, so we can remove it to simplify 
> the constructor of KafkaConfig



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-04 Thread Haruki Okada
Hi,

Thank you for the KIP.
The motivation makes sense to me.

I have a few questions:

- [nits] "AlterPartitions request" in the Error handling section should actually be
"AlterPartitionReassignments request", right?
- Don't we need to include cordoned information in the DescribeLogDirs response
too? Some tools (e.g. CruiseControl) need to have a way to know which
brokers/log-dirs are cordoned to generate partition reassignment proposals.

Thanks,

2024年7月4日(木) 22:57 Mickael Maison :

> Hi,
>
> I'd like to start a discussion on KIP-1066 that introduces a mechanism
> to cordon log directories and brokers.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories
>
> Thanks,
> Mickael
>


-- 

Okada Haruki
ocadar...@gmail.com