Re: [VOTE] 3.7.0 RC4

2024-02-18 Thread Stanislav Kozlovski
The upgrade test passed ->
https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708103771--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html

The replica verification test succeeded in ZK mode, but failed in
ISOLATED_KRAFT. It just seems to be very flaky.
https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708100119--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html

Scheduling another run in
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6062/

On Fri, Feb 16, 2024 at 6:39 PM Stanislav Kozlovski wrote:

> Thanks all for the help in verifying.
>
> I have updated
> https://gist.github.com/stanislavkozlovski/820976fc7bfb5f4dcdf9742fd96a9982
> with the system tests.
> Two builds were run, and across those, the following tests failed
> twice in a row:
>
>
> kafkatest.tests.tools.replica_verification_test.ReplicaVerificationToolTest#test_replica_lags
> Arguments: { "metadata_quorum": "ZK" }
> Fails with the same error:
> `TimeoutError('Timed out waiting to reach non-zero number of replica lags.')`
> I have scheduled a re-run of this specific test here ->
> https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6057
>
> kafkatest.tests.core.upgrade_test.TestUpgrade#test_upgrade
> Arguments: { "compression_types": [ "zstd" ], "from_kafka_version": "2.4.1", "to_message_format_version": null }
> Fails with the same error:
> `TimeoutError('Producer failed to produce messages for 20s.')`
>
> kafkatest.tests.core.upgrade_test.TestUpgrade#test_upgrade
> Arguments: { "compression_types": [ "lz4" ], "from_kafka_version": "3.0.2", "to_message_format_version": null }
> Fails with the same error:
> `TimeoutError('Producer failed to produce messages for 20s.')`
>
> I have scheduled a re-run of this test here ->
> https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6058/
>
> On Fri, Feb 16, 2024 at 12:15 PM Vedarth Sharma wrote:
>
>> Hey Stanislav,
>>
>> Thanks for the release candidate.
>>
>> +1 (non-binding)
>>
>> I tested and verified the docker image artifact apache/kafka:3.7.0-rc4:
>> - verified the create topic, produce messages, and consume messages flow
>>   when running the docker image with:
>>   - default configs
>>   - configs provided via env variables
>>   - configs provided via file input
>> - verified the HTML documentation for the docker image
>> - ran the example docker compose files successfully
>>
>> All looks good for the docker image artifact!
>>
>> Thanks and regards,
>> Vedarth
>>
>>
>> On Thu, Feb 15, 2024 at 10:58 PM Mickael Maison wrote:
>>
>> > Hi Stanislav,
>> >
>> > Thanks for running the release.
>> >
>> > I did the following testing:
>> > - verified the checksums and signatures
>> > - ran ZooKeeper and KRaft quickstarts with Scala 2.13 binaries
>> > - ran a successful migration from ZooKeeper to KRaft
>> >
>> > We seem to be missing the upgrade notes for 3.7.0 in the docs:
>> > https://kafka.apache.org/37/documentation.html#upgrade still points
>> > to 3.6.0.
>> > Before voting, I'd like to see results from the system tests too.
>> >
>> > Thanks,
>> > Mickael
>> >
>> > On Thu, Feb 15, 2024 at 6:06 PM Andrew Schofield wrote:
>> > >
>> > > +1 (non-binding). I used the staged binaries with Scala 2.13. I tried
>> > > the new group coordinator and consumer group protocol, which are
>> > > included with the Early Access release of KIP-848. Also verified the
>> > > availability of the new APIs. All working as expected.
>> > >
>> > > Thanks,
>> > > Andrew
>> > >
>> > > > On 15 Feb 2024, at 05:07, Paolo Patierno wrote:
>> > > >
>> > > > +1 (non-binding). I used the staged binaries with Scala 2.13 and
>> > > > mostly focused on the ZooKeeper to KRaft migration with multiple
>> > > > tests. Everything works fine.
>> > > >
>> > > > Thanks
>> > > > Paolo
>> > > >
>> > > > On Mon, 12 Feb 2024, 22:06, Jakub Scholz wrote:
>> > > >
>> > > >> +1 (non-binding). I used the staged binaries with Scala 2.13 and
>> > > >> the staged Maven artifacts to run my tests. All seems to work fine.
>> > > >> Thanks.
>> > > >>
>> > > >> Jakub
>> > > >>
>> > > >> On Fri, Feb 9, 2024 at 4:20 PM Stanislav Kozlovski wrote:
>> > > >>
>> > > >>> Hello Kafka users, developers and client-developers,
>> > > >>>
>> > > >>> This is the second candidate we are considering for release of
>> > > >>> Apache Kafka 3.7.0.
>> > > >>>
>> > > >>> Major changes include:
>> > > >>> - Early Access to KIP-848 - the next generation of the consumer
>> > > >>>   rebalance protocol
>> > > >>> - Early Access to KIP-858: Adding JBOD support to KRaft
>> > > >>> - KIP-714: Observability into client metrics via a standardized
>> > > >>>   interface
>> > > >>>
>> > > >>> Release notes for the 3.7.0 release:
>> > > >>> https://home.apache.o

Seeking Assistance with MM2 Rest API Configuration - Encountering 404 Error

2024-02-18 Thread aaron ai
Kafka Development Team:

I have enabled the internal REST API for dedicated mode by setting
*dedicated.mode.enable.internal.rest* to true in my MM2 configuration, as
introduced by KIP-710: Full support for distributed mode in dedicated
MirrorMaker 2.0 clusters.
However, when I attempt to make requests to the REST API on localhost:8083,
I consistently receive a 404 Not Found error. I am uncertain whether I'm
facing a configuration issue or whether this is indicative of a bug.

BTW, I found a similar question on Stack Overflow.
My Kafka version: 3.6.0

Here is my configuration:
clusters = A, B
A.bootstrap.servers = 10.1.1.168:9092
B.bootstrap.servers = 10.1.1.168:9592
A->B.enabled = true
A->B.topics = .*
groups=.*
groups.exclude = test-.*
replication.factor=1
checkpoints.topic.replication.factor=1
heartbeats.topic.replication.factor=1
offset-syncs.topic.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
config.storage.replication.factor=1
sync.topic.acls.enabled = true
emit.heartbeats.interval.seconds = 5
sync.group.offsets.enabled=true
sync.topic.configs.enabled=true
dedicated.mode.enable.internal.rest=true

and the HTTP response:
[image: image.png]
[image: image.png]
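
For reference, here is roughly how I am exercising the endpoint (a minimal
sketch using Java's built-in HTTP client; /connectors is just one of the
standard Connect paths I have tried):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Mm2RestProbe {
    public static void main(String[] args) throws Exception {
        // Probe a standard Connect endpoint on the dedicated-mode MM2 node.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Prints 404 even with dedicated.mode.enable.internal.rest=true.
        System.out.println(response.statusCode() + " " + response.body());
    }
}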

I eagerly await your response and appreciate any feedback you might have on
this matter.


Re: [VOTE] 3.7.0 RC4

2024-02-18 Thread Stanislav Kozlovski
The latest system test build completed successfully -
https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708250728--apache--3.7--02197edaaa/2024-02-18--001./2024-02-18--001./report.html

*System tests are therefore all good.* We just have some flaky tests.

On Sun, Feb 18, 2024 at 10:45 AM Stanislav Kozlovski wrote:

> The upgrade test passed ->
> https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708103771--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html
>
> The replica verification test succeeded in ZK mode, but failed in
> ISOLATED_KRAFT. It just seems to be very flaky.
> https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1708100119--apache--3.7--bb6990114b/2024-02-16--001./2024-02-16--001./report.html
>
> Scheduling another run in
> https://jenkins.confluent.io/job/system-test-kafka-branch-builder/6062/
>

Re: Seeking Assistance with MM2 Rest API Configuration - Encountering 404 Error

2024-02-18 Thread Greg Harris
Hi Aaron,

This is the expected behavior of KIP-710. The relevant section is reproduced
here:

> Enabling a single Connect REST server in the MirrorMaker 2.0 node, only
> supporting the internal Connect endpoints.

The endpoints that are typically exposed by a distributed Connect cluster for
creating connectors, getting status, etc. are not enabled in dedicated mode,
even with the new configuration. Only the internal endpoints that workers use
to communicate with one another are enabled.

If you want the other endpoints for getting status, reconfiguring connectors,
etc., I would recommend switching to Connect's distributed mode and creating
the MM2 connectors via the REST API; see the sketch below.
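
For example, something along these lines creates the A->B source connector on
a distributed Connect worker (a rough sketch, not tested: it assumes a worker
on localhost:8083 and reuses the aliases and bootstrap servers from your
config; the connector name is arbitrary):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateMirrorSourceConnector {
    public static void main(String[] args) throws Exception {
        // MirrorSourceConnector config equivalent to the A->B flow in the
        // dedicated-mode properties file.
        String json = """
                {
                  "name": "mm2-a-to-b",
                  "config": {
                    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
                    "source.cluster.alias": "A",
                    "target.cluster.alias": "B",
                    "source.cluster.bootstrap.servers": "10.1.1.168:9092",
                    "target.cluster.bootstrap.servers": "10.1.1.168:9592",
                    "topics": ".*",
                    "replication.factor": "1"
                  }
                }""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Expect 201 Created on success; the body echoes the connector config.
        System.out.println(response.statusCode() + " " + response.body());
    }
}

The checkpoint and heartbeat flows would be separate MirrorCheckpointConnector
and MirrorHeartbeatConnector instances created the same way.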

Hope this helps!
Greg



[jira] [Created] (KAFKA-16277) CooperativeStickyAssignor does not spread topics evenly among consumer group

2024-02-18 Thread Cameron Redpath (Jira)
Cameron Redpath created KAFKA-16277:
---

 Summary: CooperativeStickyAssignor does not spread topics evenly 
among consumer group
 Key: KAFKA-16277
 URL: https://issues.apache.org/jira/browse/KAFKA-16277
 Project: Kafka
  Issue Type: Bug
Reporter: Cameron Redpath
 Attachments: image-2024-02-19-13-00-28-306.png

Consider the following scenario:

`topic-1`: 12 partitions
`topic-2`: 12 partitions

Of note, `topic-1` gets approximately 10 times more messages through it than
`topic-2`.

Both of these topics are consumed by a single application, in a single consumer
group, which scales under load. Each member of the consumer group subscribes to
both topics. The `partition.assignment.strategy` being used is
`org.apache.kafka.clients.consumer.CooperativeStickyAssignor`. The application
may start with one consumer, which consumes all partitions from both topics.

The problem begins when the application scales up to two consumers. What is
seen is that all partitions from `topic-1` go to one consumer, and all
partitions from `topic-2` go to the other. With one topic receiving roughly ten
times the traffic of the other, this results in a very imbalanced group, where
one consumer receives 10x the load of the other purely due to partition
assignment.

This is the issue being seen in our cluster at the moment. See this graph of
the number of messages being processed by each consumer as the group scales
from one to four consumers:

!image-2024-02-19-13-00-28-306.png|width=537,height=612!

Things to note from this graphic:
 * With two consumers, each topic has all of its partitions assigned to a
single consumer
 * With three consumers, each topic has its partitions split between two
consumers
 * With four consumers, each topic has its partitions split between three
consumers

With regard to the number of _partitions_ assigned to each consumer, the group
is balanced. However, the assignment appears to be biased so that partitions
from the same topic go to the same consumer. In our scenario, this leads to
very poor partition assignment.

I question whether the behaviour of the assignor should be revised so that each
topic has its partitions maximally spread across all available members of the
consumer group. In the above scenario, this would result in a much more even
distribution of load. The behaviour would then be:
 * With two consumers, 6 partitions from each topic go to each consumer
 * With three consumers, 4 partitions from each topic go to each consumer
 * With four consumers, 3 partitions from each topic go to each consumer

Of note, we only saw this behaviour after migrating to the
`CooperativeStickyAssignor`. It was not an issue with the default partition
assignment strategy.

It is possible this is intended behaviour. If so, what is the preferred
workaround for our scenario? Our current workaround, should we go ahead with
the move to the `CooperativeStickyAssignor`, may be to limit each consumer to
subscribing to a single topic, and run two consumer threads per instance of the
application (see the sketch below).
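
For reference, a rough sketch of that workaround (illustrative only: the
bootstrap address is a placeholder, and a separate group per topic is just one
way to keep each topic's assignment independent):

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SingleTopicConsumers {

    // One consumer thread per topic, so the partitions of each topic are
    // balanced across application instances independently.
    static Runnable consumerLoop(String topic) {
        return () -> {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "app-" + topic);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                    CooperativeStickyAssignor.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of(topic));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // process(record) -- application logic goes here
                    }
                }
            }
        };
    }

    public static void main(String[] args) {
        new Thread(consumerLoop("topic-1")).start();
        new Thread(consumerLoop("topic-2")).start();
    }
}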


