[jira] [Created] (KAFKA-17237) [rack-aware assignors] Rebalance is triggered every time a broker isn't reported from a metadata call
Emanuele Sabellico created KAFKA-17237:
--
Summary: [rack-aware assignors] Rebalance is triggered every time a broker isn't reported from a metadata call
Key: KAFKA-17237
URL: https://issues.apache.org/jira/browse/KAFKA-17237
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 3.8.0, 3.5.0
Reporter: Emanuele Sabellico
Attachments: test.log

When configuring a client for rack awareness, to enable fetch-from-follower (FFF) and rack-aware assignors, a rebalance is triggered every time a broker disappears from a Metadata response, such as during a cluster roll. That happens because, after KIP-881, the metadata appears changed: the set of racks is different, since brokers that are down report no rack information.

*How to reproduce* (a Java sketch follows at the end of this report):
 * Enable *client.rack* on the client and *broker.rack* on the cluster
 * Create a topic with replicas on all the nodes
 * Subscribe to that topic on the client
 * Stop one of the brokers
 * Observe that a rebalance is triggered

Attached is a log reproducing the issue with the Java client. A few lines showing the rejoin requests:

{noformat}
[2024-08-01 15:09:07,472] INFO [Consumer clientId=consumer-test_racks-1, groupId=test_racks] Request joining group due to: cached metadata has changed from (version4: {test_new=[racks=[null, 1b, 1c]]}) at the beginning of the rebalance to (version5: {test_new=[racks=[1a, 1b, 1c]]}) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2024-08-01 15:10:38,689] INFO [Consumer clientId=consumer-test_racks-1, groupId=test_racks] Request joining group due to: cached metadata has changed from (version6: {test_new=[racks=[1a, 1b, 1c]]}) at the beginning of the rebalance to (version42: {test_new=[racks=[null, 1a, 1c]]}) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2024-08-01 15:11:04,106] INFO [Consumer clientId=consumer-test_racks-1, groupId=test_racks] Request joining group due to: cached metadata has changed from (version43: {test_new=[racks=[1a, 1b, 1c]]}) at the beginning of the rebalance to (version45: {test_new=[racks=[null, 1a, 1b]]}) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
{noformat}

The same happens in librdkafka, as reported in [https://github.com/confluentinc/librdkafka/issues/4742].
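For reference, a minimal Java consumer sketch matching the reproduction steps above; broker address, rack id, and topic name are placeholders to adapt to the cluster under test:

{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RackAwareRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test_racks");
        props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "1a"); // must match a broker.rack value
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("test_new")); // topic with replicas on all nodes
            while (true) {
                consumer.poll(Duration.ofMillis(500));
                // Stop one broker while this runs: a rejoin is logged
                // ("cached metadata has changed ... racks=[null, ...]").
            }
        }
    }
}
{code}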
[jira] [Created] (KAFKA-15989) Upgrade existing generic group to consumer group
Emanuele Sabellico created KAFKA-15989:
--
Summary: Upgrade existing generic group to consumer group
Key: KAFKA-15989
URL: https://issues.apache.org/jira/browse/KAFKA-15989
Project: Kafka
Issue Type: Sub-task
Reporter: Emanuele Sabellico

It should be possible to upgrade an existing generic group to a new consumer group, whether it was using the previous generic protocol or manual partition assignment and commit.
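As a sketch of what the upgrade could look like from the client side, assuming the KIP-848 group.protocol configuration (property name per Kafka 3.7+); the group id stays the same, so committed offsets should be reused:

{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupUpgradeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-existing-group");       // pre-existing group id
        // Opt in to the new (KIP-848) consumer group protocol; the default,
        // "classic", is the previous generic protocol.
        props.put(ConsumerConfig.GROUP_PROTOCOL_CONFIG, "consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            consumer.poll(Duration.ofSeconds(1));    // committed offsets are reused
        }
    }
}
{code}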
[jira] [Created] (KAFKA-15997) Ensure fairness in the uniform assignor
Emanuele Sabellico created KAFKA-15997:
--
Summary: Ensure fairness in the uniform assignor
Key: KAFKA-15997
URL: https://issues.apache.org/jira/browse/KAFKA-15997
Project: Kafka
Issue Type: Sub-task
Reporter: Emanuele Sabellico

Fairness has to be ensured in the uniform assignor, as it was in the cooperative-sticky one. In librdkafka test 0113, subtest u_multiple_subscription_changes, 8 consumers subscribe to the same topic and the test verifies that all of them get 2 partitions assigned. With the new protocol, however, two consumers get 3 partitions each and one gets zero partitions. The test doesn't configure any client.rack. A sketch of the violated fairness invariant follows the log excerpt.

{code:java}
[0113_cooperative_rebalance /478.183s] Consumer assignments (subscription_variation 0) (stabilized) (no rebalance cb):
[0113_cooperative_rebalance /478.183s] Consumer C_0#consumer-3 assignment (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [5] (2000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [8] (4000msgs)
[0113_cooperative_rebalance /478.183s] Consumer C_1#consumer-4 assignment (3): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [0] (1000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [3] (2000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [13] (1000msgs)
[0113_cooperative_rebalance /478.184s] Consumer C_2#consumer-5 assignment (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [6] (1000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [10] (2000msgs)
[0113_cooperative_rebalance /478.184s] Consumer C_3#consumer-6 assignment (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [7] (1000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [9] (2000msgs)
[0113_cooperative_rebalance /478.184s] Consumer C_4#consumer-7 assignment (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [11] (1000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [14] (3000msgs)
[0113_cooperative_rebalance /478.184s] Consumer C_5#consumer-8 assignment (3): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [1] (2000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [2] (2000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [4] (1000msgs)
[0113_cooperative_rebalance /478.184s] Consumer C_6#consumer-9 assignment (0):
[0113_cooperative_rebalance /478.184s] Consumer C_7#consumer-10 assignment (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [12] (1000msgs), rdkafkatest_rnd24419cc75e59d8de_0113u_1 [15] (1000msgs)
[0113_cooperative_rebalance /478.184s] 16/32 partitions assigned
[0113_cooperative_rebalance /478.184s] Consumer C_0#consumer-3 has 2 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance /478.184s] Consumer C_1#consumer-4 has 3 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance /478.184s] Consumer C_2#consumer-5 has 2 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance /478.184s] Consumer C_3#consumer-6 has 2 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance /478.184s] Consumer C_4#consumer-7 has 2 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance /478.184s] Consumer C_5#consumer-8 has 3 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance /478.184s] Consumer C_6#consumer-9 has 0 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance /478.184s] Consumer C_7#consumer-10 has 2 assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[ /479.057s] 1 test(s) running: 0113_cooperative_rebalance
[ /480.057s] 1 test(s) running: 0113_cooperative_rebalance
[ /481.057s] 1 test(s) running: 0113_cooperative_rebalance
[0113_cooperative_rebalance /482.498s] TEST FAILURE
### Test "0113_cooperative_rebalance (u_multiple_subscription_changes:2390: use_rebalance_cb: 0, subscription_variation: 0)" failed at test.c:1243:check_test_timeouts() at Thu Dec 7 15:52:15 2023: ###
Test 0113_cooperative_rebalance (u_multiple_subscription_changes:2390: use_rebalance_cb: 0, subscription_variation: 0) timed out (timeout set to 480 seconds)
./run-test.sh: line 62: 3512920 Killed $TEST $ARGS
###
### Test ./test-runner in bare mode FAILED! (return code 137)
###
###
{code}
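The fairness property the test asserts can be stated simply: with P partitions and C subscribed members, every member should receive either floor(P/C) or ceil(P/C) partitions. A minimal check along those lines (illustrative, not the assignor's or the test's actual code):

{code:java}
import java.util.List;

public class FairnessCheck {
    /** With 16 partitions and 8 members, every member must get exactly 2. */
    static boolean isBalanced(List<Integer> partitionsPerMember, int totalPartitions) {
        int members = partitionsPerMember.size();
        int floor = totalPartitions / members;                       // minimum fair share
        int ceil = floor + (totalPartitions % members == 0 ? 0 : 1); // share with remainder
        return partitionsPerMember.stream()
                .allMatch(n -> n == floor || n == ceil);
    }

    public static void main(String[] args) {
        // Observed counts from the failing run: two members got 3 partitions
        // and one got 0 -- unbalanced, even though the total still adds up to 16.
        System.out.println(isBalanced(List.of(2, 3, 2, 2, 2, 3, 0, 2), 16)); // false
        System.out.println(isBalanced(List.of(2, 2, 2, 2, 2, 2, 2, 2), 16)); // true
    }
}
{code}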
[jira] [Created] (KAFKA-16147) Partition is assigned to two members at the same time
Emanuele Sabellico created KAFKA-16147:
--
Summary: Partition is assigned to two members at the same time
Key: KAFKA-16147
URL: https://issues.apache.org/jira/browse/KAFKA-16147
Project: Kafka
Issue Type: Sub-task
Reporter: Emanuele Sabellico

While running librdkafka test 0113, subtest _u_multiple_subscription_changes_ received this error, saying that a partition is assigned to two members at the same time:

{code:java}
Error: C_6#consumer-9 is assigned rdkafkatest_rnd550f20623daba04c_0113u_2 [0] which is already assigned to consumer C_5#consumer-8
{code}

I've reconstructed this sequence (a sketch of the violated invariant follows at the end):

C_5 SUBSCRIBES TO T1

{code:java}
%7|1705403451.561|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 6, group instance id "(null)", current assignment "", subscribe topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
{code}

C_5 ASSIGNMENT CHANGES TO T1-P7, T1-P8, T1-P12

{code:java}
[2024-01-16 12:10:51,562] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=6, previousMemberEpoch=0, targetMemberEpoch=6, state=assigning, assignedPartitions={}, partitionsPendingRevocation={}, partitionsPendingAssignment={IKXGrFR1Rv-Qes7Ummas6A=[3, 12]}) to CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=14, state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingRevocation={}, partitionsPendingAssignment={}). (org.apache.kafka.coordinator.group.GroupMetadataManager)
{code}

C_5 RECEIVES TARGET ASSIGNMENT

{code:java}
%7|1705403451.565|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"
{code}

C_5 ACKS TARGET ASSIGNMENT

{code:java}
%7|1705403451.566|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id "NULL", current assignment "rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[7], rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[8], rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[12]", subscribe topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
%7|1705403451.567|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"
{code}

C_5 SUBSCRIBES TO T1,T2: T1 partitions are revoked, 5 T2 partitions are pending

{code:java}
%7|1705403452.612|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id "NULL", current assignment "NULL", subscribe topics "rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
[2024-01-16 12:10:52,615] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw updated its subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, rdkafkatest_rnd5a91902462d61c2e_0113u_1]. (org.apache.kafka.coordinator.group.GroupMetadataManager)
[2024-01-16 12:10:52,616] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=14, state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingRevocation={}, partitionsPendingAssignment={}) to CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=16, state=revoking, assignedPartitions={}, partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}). (org.apache.kafka.coordinator.group.GroupMetadataManager)
%7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat response received target assignment ""
{code}

C_5 FINISHES REVOCATION

{code:java}
%7|1705403452.618|CGRPJOINSTATE|C_5#consumer-8| [thrd:main]: Group "rdkafkatest_rnd53b4eb0c2de343_0113u" changed join state wait-assign-call -> steady (state up)
{code}

C_5 ACKS REVOCATION, RECEIVES T2-P0,T2-P1,T2-P2

{code:java}
%7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id "rdkafkatest_rnd53b4eb0c2de34
{code}
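The invariant being violated is that each partition has exactly one owner across the group at any time. A minimal check in the spirit of the test's verification (names here are illustrative, not the test's actual code):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NoDoubleAssignment {
    /** Maps each partition to its owner; fails on the first duplicate owner. */
    static void verify(Map<String, List<String>> assignmentByMember) {
        Map<String, String> ownerByPartition = new HashMap<>();
        for (var entry : assignmentByMember.entrySet()) {
            for (String partition : entry.getValue()) {
                String previous = ownerByPartition.putIfAbsent(partition, entry.getKey());
                if (previous != null) {
                    // Mirrors the test error: same partition, two members.
                    throw new IllegalStateException(
                        entry.getKey() + " is assigned " + partition
                        + " which is already assigned to " + previous);
                }
            }
        }
    }
}
{code}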
[jira] [Created] (KAFKA-16310) ListOffsets doesn't report the offset with maxTimestamp anymore
Emanuele Sabellico created KAFKA-16310:
--
Summary: ListOffsets doesn't report the offset with maxTimestamp anymore
Key: KAFKA-16310
URL: https://issues.apache.org/jira/browse/KAFKA-16310
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Emanuele Sabellico

The last offset is reported instead. A test in librdkafka (0081/do_test_ListOffsets) is failing: it checks that the offset with the max timestamp is the middle one, not the last one. The test passes with 3.6.0 and previous versions.
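The regression can be observed with the Java Admin API's OffsetSpec.maxTimestamp() (KIP-734). A minimal sketch, assuming a topic whose middle record carries the highest timestamp; broker address and topic name are placeholders:

{code:java}
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class MaxTimestampCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        TopicPartition tp = new TopicPartition("test-topic", 0); // placeholder topic

        try (Admin admin = Admin.create(props)) {
            ListOffsetsResult result =
                admin.listOffsets(Map.of(tp, OffsetSpec.maxTimestamp()));
            ListOffsetsResult.ListOffsetsResultInfo info = result.partitionResult(tp).get();
            // If the middle record has the highest timestamp, its offset --
            // not the partition's last offset -- should be reported here.
            System.out.printf("offset=%d timestamp=%d%n", info.offset(), info.timestamp());
        }
    }
}
{code}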
[jira] [Created] (KAFKA-16320) CreateTopics, DeleteTopics and CreatePartitions differences between Zookeeper and KRaft
Emanuele Sabellico created KAFKA-16320:
--
Summary: CreateTopics, DeleteTopics and CreatePartitions differences between Zookeeper and KRaft
Key: KAFKA-16320
URL: https://issues.apache.org/jira/browse/KAFKA-16320
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Emanuele Sabellico

Test number 0081 with these operations is failing in librdkafka when using KRaft but not when using Zookeeper. The test sets the operation timeout to 0 and expects those operations to be executed asynchronously. With Zookeeper the returned error was REQUEST_TIMED_OUT, which librdkafka converts to NO_ERROR when the operation timeout is <= 0. With KRaft, NO_ERROR is returned instead, but the topics aren't created or deleted.

Also, passing an invalid configuration option returns NO_ERROR instead of INVALID_CONFIG, which is what happens with Zookeeper, or with KRaft when the operation timeout is > 0.

https://github.com/confluentinc/librdkafka/blob/a6d85bdbc1023b1a5477b8befe516242c3e182f6/tests/0081-admin.c#L5174C9-L5174C29

{code:java}
/* For non-blocking CreateTopicsRequests the broker
 * will return REQUEST_TIMED_OUT for topics
 * that were triggered for creation -
 * we hide this error code from the application
 * since the topic creation is in fact in progress. */
if (error_code == RD_KAFKA_RESP_ERR_REQUEST_TIMED_OUT &&
    rd_kafka_confval_get_int(&rko_req->rko_u.admin_request
                                  .options.operation_timeout) <= 0) {
        error_code  = RD_KAFKA_RESP_ERR_NO_ERROR;
        this_errstr = NULL;
}
{code}
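For comparison, a rough Java Admin API equivalent of the test's non-blocking call; note this is only an approximation, since in the Java client timeoutMs also bounds the client-side wait rather than being a pure broker-side operation timeout like librdkafka's operation_timeout. Broker address and topic name are placeholders:

{code:java}
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.CreateTopicsOptions;
import org.apache.kafka.clients.admin.NewTopic;

public class NonBlockingCreate {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("topic-0081", 3, (short) 1); // placeholder
            try {
                // timeoutMs(0) requests a non-blocking operation.
                admin.createTopics(List.of(topic),
                                   new CreateTopicsOptions().timeoutMs(0))
                     .all().get();
                System.out.println("returned without error");
            } catch (ExecutionException e) {
                // With Zookeeper the broker answered REQUEST_TIMED_OUT here
                // while creation kept going in the background.
                System.out.println("returned: " + e.getCause());
            }
            // Either way, the test then polls metadata to check whether
            // the topic eventually appears.
        }
    }
}
{code}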
[jira] [Resolved] (KAFKA-16320) CreateTopics, DeleteTopics and CreatePartitions differences between Zookeeper and KRaft
[ https://issues.apache.org/jira/browse/KAFKA-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emanuele Sabellico resolved KAFKA-16320.
Resolution: Not A Problem

> CreateTopics, DeleteTopics and CreatePartitions differences between Zookeeper and KRaft
> ---
>
> Key: KAFKA-16320
> URL: https://issues.apache.org/jira/browse/KAFKA-16320
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Emanuele Sabellico
> Assignee: Chia-Ping Tsai
> Priority: Minor
[jira] [Created] (KAFKA-19444) SASL GSSAPI not working with librdkafka and AK 4.x
Emanuele Sabellico created KAFKA-19444:
--
Summary: SASL GSSAPI not working with librdkafka and AK 4.x
Key: KAFKA-19444
URL: https://issues.apache.org/jira/browse/KAFKA-19444
Project: Kafka
Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Emanuele Sabellico

When testing librdkafka with AK 4.0 we see that SASL GSSAPI isn't working. The feature is reported as missing because librdkafka incorrectly checks for JoinGroup v0 exactly, rather than v0 or later. When testing librdkafka against 4.0 we missed this case, so JoinGroup v0 and v1 were removed. A [fix|https://github.com/confluentinc/librdkafka/pull/5131] is already merged in librdkafka and will be released in v2.11.0. For the rest of the users, the RPC versions should be added back to avoid having to upgrade the clients.
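librdkafka derives broker feature support from the version ranges in the broker's ApiVersions response. An illustrative sketch (not librdkafka's actual code) of the difference between requiring JoinGroup v0 exactly and accepting v0 or later:

{code:java}
public class FeatureCheck {
    /** Broker-advertised version range for one RPC, e.g. JoinGroup. */
    record ApiVersionRange(int minVersion, int maxVersion) {}

    // Buggy check: requires v0 to fall inside the advertised range.
    // AK 4.0 removed JoinGroup v0/v1, so this reports the feature missing.
    static boolean gssapiSupportedBuggy(ApiVersionRange joinGroup) {
        return joinGroup.minVersion() <= 0 && 0 <= joinGroup.maxVersion();
    }

    // Fixed check: any advertised JoinGroup version (v0 or later) is enough.
    static boolean gssapiSupportedFixed(ApiVersionRange joinGroup) {
        return joinGroup.maxVersion() >= 0;
    }

    public static void main(String[] args) {
        ApiVersionRange ak40JoinGroup = new ApiVersionRange(2, 9); // versions are illustrative
        System.out.println(gssapiSupportedBuggy(ak40JoinGroup)); // false -> feature disabled
        System.out.println(gssapiSupportedFixed(ak40JoinGroup)); // true
    }
}
{code}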