[jira] [Created] (KAFKA-17237) [rack-aware assignors] Rebalance is triggered every time a broker isn't reported from a metadata call

2024-08-01 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-17237:
--

 Summary: [rack-aware assignors] Rebalance is triggered every time 
a broker isn't reported from a metadata call
 Key: KAFKA-17237
 URL: https://issues.apache.org/jira/browse/KAFKA-17237
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 3.8.0, 3.5.0
Reporter: Emanuele Sabellico
 Attachments: test.log

When configuring a client for rack awareness, to enable FFF (fetch from 
follower) and rack-aware assignors, a rebalance is triggered every time a 
broker disappears from a Metadata response, such as during a cluster roll.
That happens because, after KIP-881, the metadata appears as changed: brokers 
that are down carry no rack information, so the set of racks is different.

*How to reproduce*
 * Enable *client.rack* on the client and *broker.rack* on the cluster
 * Create a topic with replicas on all the nodes
 * Subscribe to that topic on the client
 * Stop one of the brokers
 * Observe that a rebalance is triggered (a minimal repro sketch follows)
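
For reference, a minimal Java consumer along these lines reproduces it; the 
bootstrap server, rack id and topic name below are placeholders:
{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Repro sketch: with client.rack set, rack information takes part in the
// KIP-881 metadata comparison, so a broker missing from a Metadata response
// (its rack becomes null) makes the cached metadata look changed and
// triggers a rejoin.
public class RackRebalanceRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test_racks");
        props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "1a"); // a broker.rack value
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test_new"));
            while (true) {
                // Stop one broker while polling and watch the rejoin in the logs.
                consumer.poll(Duration.ofMillis(500));
            }
        }
    }
}
{code}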

Attached is a log reproducing the issue with the Java client. A few lines 
showing the rejoin requests:
{noformat}
[2024-08-01 15:09:07,472] INFO [Consumer clientId=consumer-test_racks-1, 
groupId=test_racks] Request joining group due to: cached metadata has changed 
from (version4: {test_new=[racks=[null, 1b, 1c]]}) at the beginning of the 
rebalance to (version5: {test_new=[racks=[1a, 1b, 1c]]}) 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2024-08-01 15:10:38,689] INFO [Consumer clientId=consumer-test_racks-1, 
groupId=test_racks] Request joining group due to: cached metadata has changed 
from (version6: {test_new=[racks=[1a, 1b, 1c]]}) at the beginning of the 
rebalance to (version42: {test_new=[racks=[null, 1a, 1c]]}) 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2024-08-01 15:11:04,106] INFO [Consumer clientId=consumer-test_racks-1, 
groupId=test_racks] Request joining group due to: cached metadata has changed 
from (version43: {test_new=[racks=[1a, 1b, 1c]]}) at the beginning of the 
rebalance to (version45: {test_new=[racks=[null, 1a, 1b]]}) 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
{noformat}
The same happens in librdkafka, as reported in this issue:
[https://github.com/confluentinc/librdkafka/issues/4742]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15989) Upgrade existing generic group to consumer group

2023-12-08 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-15989:
--

 Summary: Upgrade existing generic group to consumer group
 Key: KAFKA-15989
 URL: https://issues.apache.org/jira/browse/KAFKA-15989
 Project: Kafka
  Issue Type: Sub-task
Reporter: Emanuele Sabellico


It should be possible to upgrade an existing generic group to a new consumer 
group, whether it was using the previous generic protocol or manual partition 
assignment and commits.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15997) Ensure fairness in the uniform assignor

2023-12-12 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-15997:
--

 Summary: Ensure fairness in the uniform assignor
 Key: KAFKA-15997
 URL: https://issues.apache.org/jira/browse/KAFKA-15997
 Project: Kafka
  Issue Type: Sub-task
Reporter: Emanuele Sabellico



Fairness has to be ensured in the uniform assignor, as it was in the 
cooperative-sticky one.

In librdkafka test 0113, subtest u_multiple_subscription_changes, 8 consumers 
subscribe to the same topic and the test verifies that all of them get 2 
partitions assigned. With the new protocol, however, it seems two consumers 
get 3 partitions each and one gets zero partitions. The test doesn't 
configure any client.rack.


{code:java}
[0113_cooperative_rebalance  /478.183s] Consumer assignments 
(subscription_variation 0) (stabilized) (no rebalance cb):
[0113_cooperative_rebalance  /478.183s] Consumer C_0#consumer-3 assignment (2): 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [5] (2000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [8] (4000msgs)
[0113_cooperative_rebalance  /478.183s] Consumer C_1#consumer-4 assignment (3): 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [0] (1000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [3] (2000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [13] (1000msgs)
[0113_cooperative_rebalance  /478.184s] Consumer C_2#consumer-5 assignment (2): 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [6] (1000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [10] (2000msgs)
[0113_cooperative_rebalance  /478.184s] Consumer C_3#consumer-6 assignment (2): 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [7] (1000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [9] (2000msgs)
[0113_cooperative_rebalance  /478.184s] Consumer C_4#consumer-7 assignment (2): 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [11] (1000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [14] (3000msgs)
[0113_cooperative_rebalance  /478.184s] Consumer C_5#consumer-8 assignment (3): 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [1] (2000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [2] (2000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [4] (1000msgs)
[0113_cooperative_rebalance  /478.184s] Consumer C_6#consumer-9 assignment (0): 
[0113_cooperative_rebalance  /478.184s] Consumer C_7#consumer-10 assignment 
(2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [12] (1000msgs), 
rdkafkatest_rnd24419cc75e59d8de_0113u_1 [15] (1000msgs)
[0113_cooperative_rebalance  /478.184s] 16/32 partitions assigned
[0113_cooperative_rebalance  /478.184s] Consumer C_0#consumer-3 has 2 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance  /478.184s] Consumer C_1#consumer-4 has 3 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance  /478.184s] Consumer C_2#consumer-5 has 2 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance  /478.184s] Consumer C_3#consumer-6 has 2 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance  /478.184s] Consumer C_4#consumer-7 has 2 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance  /478.184s] Consumer C_5#consumer-8 has 3 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance  /478.184s] Consumer C_6#consumer-9 has 0 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[0113_cooperative_rebalance  /478.184s] Consumer C_7#consumer-10 has 2 assigned 
partitions (1 subscribed topic(s)), expecting 2 assigned partitions
[                      /479.057s] 1 test(s) running: 
0113_cooperative_rebalance
[                      /480.057s] 1 test(s) running: 
0113_cooperative_rebalance
[                      /481.057s] 1 test(s) running: 
0113_cooperative_rebalance
[0113_cooperative_rebalance  /482.498s] TEST FAILURE
### Test "0113_cooperative_rebalance (u_multiple_subscription_changes:2390: 
use_rebalance_cb: 0, subscription_variation: 0)" failed at 
test.c:1243:check_test_timeouts() at Thu Dec  7 15:52:15 2023: ###
Test 0113_cooperative_rebalance (u_multiple_subscription_changes:2390: 
use_rebalance_cb: 0, subscription_variation: 0) timed out (timeout set to 480 
seconds)
./run-test.sh: line 62: 3512920 Killed                  $TEST $ARGS
###
### Test ./test-runner in bare mode FAILED! (return code 137) ###
###{code}
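
For reference, a minimal sketch of the balance property this test asserts, 
written in Java (the actual test is C; the helper below is hypothetical):
{code:java}
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Balance property: with members sharing an identical subscription,
// assignment sizes should differ by at most one, so no member ends up with
// 0 partitions while others hold 3.
class BalanceCheck {
    static void assertBalanced(Map<String, List<TopicPartition>> assignment) {
        IntSummaryStatistics sizes = assignment.values().stream()
                .mapToInt(List::size)
                .summaryStatistics();
        if (sizes.getMax() - sizes.getMin() > 1)
            throw new AssertionError("Unbalanced assignment: min="
                    + sizes.getMin() + ", max=" + sizes.getMax());
    }
}
{code}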


--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16147) Partition is assigned to two members at the same time

2024-01-16 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-16147:
--

 Summary: Partition is assigned to two members at the same time
 Key: KAFKA-16147
 URL: https://issues.apache.org/jira/browse/KAFKA-16147
 Project: Kafka
  Issue Type: Sub-task
Reporter: Emanuele Sabellico


While running librdkafka test 0113, subtest 
_u_multiple_subscription_changes_ received this error, saying that a 
partition is assigned to two members at the same time:

{code:java}
Error: C_6#consumer-9 is assigned rdkafkatest_rnd550f20623daba04c_0113u_2 [0] 
which is already assigned to consumer C_5#consumer-8 {code}
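
For reference, the uniqueness check behind this error looks roughly like the 
following Java sketch (the real test code is C; this helper is hypothetical):
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.common.TopicPartition;

// Invariant violated here: at any given time a partition must be owned by
// at most one member of the group.
class AssignmentCheck {
    static void assertNoDoubleAssignment(
            Map<String, Set<TopicPartition>> byMember) {
        Map<TopicPartition, String> owner = new HashMap<>();
        for (Map.Entry<String, Set<TopicPartition>> e : byMember.entrySet()) {
            for (TopicPartition tp : e.getValue()) {
                String prev = owner.put(tp, e.getKey());
                if (prev != null)
                    throw new AssertionError(e.getKey() + " is assigned " + tp
                            + " which is already assigned to consumer " + prev);
            }
        }
    }
}
{code}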
I've reconstructed this sequence:

C_5 SUBSCRIBES TO T1

{code:java}
%7|1705403451.561|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
"rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 6, group instance id 
"(null)", current assignment "", subscribe topics 
"rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"{code}

C_5 ASSIGNMENT CHANGES TO T1-P7, T1-P8, T1-P12


{code:java}
[2024-01-16 12:10:51,562] INFO [GroupCoordinator id=1 topic=__consumer_offsets 
partition=7] [GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member 
RaTCu6RXQH-FiSl95iZzdw transitioned from CurrentAssignment(memberEpoch=6, 
previousMemberEpoch=0, targetMemberEpoch=6, state=assigning, 
assignedPartitions={}, partitionsPendingRevocation={}, 
partitionsPendingAssignment={IKXGrFR1Rv-Qes7Ummas6A=[3, 12]}) to 
CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=14, 
state=stable, assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
partitionsPendingRevocation={}, partitionsPendingAssignment={}). 
(org.apache.kafka.coordinator.group.GroupMetadataManager){code}

C_5 RECEIVES TARGET ASSIGNMENT

{code:java}
%7|1705403451.565|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat response received target assignment 
"(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
(null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{code}


C_5 ACKS TARGET ASSIGNMENT

{code:java}
%7|1705403451.566|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
"rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
"NULL", current assignment 
"rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[7], 
rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[8], 
rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[12]", subscribe 
topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]" 
%7|1705403451.567|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat response received target assignment 
"(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
(null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{code}


C_5 SUBSCRIBES TO T1,T2: T1 partitions are revoked, 5 T2 partitions are pending 
{code:java}
%7|1705403452.612|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
"rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
"NULL", current assignment "NULL", subscribe topics 
"rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], 
rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]" [2024-01-16 12:10:52,615] 
INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] [GroupId 
rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw updated its 
subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, 
rdkafkatest_rnd5a91902462d61c2e_0113u_1]. 
(org.apache.kafka.coordinator.group.GroupMetadataManager) [2024-01-16 
12:10:52,616] INFO [GroupCoordinator id=1 topic=__consumer_offsets partition=7] 
[GroupId rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw 
transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, 
targetMemberEpoch=14, state=stable, 
assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
partitionsPendingRevocation={}, partitionsPendingAssignment={}) to 
CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, targetMemberEpoch=16, 
state=revoking, assignedPartitions={}, 
partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}). 
(org.apache.kafka.coordinator.group.GroupMetadataManager) 
%7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat response received target assignment ""{code}

C_5 FINISHES REVOCATION

{code:java}
%7|1705403452.618|CGRPJOINSTATE|C_5#consumer-8| [thrd:main]: Group 
"rdkafkatest_rnd53b4eb0c2de343_0113u" changed join state wait-assign-call -> 
steady (state up){code}

C_5 ACKS REVOCATION, RECEIVES T2-P0,T2-P1,T2-P2

{code:java}
%7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
"rdkafkatest_rnd53b4eb0c2de34
{code}

[jira] [Created] (KAFKA-16310) ListOffsets doesn't report the offset with maxTimestamp anymore

2024-02-28 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-16310:
--

 Summary: ListOffsets doesn't report the offset with maxTimestamp 
anymore
 Key: KAFKA-16310
 URL: https://issues.apache.org/jira/browse/KAFKA-16310
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Emanuele Sabellico


The last offset is reported instead.
A test in librdkafka (0081/do_test_ListOffsets) is failing; it checks that 
the offset with the max timestamp is the middle one and not the last one. 
The test passes with 3.6.0 and previous versions.
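
For reference, the behavior can be checked with the Java Admin client 
(broker address and topic name below are placeholders):
{code:java}
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

// Sketch: query the offset with the maximum timestamp (KIP-734). After
// producing a batch whose middle record carries the largest timestamp, the
// result should point at that middle offset, not at the last one.
public class MaxTimestampCheck {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.<String, Object>of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            TopicPartition tp = new TopicPartition("test-topic", 0);
            ListOffsetsResultInfo info = admin
                    .listOffsets(Map.of(tp, OffsetSpec.maxTimestamp()))
                    .partitionResult(tp)
                    .get();
            System.out.println("offset=" + info.offset()
                    + " timestamp=" + info.timestamp());
        }
    }
}
{code}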



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16320) CreateTopics, DeleteTopics and CreatePartitions differences between Zookeeper and KRaft

2024-03-01 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-16320:
--

 Summary: CreateTopics, DeleteTopics and CreatePartitions 
differences between Zookeeper and KRaft
 Key: KAFKA-16320
 URL: https://issues.apache.org/jira/browse/KAFKA-16320
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Emanuele Sabellico


librdkafka test 0081, which exercises these operations, fails when using 
KRaft but not when using Zookeeper. The test sets the operation timeout to 0 
and expects those operations to be executed asynchronously. With Zookeeper 
the returned error was REQUEST_TIMED_OUT, and it was converted to NO_ERROR 
when the operation timeout is <= 0.
With KRaft, NO_ERROR is returned instead, but the topics aren't created or 
deleted.
Also, passing an invalid configuration option returns NO_ERROR instead of 
INVALID_CONFIG, which is what happens with Zookeeper, or with KRaft when the 
operation timeout is > 0.

https://github.com/confluentinc/librdkafka/blob/a6d85bdbc1023b1a5477b8befe516242c3e182f6/tests/0081-admin.c#L5174C9-L5174C29

{code:java}
/* For non-blocking CreateTopicsRequests the broker
 * will returned REQUEST_TIMED_OUT for topics
 * that were triggered for creation -
 * we hide this error code from the application
 * since the topic creation is in fact in progress. */
if (error_code == RD_KAFKA_RESP_ERR_REQUEST_TIMED_OUT &&
    rd_kafka_confval_get_int(&rko_req->rko_u.admin_request
                                  .options.operation_timeout) <= 0) {
        error_code  = RD_KAFKA_RESP_ERR_NO_ERROR;
        this_errstr = NULL;
}
{code}
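
For comparison, a rough repro sketch with the Java Admin client; it assumes 
the options timeout plays a role similar to librdkafka's operation_timeout, 
and the broker address and topic name are placeholders:
{code:java}
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.CreateTopicsOptions;
import org.apache.kafka.clients.admin.CreateTopicsResult;
import org.apache.kafka.clients.admin.NewTopic;

// Hedged repro sketch: create a topic with a very short timeout, then check
// whether it exists. Under Zookeeper the broker answered REQUEST_TIMED_OUT
// while creation continued in the background; this report says KRaft
// answered NO_ERROR without creating the topic.
public class Repro0081 {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.<String, Object>of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            CreateTopicsResult result = admin.createTopics(
                    List.of(new NewTopic("test-0081", 3, (short) 1)),
                    new CreateTopicsOptions().timeoutMs(100));
            try {
                result.all().get(); // may time out while creation proceeds
            } catch (Exception e) {
                // Expected with a short timeout; creation may still be ongoing.
            }
            System.out.println("topic exists: "
                    + admin.listTopics().names().get().contains("test-0081"));
        }
    }
}
{code}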




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16320) CreateTopics, DeleteTopics and CreatePartitions differences between Zookeeper and KRaft

2025-02-17 Thread Emanuele Sabellico (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emanuele Sabellico resolved KAFKA-16320.

Resolution: Not A Problem

> CreateTopics, DeleteTopics and CreatePartitions differences between Zookeeper 
> and KRaft
> ---
>
> Key: KAFKA-16320
> URL: https://issues.apache.org/jira/browse/KAFKA-16320
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Emanuele Sabellico
>Assignee: Chia-Ping Tsai
>Priority: Minor
>
> Test number 0081 with these operations  is failing in librdkafka when using 
> KRaft but not when using Zookeeper. The test sets the operation timeout to 0 
> and expects that those operations are executed asynchronously. The returned 
> err was REQUEST_TIMED_OUT and it was converted to NO_ERROR if operation 
> timeout is <= 0.
> With KRaft instead NO_ERROR is returned, but the topics aren't created or 
> deleted.
> Also passing an invalid configuration option it's returning NO_ERROR instead 
> of INVALID_CONFIG, that is what happens in Zookeeper or with KRaft if 
> operation timeout is > 0.
> https://github.com/confluentinc/librdkafka/blob/a6d85bdbc1023b1a5477b8befe516242c3e182f6/tests/0081-admin.c#L5174C9-L5174C29
> {code:java}
> /* For non-blocking CreateTopicsRequests the broker
>  * will returned REQUEST_TIMED_OUT for topics
>  * that were triggered for creation -
>  * we hide this error code from the application
>  * since the topic creation is in fact in progress. */
> if (error_code == RD_KAFKA_RESP_ERR_REQUEST_TIMED_OUT &&
> rd_kafka_confval_get_int(&rko_req->rko_u.admin_request
> .options.operation_timeout) <=
> 0) {
> error_code  = RD_KAFKA_RESP_ERR_NO_ERROR;
> this_errstr = NULL;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-19444) SASL GSSAPI not working with librdkafka and AK 4.x

2025-06-27 Thread Emanuele Sabellico (Jira)
Emanuele Sabellico created KAFKA-19444:
--

 Summary: SASL GSSAPI not working with librdkafka and AK 4.x
 Key: KAFKA-19444
 URL: https://issues.apache.org/jira/browse/KAFKA-19444
 Project: Kafka
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Emanuele Sabellico


When testing librdkafka with AK 4.0 we see that SASL GSSAPI isn't working.
The feature is reported as missing because librdkafka incorrectly checks for 
JoinGroup v0 only and not v0+.
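
Illustratively, the broken vs. fixed check looks like this (a Java sketch; 
the actual librdkafka code is C and these names are hypothetical):
{code:java}
// librdkafka derives broker feature support from the ApiVersions ranges the
// broker advertises. Requiring JoinGroup v0 exactly fails against AK 4.x,
// which removed JoinGroup v0 and v1; accepting v0 or newer keeps working.
record ApiVersionRange(short min, short max) {}

class GssapiFeatureCheck {
    // Broken: the broker must still advertise JoinGroup v0.
    static boolean supportedBroken(ApiVersionRange joinGroup) {
        return joinGroup.min() == 0;
    }

    // Fixed: any advertised JoinGroup version (v0+) satisfies the feature.
    static boolean supportedFixed(ApiVersionRange joinGroup) {
        return joinGroup.max() >= 0;
    }
}
{code}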

When testing librdkafka versions against AK 4.0 we missed this case when 
JoinGroup v0 and v1 were removed.

A [fix|https://github.com/confluentinc/librdkafka/pull/5131] is already merged 
in librdkafka and will be released in v2.11.0.
For the rest of the users, the RPC versions should be added back to avoid 
having to upgrade the clients.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)