Consumer Rebalance issue and follow-up on KAFKA-9752

2020-08-09 Thread Dibyendu Bhattacharya
Hi,

This is regarding https://issues.apache.org/jira/browse/KAFKA-9752, where a
consumer rebalance can get stuck after a new member times out with an old
JoinGroup version.

We have taken the fix, but now see a different issue.

Earlier the consumer group was stuck in the "PendingRebalance" state, which is
no longer happening, but now I see members unable to join the group. The
logs below show members being removed after session timeout.



[2020-08-09 09:29:00,558] INFO [GroupCoordinator 5]: Pending member XXX in
group YYY  has been removed after session timeout expiration.
(kafka.coordinator.group.GroupCoordinator)

[2020-08-09 09:29:55,856] INFO [GroupCoordinator 5]: Pending member ZZZ in
group YYY  has been removed after session timeout expiration.
(kafka.coordinator.group.GroupCoordinator)



Looking at the GroupCoordinator code, when a new member tries to join for the
first time, the GroupCoordinator also schedules addPendingMemberExpiration (in
the doUnknownJoinGroup call) with *SessionTimeOut*…



If for some reason the addMemberAndRebalance call takes longer than
SessionTimeOut and members are still in the “Pending” state, the above
addPendingMemberExpiration can remove the pending members, and they cannot
join the group. I think that is what is happening.
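
To make the suspected timing concrete, here is a toy model in Python. It is
not Kafka code: the PendingMember class, the pending_member_survives helper,
and the millisecond values are all hypothetical, and the only assumption
taken from the description above is that a pending member is removed once
the session timeout elapses before the join completes.

```python
from dataclasses import dataclass


@dataclass
class PendingMember:
    member_id: str
    registered_at_ms: int      # when the coordinator first saw the member
    session_timeout_ms: int    # deadline used by the pending-member expiration


def pending_member_survives(member: PendingMember, join_completed_at_ms: int) -> bool:
    """Return True if the join completes before the pending-member timer fires."""
    expires_at_ms = member.registered_at_ms + member.session_timeout_ms
    return join_completed_at_ms < expires_at_ms


# Hypothetical numbers: a 10 s session timeout and joins that take 15 s and 4 s.
slow = PendingMember("XXX", registered_at_ms=0, session_timeout_ms=10_000)
print(pending_member_survives(slow, join_completed_at_ms=15_000))  # False
print(pending_member_survives(slow, join_completed_at_ms=4_000))   # True
```

In this model the member is lost whenever the join takes longer than the
session timeout, regardless of any longer new-member join timeout.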



For a new member, the coordinator already sets a timeout in
completeAndScheduleNextExpiration(group, member, *NewMemberJoinTimeoutMs*).



Why is an additional addPendingMemberExpiration task needed for new
members?


Is this a possible bug?


Kafka Simple Consumer API for 0.9

2014-11-14 Thread Dibyendu Bhattacharya
Hi,

Will the Simple Consumer API change in Kafka 0.9?

I can see a consumer redesign proposed for Kafka 0.9, which I believe
will not impact clients written using the Simple Consumer API. Is that
correct?


Regards,
Dibyendu


Re: Kafka/Hadoop consumers and producers

2013-08-09 Thread dibyendu . bhattacharya
Hi Ken, 

I am also working on adapting Camus to non-Avro messages for our
requirement.

I see you mentioned this patch
(https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8),
which supports a custom data writer for Camus. But this patch has not been pulled
into the camus-kafka-0.8 branch. Is there any plan to do so?

Regards, 
Dibyendu


[jira] [Commented] (KAFKA-3083) a soft failure in controller may leave a topic partition in an inconsistent state

2017-04-19 Thread Dibyendu Bhattacharya (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974338#comment-15974338
 ] 

Dibyendu Bhattacharya commented on KAFKA-3083:
--

Hi [~junrao], we also hit this issue in Kafka 0.9.x. Any idea when this can be
fixed?

> a soft failure in controller may leave a topic partition in an inconsistent 
> state
> -
>
> Key: KAFKA-3083
> URL: https://issues.apache.org/jira/browse/KAFKA-3083
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.9.0.0
> Reporter: Jun Rao
> Assignee: Mayuresh Gharat
> Labels: reliability
>
> The following sequence can happen.
> 1. Broker A is the controller and is in the middle of processing a broker 
> change event. As part of this process, let's say it's about to shrink the isr 
> of a partition.
> 2. Then broker A's session expires and broker B takes over as the new 
> controller. Broker B sends the initial leaderAndIsr request to all brokers.
> 3. Broker A continues by shrinking the isr of the partition in ZK and sends 
> the new leaderAndIsr request to the broker (say C) that leads the partition. 
> Broker C will reject this leaderAndIsr since the request comes from a 
> controller with an older epoch. Now we could be in a situation that Broker C 
> thinks the isr has all replicas, but the isr stored in ZK is different.
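
The sequence quoted above can be sketched as a small Python model. This is
only an illustration, not Kafka code: the Broker class, its
handle_leader_and_isr method, the replica IDs, and the epoch numbers are
hypothetical, and a plain list stands in for the ISR stored in ZK.

```python
class Broker:
    """Toy broker that tracks the highest controller epoch it has seen."""

    def __init__(self, isr):
        self.controller_epoch = 0
        self.isr = list(isr)

    def handle_leader_and_isr(self, controller_epoch, isr):
        # Requests from a controller with an older epoch are rejected.
        if controller_epoch < self.controller_epoch:
            return False
        self.controller_epoch = controller_epoch
        self.isr = list(isr)
        return True


zk_isr = [1, 2, 3]            # stand-in for the ISR stored in ZK
broker_c = Broker(isr=[1, 2, 3])

# Step 2: new controller B (epoch 2) sends the initial leaderAndIsr request.
broker_c.handle_leader_and_isr(controller_epoch=2, isr=zk_isr)

# Step 3: old controller A (epoch 1) shrinks the ISR in ZK, then notifies C.
zk_isr = [1, 2]
accepted = broker_c.handle_leader_and_isr(controller_epoch=1, isr=zk_isr)

print(accepted)               # False
print(broker_c.isr, zk_isr)   # [1, 2, 3] [1, 2]
```

The rejection itself is correct behaviour; the inconsistency comes from the
old controller having already written the shrunken ISR before its request
was refused.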



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-948) ISR list in LeaderAndISR path not updated for partitions when Broker (which is not leader) is down

2013-06-19 Thread Dibyendu Bhattacharya (JIRA)
Dibyendu Bhattacharya created KAFKA-948:
---

Summary: ISR list in LeaderAndISR path not updated for partitions
when Broker (which is not leader) is down
Key: KAFKA-948
URL: https://issues.apache.org/jira/browse/KAFKA-948
Project: Kafka
Issue Type: Bug
Components: controller
Affects Versions: 0.8
Reporter: Dibyendu Bhattacharya
Assignee: Neha Narkhede


When the broker that is the leader for a partition goes down, the ISR list in
the LeaderAndISR path is updated. But if a broker that is not the leader of
the partition goes down, the ISR list is not updated. This is an issue
because the ISR list then contains a stale entry.

I found this issue in kafka-0.8.0-beta1-candidate1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (KAFKA-948) ISR list in LeaderAndISR path not updated for partitions when Broker (which is not leader) is down

2013-06-19 Thread Dibyendu Bhattacharya (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688871#comment-13688871
 ] 

Dibyendu Bhattacharya commented on KAFKA-948:
-

I have tested a fix, which seems to be working fine.

In ReplicaStateMachine.scala, in the offlineReplica case of the
handleStateChange method, currLeaderIsrAndControllerEpoch.leaderAndIsr.isr
contains the stale entry. Calling updateLeaderAndIsrCache() beforehand solves
this issue. Shall I go ahead and make a pull request with this fix?
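
As a rough illustration of why refreshing the cache first can matter, here is
a toy Python model. It is not the ReplicaStateMachine code: the
handle_offline_replica function, the broker IDs, and in particular the
assumption that a stale cached ISR hides the failed replica are all
hypothetical, chosen only to show one way a stale cache could leave the
stored ISR unchanged.

```python
def handle_offline_replica(cached_isr, zk_isr, replica_id, refresh_first):
    """Return the ISR that ends up in the (simulated) LeaderAndISR path."""
    if refresh_first:
        # Stand-in for calling updateLeaderAndIsrCache() before the state change.
        cached_isr = list(zk_isr)
    if replica_id not in cached_isr:
        # The stale cache hides the replica, so nothing is written back.
        return list(zk_isr)
    return [r for r in cached_isr if r != replica_id]


stale_cache = [1]   # hypothetical: cache predates broker 2 joining the ISR
zk_state = [1, 2]   # hypothetical: current ISR in the LeaderAndISR path

print(handle_offline_replica(stale_cache, zk_state, 2, refresh_first=False))  # [1, 2]
print(handle_offline_replica(stale_cache, zk_state, 2, refresh_first=True))   # [1]
```

Without the refresh, the downed broker 2 stays in the stored ISR; with it,
the ISR shrinks to the live replica.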

Dibyendu

> ISR list in LeaderAndISR path not updated for partitions when Broker (which 
> is not leader) is down
> --
>
> Key: KAFKA-948
> URL: https://issues.apache.org/jira/browse/KAFKA-948
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.8
> Reporter: Dibyendu Bhattacharya
> Assignee: Neha Narkhede
>
> When the broker that is the leader for a partition goes down, the ISR list in
> the LeaderAndISR path is updated. But if a broker that is not the leader
> of the partition goes down, the ISR list is not updated. This is an
> issue because the ISR list then contains a stale entry.
> I found this issue in kafka-0.8.0-beta1-candidate1.
