[ https://issues.apache.org/jira/browse/KAFKA-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson updated KAFKA-2841:
-----------------------------------
    Description: 
If the coordinator receives a leaderAndIsr request which includes a higher 
leader epoch for one of the partitions that it owns, then it will reload the 
offset/metadata for that partition again. This can happen because the leader 
epoch is incremented for ISR changes which do not result in a new leader for 
the partition. Currently, the coordinator replaces cached metadata values 
blindly on reloading, which can result in weird behavior such as unexpected 
session timeouts or request timeouts while rebalancing.

To fix this, we need to check that the group being loaded has a higher 
generation than the cached value before replacing it. Also, if we have to 
replace a cached value (which shouldn't happen except when loading), we need to 
be very careful to ensure that any active delayed operations won't affect the 
group. 

  was:
If the coordinator receives a leaderAndIsr request which includes a higher 
leader epoch for one of the partitions that it owns, then it will reload the 
offset/metadata from the offsets topic again. This can happen because the 
leader epoch is incremented for ISR changes which do not result in a new leader 
for the partition. Currently, the coordinator replaces cached metadata values 
blindly on reloading, which can result in weird behavior such as unexpected 
session timeouts or request timeouts while rebalancing.

To fix this, we need to check that the group being loaded has a higher 
generation than the cached value before replacing it. Also, if we have to 
replace a cached value (which shouldn't happen except when loading), we need to 
be very careful to ensure that any active delayed operations won't affect the 
group. 


> Group metadata cache loading is not safe when reloading a partition
> -------------------------------------------------------------------
>
>                 Key: KAFKA-2841
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2841
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Blocker
>
> If the coordinator receives a leaderAndIsr request which includes a higher 
> leader epoch for one of the partitions that it owns, then it will reload the 
> offset/metadata for that partition again. This can happen because the leader 
> epoch is incremented for ISR changes which do not result in a new leader for 
> the partition. Currently, the coordinator replaces cached metadata values 
> blindly on reloading, which can result in weird behavior such as unexpected 
> session timeouts or request timeouts while rebalancing.
>
> To fix this, we need to check that the group being loaded has a higher 
> generation than the cached value before replacing it. Also, if we have to 
> replace a cached value (which shouldn't happen except when loading), we need 
> to be very careful to ensure that any active delayed operations won't affect 
> the group. 
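
The generation check proposed above could look roughly like the following. This is only a sketch under stated assumptions, not the coordinator's actual code: `GroupCacheSketch`, `GroupMetadata`, and `onGroupLoaded` are hypothetical names invented for illustration, and the group state is reduced to a group id and a generation id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical, simplified stand-in for the coordinator's group metadata cache. */
final class GroupCacheSketch {

    /** Minimal group state: only the fields needed for the generation check. */
    static final class GroupMetadata {
        final String groupId;
        final int generationId;

        GroupMetadata(String groupId, int generationId) {
            this.groupId = groupId;
            this.generationId = generationId;
        }
    }

    private final ConcurrentMap<String, GroupMetadata> cache = new ConcurrentHashMap<>();

    /**
     * Called for each group read back while (re)loading a partition.
     * The loaded group replaces the cached one only if its generation is
     * strictly higher; otherwise the cached instance is kept untouched.
     * Returns whichever instance is in the cache after the call.
     */
    GroupMetadata onGroupLoaded(GroupMetadata loaded) {
        // merge() installs `loaded` when the key is absent; otherwise it
        // atomically applies the function to (cached, loaded).
        return cache.merge(loaded.groupId, loaded, (cached, incoming) ->
            incoming.generationId > cached.generationId ? incoming : cached);
    }

    GroupMetadata get(String groupId) {
        return cache.get(groupId);
    }
}
```

With this shape, a reload triggered only by a leader epoch bump returns the already-cached instance, so any timers or delayed operations tied to that instance stay valid. The sketch does not cover the second half of the fix: when a replacement does happen, the delayed operations bound to the old instance still need to be handled separately.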



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)