[ 
https://issues.apache.org/jira/browse/KAFKA-8526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891170#comment-16891170
 ] 

ASF GitHub Bot commented on KAFKA-8526:
---------------------------------------

hachikuji commented on pull request #6969: KAFKA-8526: logdir fallback on 
getOrCreateLog
URL: https://github.com/apache/kafka/pull/6969
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Broker may select a failed dir for new replica even in the presence of other 
> live dirs
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8526
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8526
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 2.0.1, 2.1.1, 2.3.0, 2.2.1
>            Reporter: Anna Povzner
>            Assignee: Igor Soarez
>            Priority: Major
>
> Suppose a broker is configured with multiple log dirs. One of the log dirs 
> fails, but there is no load on that dir, so the broker does not know about 
> the failure yet, _i.e._, the failed dir is still in LogManager#_liveLogDirs. 
> Suppose a new topic gets created, and the controller chooses the broker with 
> the failed log dir to host one of the replicas. The broker gets a 
> LeaderAndIsr request with the isNew flag set. LogManager#getOrCreateLog() 
> selects a log dir for the new replica from _liveLogDirs, then one of two 
> things can happen:
> 1) getAbsolutePath can fail, in which case getOrCreateLog will throw an 
> IOException
> 2) Creating the directory for the new replica log may fail (_e.g._, if the 
> directory has become read-only, in which case getAbsolutePath still works). 
> In both cases, the selected dir will be marked offline (which is correct). 
> However, LeaderAndIsr will return an error and the replica will be marked 
> offline, even though the broker may have other live dirs. 
> *Proposed solution*: The broker should retry selecting a dir for the new 
> replica if the initially selected dir threw an IOException when trying to 
> create a directory for the new replica. We should be able to do that in the 
> LogManager#getOrCreateLog() method, but keep in mind that 
> logDirFailureChannel.maybeAddOfflineLogDir does not synchronously remove the 
> dir from _liveLogDirs. So it makes sense to select the initial dir by calling 
> LogManager#nextLogDir (the current implementation), but if we fail to create 
> the log on that dir, one approach is to select the next dir from _liveLogDirs 
> in round-robin fashion (until we get back to the initial log dir, which is 
> the case where all dirs have failed).
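
For illustration only, the round-robin fallback described in the proposed solution could be sketched roughly as below. This is not the code from the pull request or from LogManager: the class name, the ReplicaDirCreator interface and the createWithFallback method are all hypothetical, and log dirs are modelled as plain strings. The loop works on a snapshot of the dir list because, as noted above, a failed dir is not removed from _liveLogDirs synchronously.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Kafka.
final class LogDirFallbackSketch {

    /** Stand-in for the step that creates the replica's directory inside a log dir. */
    interface ReplicaDirCreator {
        void create(String logDir) throws IOException;
    }

    /**
     * Tries the initially selected dir first, then the remaining dirs in
     * round-robin order. Returns the dir where creation succeeded, or throws
     * once every dir has been tried and has failed.
     */
    static String createWithFallback(List<String> liveLogDirs,
                                     int initialIndex,
                                     ReplicaDirCreator creator) throws IOException {
        // Work on a snapshot: the real list may still contain dirs that
        // have already failed but have not yet been removed.
        List<String> snapshot = new ArrayList<>(liveLogDirs);
        if (snapshot.isEmpty()) {
            throw new IOException("no live log dirs");
        }
        int n = snapshot.size();
        int start = Math.floorMod(initialIndex, n);
        IOException lastFailure = null;
        for (int attempt = 0; attempt < n; attempt++) {
            String dir = snapshot.get((start + attempt) % n);
            try {
                creator.create(dir);
                return dir;
            } catch (IOException e) {
                // A real broker would report the dir as offline here
                // before moving on to the next candidate.
                lastFailure = e;
            }
        }
        // We are back at the initial dir: all dirs failed.
        throw new IOException("all log dirs failed", lastFailure);
    }

    public static void main(String[] args) throws IOException {
        // Made-up paths; the second dir is simulated as failed.
        List<String> dirs = List.of("/data/a", "/data/b", "/data/c");
        String chosen = createWithFallback(dirs, 1, dir -> {
            if (dir.equals("/data/b")) {
                throw new IOException("simulated failure on " + dir);
            }
        });
        System.out.println("replica log created in " + chosen);
    }
}
```

The initial index here stands in for whatever LogManager#nextLogDir would return; only the retry order is the point of the sketch.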



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
