[ https://issues.apache.org/jira/browse/KAFKA-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272492#comment-14272492 ]
jaikiran pai commented on KAFKA-1853: ------------------------------------- Hi [~jkreps], you are right about the Java File instance being more of a name/path rather than a real opened/closed resource. I was aware of that part but I misread the stacktrace (from the user mailing list) and formed a wrong theory about how that leak could arise. So let me correct myself and explain what I think are 2 different issues: 1. A failed rename of FileMessageSet leaving it with an incorrect file name. The fix to that is straightforward and is what you already noted where you just do a check before updating the reference to the File instance {code} def renameTo(f: File): Boolean = { val success = this.file.renameTo(f) if(success) this.file = f success } {code} 2. Cleaning up expired LogSegment(s) has an potential to leak (opened) FileChannel resources. I think this is what the user is seeing in that mail discussion. The problem, IMO, is in asyncDeleteSegment of Log (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L728) which looks like this: {code} /** * Perform an asynchronous delete on the given file if it exists (otherwise do nothing) * @throws KafkaStorageException if the file can't be renamed and still exists */ private def asyncDeleteSegment(segment: LogSegment) { segment.changeFileSuffixes("", Log.DeletedFileSuffix) def deleteSeg() { info("Deleting segment %d from log %s.".format(segment.baseOffset, name)) segment.delete() } scheduler.schedule("delete-file", deleteSeg, delay = config.fileDeleteDelayMs) } {code} The first thing this call does is to rename the backing file of the LogSegment and after that it schedules it for a segment delete. The LogSegment delete involves internally involves a deletion of the file from the file system as well as closing the open FileChannel resource. When the first step of renaming fails with an exception, the LogSegment deletion is never scheduled and that leaves the FileChannel open and leaking (since these expired segments will never be used anymore). Of course, the backing files too aren't deleted from the filesystem (but that's a relatively smaller problem). I think a way to fix this open FileChannel resource is would be something like this: {code} diff --git a/core/src/main/scala/kafka/log/Log.scala b/core/src/main/scala/kafka/log/Log.scala index 024506c..35b0736 100644 --- a/core/src/main/scala/kafka/log/Log.scala +++ b/core/src/main/scala/kafka/log/Log.scala @@ -730,6 +730,9 @@ class Log(val dir: File, * @throws KafkaStorageException if the file can't be renamed and still exists */ private def asyncDeleteSegment(segment: LogSegment) { + // close the segment (resources) first since we no longer use the segment for read/writes + segment.close() + // mark the segment as deleted by renaming with an appropriate marker suffix segment.changeFileSuffixes("", Log.DeletedFileSuffix) def deleteSeg() { info("Deleting segment %d from log %s.".format(segment.baseOffset, name)) {code} This change first intentionally closes the resources associated with the LogSegment, since that segment anyway has expired and is on the verge of being deleted. Once the open resource is closed, even if the rename fails or the real file deletion from the filesystem fails, we don't end up holding on to open resources. I've checked that a LogSegment.delete of an already closed LogSegment doesn't cause any problems. I'll update the review board with this new patch so that you and others can review and add your thoughts. > Unsuccessful suffix rename attempt of LogSegment can leak files and also > leave the LogSegment in an invalid state > ----------------------------------------------------------------------------------------------------------------- > > Key: KAFKA-1853 > URL: https://issues.apache.org/jira/browse/KAFKA-1853 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 0.8.1.1 > Reporter: jaikiran pai > Fix For: 0.8.3 > > > As noted in this discussion in the user mailing list > http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3C54AE3661.8080007%40gmail.com%3E > an unsuccessful attempt at renaming the underlying files of a LogSegment can > lead to file leaks and also leave the LogSegment in an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)