[
https://issues.apache.org/jira/browse/KAFKA-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272492#comment-14272492
]
jaikiran pai commented on KAFKA-1853:
-------------------------------------
Hi [~jkreps], you are right about the Java File instance being more of a
name/path rather than a real opened/closed resource. I was aware of that part
but I misread the stacktrace (from the user mailing list) and formed a wrong
theory about how that leak could arise. So let me correct myself and explain
what I think are 2 different issues:
1. A failed rename of FileMessageSet leaving it with an incorrect file name.
The fix to that is straightforward and is what you already noted where you just
do a check before updating the reference to the File instance
{code}
def renameTo(f: File): Boolean = {
val success = this.file.renameTo(f)
if(success)
this.file = f
success
}
{code}
2. Cleaning up expired LogSegment(s) has an potential to leak (opened)
FileChannel resources. I think this is what the user is seeing in that mail
discussion. The problem, IMO, is in asyncDeleteSegment of Log
(https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L728)
which looks like this:
{code}
/**
* Perform an asynchronous delete on the given file if it exists (otherwise do
nothing)
* @throws KafkaStorageException if the file can't be renamed and still exists
*/
private def asyncDeleteSegment(segment: LogSegment) {
segment.changeFileSuffixes("", Log.DeletedFileSuffix)
def deleteSeg() {
info("Deleting segment %d from log %s.".format(segment.baseOffset,
name))
segment.delete()
}
scheduler.schedule("delete-file", deleteSeg, delay =
config.fileDeleteDelayMs)
}
{code}
The first thing this call does is to rename the backing file of the LogSegment
and after that it schedules it for a segment delete. The LogSegment delete
involves internally involves a deletion of the file from the file system as
well as closing the open FileChannel resource.
When the first step of renaming fails with an exception, the LogSegment
deletion is never scheduled and that leaves the FileChannel open and leaking
(since these expired segments will never be used anymore). Of course, the
backing files too aren't deleted from the filesystem (but that's a relatively
smaller problem). I think a way to fix this open FileChannel resource is would
be something like this:
{code}
diff --git a/core/src/main/scala/kafka/log/Log.scala
b/core/src/main/scala/kafka/log/Log.scala
index 024506c..35b0736 100644
--- a/core/src/main/scala/kafka/log/Log.scala
+++ b/core/src/main/scala/kafka/log/Log.scala
@@ -730,6 +730,9 @@ class Log(val dir: File,
* @throws KafkaStorageException if the file can't be renamed and still
exists
*/
private def asyncDeleteSegment(segment: LogSegment) {
+ // close the segment (resources) first since we no longer use the segment
for read/writes
+ segment.close()
+ // mark the segment as deleted by renaming with an appropriate marker
suffix
segment.changeFileSuffixes("", Log.DeletedFileSuffix)
def deleteSeg() {
info("Deleting segment %d from log %s.".format(segment.baseOffset, name))
{code}
This change first intentionally closes the resources associated with the
LogSegment, since that segment anyway has expired and is on the verge of being
deleted. Once the open resource is closed, even if the rename fails or the real
file deletion from the filesystem fails, we don't end up holding on to open
resources. I've checked that a LogSegment.delete of an already closed
LogSegment doesn't cause any problems.
I'll update the review board with this new patch so that you and others can
review and add your thoughts.
> Unsuccessful suffix rename attempt of LogSegment can leak files and also
> leave the LogSegment in an invalid state
> -----------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-1853
> URL: https://issues.apache.org/jira/browse/KAFKA-1853
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.1.1
> Reporter: jaikiran pai
> Fix For: 0.8.3
>
>
> As noted in this discussion in the user mailing list
> http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3C54AE3661.8080007%40gmail.com%3E
> an unsuccessful attempt at renaming the underlying files of a LogSegment can
> lead to file leaks and also leave the LogSegment in an invalid state.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)