[ 
https://issues.apache.org/jira/browse/KAFKA-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272492#comment-14272492
 ] 

jaikiran pai commented on KAFKA-1853:
-------------------------------------

Hi [~jkreps], you are right about the Java File instance being more of a 
name/path rather than a real opened/closed resource. I was aware of that part 
but I misread the stacktrace (from the user mailing list) and formed a wrong 
theory about how that leak could arise. So let me correct myself and explain 
what I think are 2 different issues:

1. A failed rename of FileMessageSet leaving it with an incorrect file name. 
The fix to that is straightforward and is what you already noted where you just 
do a check before updating the reference to the File instance

{code}
 def renameTo(f: File): Boolean = {
    val success = this.file.renameTo(f)
    if(success)
      this.file = f
    success
  }
{code}

2.  Cleaning up expired LogSegment(s) has an potential to leak (opened) 
FileChannel resources. I think this is what the user is seeing in that mail 
discussion. The problem, IMO, is in asyncDeleteSegment of Log 
(https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L728)
 which looks like this:
{code}
/**
* Perform an asynchronous delete on the given file if it exists (otherwise do 
nothing)
* @throws KafkaStorageException if the file can't be renamed and still exists
*/
private def asyncDeleteSegment(segment: LogSegment) {
        segment.changeFileSuffixes("", Log.DeletedFileSuffix)
        def deleteSeg() {
        info("Deleting segment %d from log %s.".format(segment.baseOffset, 
name))
        segment.delete()
        }
        scheduler.schedule("delete-file", deleteSeg, delay = 
config.fileDeleteDelayMs)
}
{code}
The first thing this call does is to rename the backing file of the LogSegment 
and after that it schedules it for a segment delete. The LogSegment delete 
involves internally involves a deletion of the file from the file system as 
well as closing the open FileChannel resource.

When the first step of renaming fails with an exception, the LogSegment 
deletion is never scheduled and that leaves the FileChannel open and leaking 
(since these expired segments will never be used anymore). Of course, the 
backing files too aren't deleted from the filesystem (but that's a relatively 
smaller problem). I think a way to fix this open FileChannel resource is would 
be something like this:

{code}
diff --git a/core/src/main/scala/kafka/log/Log.scala 
b/core/src/main/scala/kafka/log/Log.scala
index 024506c..35b0736 100644
--- a/core/src/main/scala/kafka/log/Log.scala
+++ b/core/src/main/scala/kafka/log/Log.scala
@@ -730,6 +730,9 @@ class Log(val dir: File,
    * @throws KafkaStorageException if the file can't be renamed and still 
exists 
    */
   private def asyncDeleteSegment(segment: LogSegment) {
+    // close the segment (resources) first since we no longer use the segment 
for read/writes
+    segment.close()
+    // mark the segment as deleted by renaming with an appropriate marker 
suffix
     segment.changeFileSuffixes("", Log.DeletedFileSuffix)
     def deleteSeg() {
       info("Deleting segment %d from log %s.".format(segment.baseOffset, name))
{code}
This change first intentionally closes the resources associated with the 
LogSegment, since that segment anyway has expired and is on the verge of being 
deleted. Once the open resource is closed, even if the rename fails or the real 
file deletion from the filesystem fails, we don't end up holding on to open 
resources. I've checked that a LogSegment.delete of an already closed 
LogSegment doesn't cause any problems.

I'll update the review board with this new patch so that you and others can 
review and add your thoughts.



> Unsuccessful suffix rename attempt of LogSegment can leak files and also 
> leave the LogSegment in an invalid state
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1853
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1853
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.1.1
>            Reporter: jaikiran pai
>             Fix For: 0.8.3
>
>
> As noted in this discussion in the user mailing list 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3C54AE3661.8080007%40gmail.com%3E
>  an unsuccessful attempt at renaming the underlying files of a LogSegment can 
> lead to file leaks and also leave the LogSegment in an invalid state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to