Having looked at the logs the user posted, I don't think this specific issue has to do with the /tmp path.

However, now that the /tmp path is being discussed, I think it's a good idea that we default the Kafka logs to a certain folder. As Jay notes, it makes it very easy to just download and start the servers without having to fiddle with the configs when you are just starting out. Having said that, when I started out with Kafka, I found /tmp to be an odd place to default the path to. I expected it to default to a folder within the Kafka install, somewhere like KAFKA_INSTALL_FOLDER/data/kafka-logs/. Is that something we should do?
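For reference, the folder in question is the log.dirs setting in config/server.properties. A rough sketch of what I mean (assuming a relative path here would be resolved against the install folder, which is something we'd have to verify):

    # shipped default today
    log.dirs=/tmp/kafka-logs
    # proposed: keep the data inside the install, i.e. KAFKA_INSTALL_FOLDER/data/kafka-logs
    log.dirs=data/kafka-logs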

-Jaikiran
On Monday 26 January 2015 12:23 AM, Jay Kreps wrote:
Hmm, but I don't think tmp gets cleaned while the server is running...

The reason for using tmp was that we don't know which directory they will use and we don't want them to have to edit configuration for the simple "out of the box" getting started tutorial. I actually do think that is important. Maybe as an intermediate step we could just call out this setting in the quickstart so people know where data is going and know they need to configure it later...

-Jay

On Sun, Jan 25, 2015 at 9:32 AM, Joe Stein <joe.st...@stealth.ly> wrote:

This feels like another type of symptom from people using /tmp/ for their logs. Personally, I would rather use /mnt/data or something, and if that doesn't exist on their machine we can raise an exception, or have no default at all and force them to set it.

/*******************************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
********************************************/
On Jan 25, 2015 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:

I think you are right, good catch. It could be that this user deleted the files manually, but I wonder if there isn't some way that is a Kafka bug--e.g. if multiple types of retention policies kick in at the same time do we synchronize that properly?

-Jay

On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai <jai.forums2...@gmail.com>
wrote:

Hi Jay,

I spent some more time on this today and went back to the original thread which brought up the issue with file leaks [1]. I think the lsof output in those logs has a very important hint:

/home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/00000000000068818668.log (deleted) java 8446 root 725u REG 253,2 536910838 26087364

/home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/00000000000069457098.log (deleted) java 8446 root 726u REG 253,2 536917902 26087368

Notice the "(deleted)" text in that output. The last time I looked at
that
output, I thought it was the user who had added that "deleted" text to
help
us understand that problem. But today I read up on the output format of
lsof and it turns out that it's lsof which itself adds that hint
whenever a
file has already been deleted possibly by a different process but some
other process is still holding on to open resources of that (deleted)
file
[2].

So in the context of the issue that we are discussing, and the way Kafka deals with async deletes (i.e. first by attempting a rename of the log/index files), I think this all makes sense now. What I think is happening is: some (other?) process (not sure what/why) has already deleted the log file that Kafka is using for the LogSegment. The LogSegment, however, still has an open FileChannel on that deleted file (which is why the open file descriptor is still held and shows up in that output). Now Kafka, at some point in time, triggers an async delete of the LogSegment, which involves a rename of that (already deleted) log file. The rename fails because the original file path isn't there anymore. As a result, we end up throwing that "failed to rename" KafkaStorageException and leave behind the open FileChannel, which stays open forever (till the Kafka process exits).

So I think we should:

1) Find out what deletes the underlying log file(s), and why. I'll add a reply to that original mail discussion asking the user if he can provide more details.
2) Handle this case and close the FileChannel. The patch that's been uploaded to review board https://reviews.apache.org/r/29755/ does that. The "immediate delete" on failure to rename involves (safely) closing the open FileChannel and (safely) deleting the (possibly non-existent) file.
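In rough outline, the recovery path looks something like the following sketch (hypothetical names, not the actual patch code):

import java.io.File;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class RenameFailureSketch {
    static void deleteSegment(File logFile, FileChannel channel) {
        final File renamed = new File(logFile.getPath() + ".deleted");
        if (!logFile.renameTo(renamed)) {
            // The rename can fail because the file was already deleted externally.
            // Close the channel so the (otherwise leaked) file descriptor is released...
            try {
                channel.close();
            } catch (IOException e) {
                // log and continue; we still want to attempt the delete below
            }
            // ...and then delete immediately (a no-op if the file is already gone)
            logFile.delete();
        }
    }
}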
By the way, this entire thing can be easily reproduced by running the following program, which first creates a file and opens a FileChannel to that file, then waits for the user to delete the file externally (I used the rm command), and then tries to rename that deleted file, which fails. In between each of these steps, you can run the lsof command externally to see the open file resources (I used 'lsof | grep test.log'):
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class DeletedFileRenameRepro {

    public static void main(String[] args) throws Exception {
        // Open a file and file channel for read/write
        // (change this path relevantly if you plan to run it)
        final File originalLogFile = new File("/home/jaikiran/deleteme/test.log");
        final FileChannel fileChannel = new RandomAccessFile(originalLogFile, "rw").getChannel();
        System.out.println("Opened file channel to " + originalLogFile);
        // wait for the underlying file to be deleted externally
        System.out.println("Waiting for " + originalLogFile + " to be deleted externally. Press any key after the file is deleted");
        System.in.read();
        // wait for the user to check the lsof output
        System.out.println(originalLogFile + " seems to have been deleted externally, check lsof command output to see open file resources.");
        System.out.println("Press any key to try renaming this already deleted file, from the program");
        System.in.read();
        // try the rename, which fails because the original path no longer exists
        final File fileToRenameTo = new File(originalLogFile.getPath() + ".deleted");
        System.out.println("Trying to rename " + originalLogFile + " to " + fileToRenameTo);
        final boolean renameSucceeded = originalLogFile.renameTo(fileToRenameTo);
        if (renameSucceeded) {
            System.out.println("Rename SUCCEEDED. Renamed file exists? " + fileToRenameTo.exists());
        } else {
            System.out.println("FAILED to rename file " + originalLogFile + " to " + fileToRenameTo);
        }
        // wait for the user to check the lsof output, after our rename failed
        System.out.println("Check the lsof output and press any key to close the open file channel to the deleted file");
        System.in.read();
        // close the file channel; only now does the file descriptor go away
        fileChannel.close();
        // let the user check the lsof output one final time; this time there are no open file resources from this program
        System.out.println("File channel closed. Check the lsof output and press any key to terminate the program");
        System.in.read();
        // all done, exit
        System.out.println("Program will terminate");
    }
}



[1] http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3CCAA4R6b-7gSbPp5_ebGpwYyNibDAwE_%2BwoE%2BKbiMuU27-2j%2BLkg%40mail.gmail.com%3E
[2] http://unixhelp.ed.ac.uk/CGI/man-cgi?lsof+8


-Jaikiran

On Saturday 24 January 2015 11:12 PM, Jay Kreps wrote:

Hey guys,

Jaikiran posted a patch on KAFKA-1853 to improve the handling of failures during delete.
https://issues.apache.org/jira/browse/KAFKA-1853

The core problem here is that we are doing File.rename() as part of the delete sequence, which returns false if the rename failed. Our file delete sequence is something like the following (a rough sketch in code follows below):
1. Remove the file from the index so no new reads can begin on it
2. Rename the file to xyz.deleted so that if we crash it will get cleaned up
3. Schedule a task to delete the file in 30 seconds or so, when any in-progress reads have likely completed. The goal here is to avoid errors on in-progress reads but also avoid locking on all reads.
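Roughly, in code (an illustrative sketch with hypothetical names, not the actual Kafka implementation):

import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AsyncDeleteSketch {
    private static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    static void asyncDelete(final File segmentFile) {
        // step 1 happens elsewhere: the segment is removed from the index first,
        // so no new reads can begin on it
        // step 2: rename to *.deleted so a crash leaves an obviously cleanable file
        final File marked = new File(segmentFile.getPath() + ".deleted");
        if (!segmentFile.renameTo(marked)) {
            // this is the failure case under discussion: rename returned false
            System.err.println("Failed to rename " + segmentFile);
            return;
        }
        // step 3: delete later, once any in-progress reads have likely completed
        scheduler.schedule(new Runnable() {
            public void run() {
                marked.delete();
            }
        }, 30, TimeUnit.SECONDS);
    }
}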

The question is what to do when rename fails? Previously if this happened we actually didn't pay attention and would fail to delete the file entirely. This patch changes it so that if the rename fails we log an error and force an immediate delete.

I think this is the right thing to do, but I guess the real question is why would rename fail? Some possibilities:
http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java

An alternative would be to treat this as a filesystem error and shut down as we do elsewhere.

Thoughts?

-Jay


