I think that most packages already default log.dir to something more reasonable.
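(Just to make the relative-path idea in the quoted discussion below concrete: in config/server.properties the change would amount to something like the sketch that follows. This is a sketch only, not what the distro or any package actually ships, and a relative path would be resolved against whatever directory the broker process is started from.)

    # sketch: keep data under the Kafka install instead of /tmp/kafka-logs;
    # a relative path is resolved against the broker's working directory
    log.dirs=data/kafka-logs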
On Mon, Jan 26, 2015 at 1:06 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Having a relative path and keeping data under /data in the kafka distro
> would make sense. This would require some reworking of the shell scripts,
> though, as I think right now you can actually run Kafka from any directory
> and the cwd of the process will be whatever directory you start from. If
> we have a relative path in the config then the working directory will HAVE
> to be the kafka directory. This works for the simple download case but may
> make some packaging stuff harder for other use cases.
>
> -Jay
>
> On Mon, Jan 26, 2015 at 5:54 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
>
>> Having looked at the logs the user posted, I don't think this specific
>> issue has to do with the /tmp path.
>>
>> However, now that the /tmp path is being discussed, I think it's a good
>> idea that we default the Kafka logs to a certain folder. As Jay notes, it
>> makes it very easy to just download and start the servers without having
>> to fiddle with the configs when you are just starting out. Having said
>> that, when I started out with Kafka, I found /tmp to be an odd place to
>> default the path to. I expected them to be defaulted to a folder within
>> the Kafka install, somewhere like KAFKA_INSTALL_FOLDER/data/kafka-logs/.
>> Is that something we should do?
>>
>> -Jaikiran
>>
>> On Monday 26 January 2015 12:23 AM, Jay Kreps wrote:
>>
>>> Hmm, but I don't think tmp gets cleaned while the server is running...
>>>
>>> The reason for using tmp was because we don't know which directory they
>>> will use and we don't want them to have to edit configuration for the
>>> simple "out of the box" getting started tutorial. I actually do think
>>> that is important. Maybe an intermediate step we could do is just call
>>> out this setting in the quickstart so people know where data is going
>>> and know they need to configure it later...
>>>
>>> -Jay
>>>
>>> On Sun, Jan 25, 2015 at 9:32 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>>>
>>>> This feels like another type of symptom from people using /tmp/ for
>>>> their logs. Personally, I would rather use /mnt/data or something and
>>>> if that doesn't exist on their machine we can exception, or no default
>>>> and force set it.
>>>>
>>>> /*******************************************
>>>> Joe Stein
>>>> Founder, Principal Consultant
>>>> Big Data Open Source Security LLC
>>>> http://www.stealth.ly
>>>> Twitter: @allthingshadoop
>>>> ********************************************/
>>>>
>>>> On Jan 25, 2015 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>>>>
>>>>> I think you are right, good catch. It could be that this user deleted
>>>>> the files manually, but I wonder if there isn't some way that this is
>>>>> a Kafka bug--e.g. if multiple types of retention policies kick in at
>>>>> the same time, do we synchronize that properly?
>>>>>
>>>>> -Jay
>>>>>
>>>>> On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
>>>>>
>>>>>> Hi Jay,
>>>>>>
>>>>>> I spent some more time over this today and went back to the original
>>>>>> thread which brought up the issue with file leaks [1].
>>>>>> I think the output of lsof in those logs has a very important hint:
>>>>>>
>>>>>> /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/00000000000068818668.log (deleted)
>>>>>> java 8446 root 725u REG 253,2 536910838 26087364
>>>>>> /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/00000000000069457098.log (deleted)
>>>>>> java 8446 root 726u REG 253,2 536917902 26087368
>>>>>>
>>>>>> Notice the "(deleted)" text in that output. The last time I looked at
>>>>>> that output, I thought it was the user who had added that "deleted"
>>>>>> text to help us understand the problem. But today I read up on the
>>>>>> output format of lsof, and it turns out that it is lsof itself which
>>>>>> adds that hint whenever a file has already been deleted (possibly by a
>>>>>> different process) while some other process is still holding open
>>>>>> resources of that (deleted) file [2].
>>>>>>
>>>>>> So in the context of the issue we are discussing, and the way Kafka
>>>>>> deals with async deletes (i.e. first by attempting a rename of the
>>>>>> log/index files), I think this all makes sense now. What I think is
>>>>>> happening is: some (other?) process (not sure what/why) has already
>>>>>> deleted the log file that Kafka is using for the LogSegment. The
>>>>>> LogSegment, however, still has an open FileChannel on that deleted
>>>>>> file (and that's why the open file descriptor is held on to and shows
>>>>>> up in that output). Kafka, at some point in time, triggers an async
>>>>>> delete of the LogSegment, which involves a file rename of that
>>>>>> (already deleted) log file. The rename fails (because the original
>>>>>> file path isn't there anymore). As a result, we end up throwing that
>>>>>> "failed to rename" KafkaStorageException and thus leave behind the
>>>>>> open FileChannel, which stays open forever (till the Kafka process
>>>>>> exits).
>>>>>>
>>>>>> So I think we should:
>>>>>>
>>>>>> 1) Find out what/why deletes the underlying log file(s). I'll add a
>>>>>> reply to that original mail discussion asking the user if he can
>>>>>> provide more details.
>>>>>>
>>>>>> 2) Handle this case and close the FileChannel. The patch that's been
>>>>>> uploaded to the review board, https://reviews.apache.org/r/29755/,
>>>>>> does that. The "immediate delete" on failure to rename involves
>>>>>> (safely) closing the open FileChannel and (safely) deleting the
>>>>>> (possibly non-existent) file.
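A rough sketch of the "close the channel and delete immediately when the rename fails" idea described above (this is not the actual KAFKA-1853 patch; the class and method names are made up for illustration):

    import java.io.File;
    import java.io.IOException;
    import java.nio.channels.FileChannel;

    final class SegmentDeleteSketch {

        // Try the normal async-delete rename; if it fails, fall back to closing
        // the still-open channel and deleting the file right away, so the file
        // descriptor is not leaked.
        static void renameOrDeleteNow(File segmentFile, FileChannel channel) throws IOException {
            final File renamed = new File(segmentFile.getPath() + ".deleted");
            if (!segmentFile.renameTo(renamed)) {
                // The rename can fail e.g. because the file was already removed externally.
                System.err.println("Failed to rename " + segmentFile + ", deleting it immediately");
                try {
                    channel.close();      // releases the leaked descriptor
                } finally {
                    segmentFile.delete(); // no-op if the file is already gone
                }
            }
            // On success, the real code would schedule the actual deletion of the
            // renamed file asynchronously (see the delete sequence further down).
        }
    }

The point is simply that the fallback path has to close the FileChannel regardless of whether the delete itself succeeds; otherwise the descriptor to the already-unlinked file is held until the JVM exits.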
>>>>>>
>>>>>> By the way, this entire thing can be easily reproduced by running the
>>>>>> following program, which first creates a file and opens a FileChannel
>>>>>> to it, then waits for the user to delete that file externally (I used
>>>>>> the rm command), and then tries to rename that already-deleted file,
>>>>>> which fails. In between each of these steps, you can run the lsof
>>>>>> command externally to see the open file resources (I used 'lsof | grep
>>>>>> test.log'):
>>>>>>
>>>>>> public static void main(String[] args) throws Exception {
>>>>>>     // Open a file and a file channel for read/write
>>>>>>     final File originalLogFile = new File("/home/jaikiran/deleteme/test.log");
>>>>>>     // change this path relevantly if you plan to run it
>>>>>>     final FileChannel fileChannel = new RandomAccessFile(originalLogFile, "rw").getChannel();
>>>>>>     System.out.println("Opened file channel to " + originalLogFile);
>>>>>>     // wait for underlying file to be deleted externally
>>>>>>     System.out.println("Waiting for the " + originalLogFile + " to be deleted externally. Press any key after the file is deleted");
>>>>>>     System.in.read();
>>>>>>     // wait for the user to check the lsof output
>>>>>>     System.out.println(originalLogFile + " seems to have been deleted externally, check lsof command output to see open file resources.");
>>>>>>     System.out.println("Press any key to try renaming this already deleted file, from the program");
>>>>>>     System.in.read();
>>>>>>     // try rename
>>>>>>     final File fileToRenameTo = new File(originalLogFile.getPath() + ".deleted");
>>>>>>     System.out.println("Trying to rename " + originalLogFile + " to " + fileToRenameTo);
>>>>>>     final boolean renameSucceeded = originalLogFile.renameTo(fileToRenameTo);
>>>>>>     if (renameSucceeded) {
>>>>>>         System.out.println("Rename SUCCEEDED. Renamed file exists? " + fileToRenameTo.exists());
>>>>>>     } else {
>>>>>>         System.out.println("FAILED to rename file " + originalLogFile + " to " + fileToRenameTo);
>>>>>>     }
>>>>>>     // wait for the user to check the lsof output, after our rename failed
>>>>>>     System.out.println("Check the lsof output and press any key to close the open file channel to a deleted file");
>>>>>>     System.in.read();
>>>>>>     // close the file channel
>>>>>>     fileChannel.close();
>>>>>>     // let the user check the lsof output one final time. This time they won't see open file resources from this program
>>>>>>     System.out.println("File channel closed. Check the lsof output and press any key to terminate the program");
>>>>>>     System.in.read();
>>>>>>     // all done, exit
>>>>>>     System.out.println("Program will terminate");
>>>>>> }
>>>>>>
>>>>>> [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3CCAA4R6b-7gSbPp5_ebGpwYyNibDAwE_%2BwoE%2BKbiMuU27-2j%2BLkg%40mail.gmail.com%3E
>>>>>> [2] http://unixhelp.ed.ac.uk/CGI/man-cgi?lsof+8
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>> On Saturday 24 January 2015 11:12 PM, Jay Kreps wrote:
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> Jaikiran posted a patch on KAFKA-1853 to improve the handling of
>>>>>>> failures during delete:
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1853
>>>>>>>
>>>>>>> The core problem here is that we are doing File.rename() as part of
>>>>>>> the delete sequence, which returns false if the rename failed. Our
>>>>>>> file delete sequence is something like the following:
>>>>>>>
>>>>>>> 1. Remove the file from the index so no new reads can begin on it.
>>>>>>> 2. Rename the file to xyz.deleted so that if we crash it will get
>>>>>>> cleaned up.
>>>>>>> 3. Schedule a task to delete the file in 30 seconds or so, when any
>>>>>>> in-progress reads have likely completed. The goal here is to avoid
>>>>>>> errors on in-progress reads but also to avoid locking on all reads.
>>>>>>>
>>>>>>> The question is what to do when the rename fails? Previously, if this
>>>>>>> happened, we actually didn't pay attention and would fail to delete
>>>>>>> the file entirely. This patch changes it so that if the rename fails
>>>>>>> we log an error and force an immediate delete.
>>>>>>>
>>>>>>> I think this is the right thing to do, but I guess the real question
>>>>>>> is why would rename fail? Some possibilities:
>>>>>>> http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java
>>>>>>>
>>>>>>> An alternative would be to treat this as a filesystem error and shut
>>>>>>> down as we do elsewhere.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> -Jay
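As an aside on the "why would rename fail?" question above: java.nio.file.Files.move reports the reason for a failed rename by throwing an IOException (for example a NoSuchFileException when the source is already gone), whereas File.renameTo only returns false. A minimal, self-contained sketch, with made-up paths for illustration:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class RenameFailureReason {
        public static void main(String[] args) {
            // Illustrative paths only; point these at a real file to experiment.
            Path source = Paths.get("/tmp/kafka-logs/mytopic-0/00000000000000000000.log");
            Path target = Paths.get(source + ".deleted");

            // File.renameTo only says whether the rename worked...
            boolean ok = source.toFile().renameTo(target.toFile());
            System.out.println("renameTo succeeded? " + ok);

            // ...while Files.move throws an exception that explains why it failed.
            if (!ok) {
                try {
                    Files.move(source, target);
                } catch (IOException e) {
                    System.out.println("rename failed because: " + e);
                }
            }
        }
    }

Running this against a path that has already been removed prints a NoSuchFileException, which matches the scenario Jaikiran describes above.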