It seems like a good idea. Using /tmp/kafka-logs as the default dir is unsafe, since the files there may be deleted by other people or processes.
2015-01-29 14:26 GMT+08:00 Jaikiran Pai <jai.forums2...@gmail.com>:

> I have created a JIRA for this proposed change
> https://issues.apache.org/jira/browse/KAFKA-1906 and uploaded a patch
> for review https://reviews.apache.org/r/30403/
>
> -Jaikiran
>
> On Tuesday 27 January 2015 02:36 AM, Jay Kreps wrote:
>
>> Having a relative path and keeping data under /data in the kafka
>> distro would make sense. This would require some reworking of the
>> shell scripts, though, as I think right now you can actually run
>> Kafka from any directory and the cwd of the process will be whatever
>> directory you start from. If we have a relative path in the config,
>> then the working directory will HAVE to be the kafka directory. This
>> works for the simple download case but may make some packaging stuff
>> harder for other use cases.
>>
>> -Jay
>>
>> On Mon, Jan 26, 2015 at 5:54 AM, Jaikiran Pai
>> <jai.forums2...@gmail.com> wrote:
>>
>>> Having looked at the logs the user posted, I don't think this
>>> specific issue has to do with the /tmp path.
>>>
>>> However, now that the /tmp path is being discussed, I think it's a
>>> good idea that we default the Kafka logs to a certain folder. As Jay
>>> notes, it makes it very easy to just download and start the servers
>>> without having to fiddle with the configs when you are just starting
>>> out. Having said that, when I started out with Kafka, I found /tmp
>>> to be an odd place to default the path to. I expected them to be
>>> defaulted to a folder within the Kafka install, somewhere like a
>>> KAFKA_INSTALL_FOLDER/data/kafka-logs/ folder. Is that something we
>>> should do?
>>>
>>> -Jaikiran
>>>
>>> On Monday 26 January 2015 12:23 AM, Jay Kreps wrote:
>>>
>>>> Hmm, but I don't think tmp gets cleaned while the server is
>>>> running...
>>>>
>>>> The reason for using tmp was because we don't know which directory
>>>> they will use and we don't want them to have to edit configuration
>>>> for the simple "out of the box" getting started tutorial. I
>>>> actually do think that is important. Maybe an intermediate step we
>>>> could do is just call out this setting in the quickstart so people
>>>> know where data is going and know they need to configure it
>>>> later...
>>>>
>>>> -Jay
>>>>
>>>> On Sun, Jan 25, 2015 at 9:32 AM, Joe Stein <joe.st...@stealth.ly>
>>>> wrote:
>>>>
>>>>> This feels like another type of symptom from people using /tmp/
>>>>> for their logs. Personally, I would rather use /mnt/data or
>>>>> something, and if that doesn't exist on their machine we can throw
>>>>> an exception, or have no default and force them to set it.
>>>>>
>>>>> /*******************************************
>>>>>  Joe Stein
>>>>>  Founder, Principal Consultant
>>>>>  Big Data Open Source Security LLC
>>>>>  http://www.stealth.ly
>>>>>  Twitter: @allthingshadoop
>>>>> ********************************************/
>>>>>
>>>>> On Jan 25, 2015 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>>>>>
>>>>>> I think you are right, good catch. It could be that this user
>>>>>> deleted the files manually, but I wonder if there isn't some way
>>>>>> that this is a Kafka bug--e.g. if multiple types of retention
>>>>>> policies kick in at the same time, do we synchronize that
>>>>>> properly?
>>>>>>
>>>>>> -Jay
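To make the install-relative default discussed above concrete, here is a
minimal sketch of how a relative log dir could be resolved against the
install folder rather than the process cwd. This is only an illustration,
not the KAFKA-1906 patch; the "kafka.home" system property and the class
name are made-up assumptions (the startup scripts would have to set such
a property themselves):

    import java.io.File;

    // Illustration only: resolve a relative data dir against the Kafka
    // install folder rather than the cwd. "kafka.home" is hypothetical.
    public class DefaultLogDirSketch {
        static File resolveLogDir(String configuredLogDir) {
            File dir = new File(configuredLogDir);
            if (dir.isAbsolute()) {
                return dir; // an explicit absolute path is used as-is
            }
            // Relative paths resolve against the install dir, so Kafka
            // can still be started from any directory.
            String kafkaHome = System.getProperty("kafka.home", ".");
            return new File(kafkaHome, configuredLogDir);
        }

        public static void main(String[] args) {
            // e.g. prints <kafka.home>/data/kafka-logs
            System.out.println(resolveLogDir("data/kafka-logs").getAbsolutePath());
        }
    }

Something along these lines would keep "download and run" working from
any directory, since the scripts rather than the user's cwd would decide
where data lands.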
>>>>>> On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai
>>>>>> <jai.forums2...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Jay,
>>>>>>>
>>>>>>> I spent some more time on this today and went back to the
>>>>>>> original thread which brought up the issue with file leaks [1].
>>>>>>> I think the output of lsof in those logs has a very important
>>>>>>> hint:
>>>>>>>
>>>>>>> java 8446 root 725u REG 253,2 536910838 26087364 /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/00000000000068818668.log (deleted)
>>>>>>> java 8446 root 726u REG 253,2 536917902 26087368 /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/00000000000069457098.log (deleted)
>>>>>>>
>>>>>>> Notice the "(deleted)" text in that output. The last time I
>>>>>>> looked at that output, I thought it was the user who had added
>>>>>>> the "deleted" text to help us understand the problem. But today
>>>>>>> I read up on the output format of lsof and it turns out that
>>>>>>> lsof itself adds that hint whenever a file has already been
>>>>>>> deleted, possibly by a different process, while some other
>>>>>>> process is still holding open resources of that (deleted) file
>>>>>>> [2].
>>>>>>>
>>>>>>> So in the context of the issue that we are discussing and the
>>>>>>> way Kafka deals with async deletes (i.e. first by attempting a
>>>>>>> rename of the log/index files), I think this all makes sense
>>>>>>> now. What I think is happening is: some (other?) process (not
>>>>>>> sure what/why) has already deleted the log file that Kafka is
>>>>>>> using for the LogSegment. The LogSegment however still has an
>>>>>>> open FileChannel on that deleted file (and that's why the open
>>>>>>> file descriptor is held on to and shows up in that output). Now
>>>>>>> Kafka, at some point in time, triggers an async delete of the
>>>>>>> LogSegment, which involves a rename of that (already deleted)
>>>>>>> log file. The rename fails (because the original file path isn't
>>>>>>> there anymore). As a result, we end up throwing that "failed to
>>>>>>> rename" KafkaStorageException and thus leave behind the open
>>>>>>> FileChannel, which stays open forever (till the Kafka process
>>>>>>> exits).
>>>>>>>
>>>>>>> So I think we should:
>>>>>>>
>>>>>>> 1) Find out what deletes the underlying log file(s), and why.
>>>>>>> I'll add a reply to that original mail discussion asking the
>>>>>>> user if he can provide more details.
>>>>>>> 2) Handle this case and close the FileChannel. The patch that's
>>>>>>> been uploaded to the review board
>>>>>>> https://reviews.apache.org/r/29755/ does that. The "immediate
>>>>>>> delete" on failure to rename involves (safely) closing the open
>>>>>>> FileChannel and (safely) deleting the (possibly non-existent)
>>>>>>> file.
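In outline, the fallback described in point 2 could look like the
following minimal sketch. This illustrates the idea only and is not the
code from the r/29755 patch; all names here are made up:

    import java.io.File;
    import java.io.IOException;
    import java.nio.channels.FileChannel;

    // Illustration only: on rename failure, release the FileChannel and
    // delete immediately instead of leaking the descriptor.
    public class RenameFallbackSketch {
        static void deleteSegment(File logFile, FileChannel channel) throws IOException {
            File renamed = new File(logFile.getPath() + ".deleted");
            if (logFile.renameTo(renamed)) {
                return; // normal path: the .deleted file is reaped later
            }
            // Rename failed, e.g. the file was already deleted externally:
            // close the channel so the (deleted) file's descriptor is freed...
            channel.close();
            // ...then attempt an immediate delete of the possibly
            // non-existent file; a false return here is harmless.
            logFile.delete();
        }
    }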
>>>>>>> By the way, this entire thing can be easily reproduced by
>>>>>>> running the following program, which first creates a file and
>>>>>>> opens a file channel to it, then waits for the user to delete
>>>>>>> the file externally (I used the rm command), and then goes and
>>>>>>> tries to rename the deleted file, which fails. In between each
>>>>>>> of these steps, you can run the lsof command externally to see
>>>>>>> the open file resources (I used 'lsof | grep test.log'):
>>>>>>>
>>>>>>> import java.io.File;
>>>>>>> import java.io.RandomAccessFile;
>>>>>>> import java.nio.channels.FileChannel;
>>>>>>>
>>>>>>> public class DeletedFileRenameRepro {
>>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>>         // Open a file and file channel for read/write. Change
>>>>>>>         // this path relevantly if you plan to run it.
>>>>>>>         final File originalLogFile = new File("/home/jaikiran/deleteme/test.log");
>>>>>>>         final FileChannel fileChannel = new RandomAccessFile(originalLogFile, "rw").getChannel();
>>>>>>>         System.out.println("Opened file channel to " + originalLogFile);
>>>>>>>         // wait for underlying file to be deleted externally
>>>>>>>         System.out.println("Waiting for the " + originalLogFile
>>>>>>>                 + " to be deleted externally. Press any key after the file is deleted");
>>>>>>>         System.in.read();
>>>>>>>         // wait for the user to check the lsof output
>>>>>>>         System.out.println(originalLogFile + " seems to have been deleted externally,"
>>>>>>>                 + " check lsof command output to see open file resources.");
>>>>>>>         System.out.println("Press any key to try renaming this already deleted file, from the program");
>>>>>>>         System.in.read();
>>>>>>>         // try rename
>>>>>>>         final File fileToRenameTo = new File(originalLogFile.getPath() + ".deleted");
>>>>>>>         System.out.println("Trying to rename " + originalLogFile + " to " + fileToRenameTo);
>>>>>>>         final boolean renamedSucceeded = originalLogFile.renameTo(fileToRenameTo);
>>>>>>>         if (renamedSucceeded) {
>>>>>>>             System.out.println("Rename SUCCEEDED. Renamed file exists? " + fileToRenameTo.exists());
>>>>>>>         } else {
>>>>>>>             System.out.println("FAILED to rename file " + originalLogFile + " to " + fileToRenameTo);
>>>>>>>         }
>>>>>>>         // wait for the user to check the lsof output, after our rename failed
>>>>>>>         System.out.println("Check the lsof output and press any key to close the open file channel to a deleted file");
>>>>>>>         System.in.read();
>>>>>>>         // close the file channel
>>>>>>>         fileChannel.close();
>>>>>>>         // let the user check the lsof output one final time; this
>>>>>>>         // time no open file resources from this program will show up
>>>>>>>         System.out.println("File channel closed. Check the lsof output and press any key to terminate the program");
>>>>>>>         System.in.read();
>>>>>>>         // all done, exit
>>>>>>>         System.out.println("Program will terminate");
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3CCAA4R6b-7gSbPp5_ebGpwYyNibDAwE_%2BwoE%2BKbiMuU27-2j%2BLkg%40mail.gmail.com%3E
>>>>>>> [2] http://unixhelp.ed.ac.uk/CGI/man-cgi?lsof+8
>>>>>>>
>>>>>>> -Jaikiran
>>>>>>>
>>>>>>> On Saturday 24 January 2015 11:12 PM, Jay Kreps wrote:
>>>>>>>
>>>>>>>> Hey guys,
>>>>>>>>
>>>>>>>> Jaikiran posted a patch on KAFKA-1853 to improve the handling
>>>>>>>> of failures during delete.
>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1853
>>>>>>>>
>>>>>>>> The core problem here is that we are doing File.rename() as
>>>>>>>> part of the delete sequence, which returns false if the rename
>>>>>>>> failed. Our file delete sequence is something like the
>>>>>>>> following:
>>>>>>>>
>>>>>>>> 1. Remove the file from the index so no new reads can begin on
>>>>>>>> it.
>>>>>>>> 2. Rename the file to xyz.deleted so that if we crash it will
>>>>>>>> get cleaned up.
>>>>>>>> 3. Schedule a task to delete the file in 30 seconds or so, when
>>>>>>>> any in-progress reads have likely completed. The goal here is
>>>>>>>> to avoid errors on in-progress reads but also avoid locking on
>>>>>>>> all reads.
>>>>>>>>
>>>>>>>> The question is what to do when the rename fails? Previously if
>>>>>>>> this happened we actually didn't pay attention and would fail
>>>>>>>> to delete the file entirely. This patch changes it so that if
>>>>>>>> the rename fails we log an error and force an immediate delete.
>>>>>>>>
>>>>>>>> I think this is the right thing to do, but I guess the real
>>>>>>>> question is why would the rename fail? Some possibilities:
>>>>>>>> http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java
>>>>>>>>
>>>>>>>> An alternative would be to treat this as a filesystem error and
>>>>>>>> shut down as we do elsewhere.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> -Jay

--
long is the way and hard that out of Hell leads up to light
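For reference, the three-step delete sequence Jay describes in his
original message could be sketched roughly as below. This is only an
illustration under assumed names (the class and method names are
hypothetical; the 30-second delay is taken from Jay's description), not
the actual Kafka code:

    import java.io.File;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Rough sketch of rename-then-scheduled-delete; illustration only.
    public class AsyncDeleteSketch {
        private static final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        static void asyncDelete(File segmentFile) {
            // step 1 happens elsewhere: the segment is removed from the
            // index so no new reads can begin on it
            // step 2: rename to *.deleted so a crash leaves a cleanable marker
            File marker = new File(segmentFile.getPath() + ".deleted");
            if (!segmentFile.renameTo(marker)) {
                // rename failed: log an error and delete immediately
                // rather than silently leaking the file
                System.err.println("Failed to rename " + segmentFile + ", deleting immediately");
                segmentFile.delete();
                return;
            }
            // step 3: delete after a delay, once in-progress reads have
            // likely completed
            Runnable deleteTask = () -> {
                if (!marker.delete()) {
                    System.err.println("Failed to delete " + marker);
                }
            };
            scheduler.schedule(deleteTask, 30, TimeUnit.SECONDS);
        }
    }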