Going by the answers in the link, a rename failure really means that the filesystem is configured incorrectly. In such circumstances it is best to fail and let operations/admins know instead of working around it.
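To make "fail and let the admins know" concrete, here is a rough sketch of the rename step from the delete sequence quoted below. It assumes java.nio.file.Files.move is used in place of File.renameTo, so a failed rename surfaces as an IOException with a reason rather than a bare false. The class and method names (DeleteRenameSketch, markDeleted) are made up for illustration and are not Kafka code; only the ".deleted" suffix comes from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not taken from the Kafka code base.
class DeleteRenameSketch {

    // Rename a segment file to "<name>.deleted".
    // Files.move throws (e.g. NoSuchFileException, AccessDeniedException)
    // instead of returning false, so the caller cannot silently ignore it.
    static Path markDeleted(Path segment) throws IOException {
        Path target = segment.resolveSibling(segment.getFileName() + ".deleted");
        return Files.move(segment, target);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segments");
        Path segment = Files.createFile(dir.resolve("00000000.log"));

        Path renamed = markDeleted(segment);
        System.out.println("renamed to " + renamed.getFileName());

        try {
            // The source no longer exists, so this rename must fail loudly.
            markDeleted(segment);
        } catch (IOException e) {
            // A real broker would log this and decide whether to shut down.
            System.out.println("rename failed: " + e);
        }
    }
}
```

Whether the caller then shuts down or falls back to an immediate delete is the policy question under discussion; the sketch only shows that the failure and its cause become visible.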

On 1/24/15 9:42 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:

>Hey guys,
>
>Jaikiran posted a patch on KAFKA-1853 to improve the handling of failures
>during delete.
>https://issues.apache.org/jira/browse/KAFKA-1853
>
>The core problem here is that we are doing File.rename() as part of the
>delete sequence, which returns false if the rename failed. Our file delete
>sequence is something like the following:
>1. Remove the file from the index so no new reads can begin on it
>2. Rename the file to xyz.deleted so that if we crash it will get cleaned
>up
>3. Schedule a task to delete the file in 30 seconds or so when any
>in-progress reads have likely completed. The goal here is to avoid errors
>on in-progress reads but also avoid locking on all reads.
>
>The question is what to do when rename fails? Previously if this happened
>we actually didn't pay attention and would fail to delete the file
>entirely. This patch changes it so that if the rename fails we log an
>error and force an immediate delete.
>
>I think this is the right thing to do, but I guess the real question is
>why would rename fail? Some possibilities:
>http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java
>
>An alternative would be to treat this as a filesystem error and shut down
>as we do elsewhere.
>
>Thoughts?
>
>-Jay