The failure really means that the filesystem is configured incorrectly
(going by the causes listed at the Stack Overflow link). In such
circumstances it is best to fail and let the operators/admins know instead
of working around it.
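
To illustrate what failing loudly could look like, here is a minimal sketch,
not taken from the patch. It assumes java.nio.file.Files.move is used in
place of the boolean-returning rename, so the underlying cause reaches the
logs as an exception. The class name FailFastRename and the helper
renameOrFail are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FailFastRename {
    // Hypothetical helper: rename a file by appending a suffix, and fail
    // with the underlying cause instead of returning false.
    static Path renameOrFail(Path source, String suffix) {
        Path target = source.resolveSibling(source.getFileName() + suffix);
        try {
            // Files.move throws an IOException subclass (for example
            // NoSuchFileException or AccessDeniedException) on failure.
            return Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            // Surface the failure to the caller; what happens next
            // (shutdown, alerting) is left to the caller in this sketch.
            throw new UncheckedIOException(
                "Failed to rename " + source + " to " + target, e);
        }
    }
}
```

The exception type and message then tell the admins what is wrong with the
filesystem, which a bare "false" cannot do.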

On 1/24/15 9:42 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:

>Hey guys,
>
>Jaikiran posted a patch on KAFKA-1853 to improve the handling of failures
>during delete.
>https://issues.apache.org/jira/browse/KAFKA-1853
>
>The core problem here is that we call File.renameTo() as part of the
>delete sequence, and it returns false if the rename fails. Our file delete
>sequence is something like the following:
>1. Remove the file from the index so no new reads can begin on it
>2. Rename the file to xyz.deleted so that if we crash it will get cleaned up
>3. Schedule a task to delete the file in 30 seconds or so when any
>in-progress reads have likely completed. The goal here is to avoid errors
>on in progress reads but also avoid locking on all reads.
>
>The question is what to do when rename fails? Previously if this happened
>we actually didn't pay attention and would fail to delete the file
>entirely. This patch changes it so that if the rename fails we log an
>error and force an immediate delete.
>
>I think this is the right thing to do, but I guess the real question is
>why would rename fail? Some possibilities:
>http://stackoverflow.com/questions/2372374/why-would-a-file-rename-fail-in-java
>
>An alternative would be to treat this as a filesystem error and shut down
>as we do elsewhere.
>
>Thoughts?
>
>-Jay

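For reference, the sequence Jay describes, with the patch's fallback, could
be sketched roughly as below. This is an illustration under assumptions and
not the actual Kafka code: the class AsyncSegmentDelete, the method
deleteAsync and the delayMs parameter are hypothetical, and step 1 (removing
the file from the index) is assumed to have been done by the caller.

```java
import java.io.File;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AsyncSegmentDelete {
    // Returns the renamed file that was scheduled for deletion, or null if
    // the rename failed and the file was deleted immediately instead.
    static File deleteAsync(File segment, ScheduledExecutorService scheduler,
                            long delayMs) {
        // Step 2: rename so that a crash leaves a marker to clean up.
        File renamed = new File(segment.getPath() + ".deleted");
        if (!segment.renameTo(renamed)) {
            // Fallback described in the mail: log an error and delete now.
            System.err.println("Rename of " + segment
                + " failed; forcing immediate delete");
            if (!segment.delete() && segment.exists()) {
                throw new IllegalStateException("Could not delete " + segment);
            }
            return null;
        }
        // Step 3: delete later, once in-progress reads have likely finished.
        scheduler.schedule(() -> { renamed.delete(); },
                           delayMs, TimeUnit.MILLISECONDS);
        return renamed;
    }
}
```

Replacing the fallback branch with a thrown exception would give the
"treat it as a filesystem error" alternative.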