[ 
https://issues.apache.org/jira/browse/FLINK-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735088#comment-15735088
 ] 

ASF GitHub Bot commented on FLINK-5300:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2970
  
    I like the idea.
    I am wondering how expensive it is to get the array of `FileStatus` for
    all files in the directory. HDFS in Hadoop 2 has the option to get a
    `ContentSummary` that has the number of files in a directory. I assume that
    this is more lightweight.

    We could extend Flink's FileSystem class to also offer something like that
    and then use that method.

    If we decide not to do that, it would be good to put the repeated logic for
    "delete if empty" into a utility function.

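A minimal sketch of what such a "delete if empty" utility could look like. To stay self-contained it is written against `java.nio.file` on the local file system, not against Flink's FileSystem class or HDFS, so it only illustrates the control flow. The names `DirectoryCleanup` and `deleteIfEmpty` are hypothetical and do not come from the pull request.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical helper (not part of Flink): "delete if empty" on the local file system. */
final class DirectoryCleanup {

    private DirectoryCleanup() {}

    /**
     * Deletes {@code dir} only if it is an existing directory without entries.
     *
     * @return true if the directory was deleted, false if it was left in place
     */
    static boolean deleteIfEmpty(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        // Peek at the first entry only; no full listing is materialized.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            if (entries.iterator().hasNext()) {
                return false; // still contains files: skip the delete call
            }
        }
        try {
            Files.delete(dir);
            return true;
        } catch (DirectoryNotEmptyException | NoSuchFileException e) {
            // Someone else added a file or removed the directory in between.
            return false;
        }
    }
}
```

Both call sites named in the issue could then share this one method. Note that the emptiness check and the delete are not atomic, so the sketch treats a concurrent change as "not deleted" rather than as an error.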

> FileStateHandle#discard & FsCheckpointStateOutputStream#close tries to delete 
> non-empty directory
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5300
>                 URL: https://issues.apache.org/jira/browse/FLINK-5300
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
>
> Deleting {{FileStateHandles}} and closing 
> {{FsCheckpointStateOutputStream}} always trigger a delete operation on the 
> parent directory. Often this call will fail because the directory still 
> contains other files.
> A user reported that the SRE of their Hadoop cluster noticed this behaviour 
> in the logs. It would be more system friendly to first check whether the 
> directory is empty. This would prevent many error messages from appearing in 
> the Hadoop logs.



