Angel Barragán created FLINK-16544:
--------------------------------------

             Summary: Flink FileSystem for web.uploadDir
                 Key: FLINK-16544
                 URL: https://issues.apache.org/jira/browse/FLINK-16544
             Project: Flink
          Issue Type: Improvement
          Components: API / Core
    Affects Versions: 1.10.0
            Reporter: Angel Barragán


Currently the configuration property "web.upload.dir" only supports paths on
the local filesystem. When Flink is deployed in another cluster environment such
as YARN, it would be more useful to be able to point this directory at HDFS. Its
size and maintenance would then be much easier to manage than having to find out
on which node YARN has launched the JobManager and manage the upload directory
there.

In my concrete case, I ran into this limitation on an AWS EMR cluster with Flink,
where the default configuration creates this directory under /tmp on the local
filesystem of the CORE node on which YARN deploys the JobManager. The EMR cluster
is also configured to empty /tmp completely once a month, which removes the Flink
upload directory and makes Flink fail when a new job is submitted. We had to
recreate the directory manually.

The first solution I tried was to point the above configuration property at
HDFS, as we did with the configuration property "state.checkpoints.dir", but we
found that this doesn't work in a YARN environment. So I checked the Flink code
to see how this configuration is used and found that it only supports the local
file system.
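To illustrate the local-filesystem limitation described above, here is a minimal Java sketch. It assumes, for illustration only, that the option value ends up in a plain java.io.File; the exact Flink code path is not shown in this report, and `UploadDirCheck` / `isLocalPath` are hypothetical names, not Flink API.

```java
import java.io.File;
import java.net.URI;

public class UploadDirCheck {

    /**
     * Hypothetical helper: returns true when the configured value has no URI
     * scheme or uses "file", i.e. it can only refer to the local disk.
     * Windows-style paths (with backslashes) are not handled by this sketch.
     */
    static boolean isLocalPath(String configured) {
        String scheme = URI.create(configured).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        String local = "/tmp/flink-web-upload";   // default-style local directory
        String hdfs = "hdfs:///flink/web-upload"; // what we would like to configure

        System.out.println(isLocalPath(local)); // true
        System.out.println(isLocalPath(hdfs));  // false

        // Assumption: if the raw value is handed to java.io.File, the "hdfs"
        // scheme is not interpreted; the result is just a relative local path.
        File asFile = new File(hdfs);
        System.out.println(asFile.isAbsolute()); // false
    }
}
```

Under this assumption, an "hdfs://" value would silently become a relative path on the JobManager's local disk rather than a location on shared storage, which matches the behavior observed on YARN.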

I think that supporting Flink FileSystem paths here would improve the management
of Flink when running in another cluster environment where shared storage such
as HDFS or S3 is available.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
