[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time

Haohui Mai (JIRA) Wed, 01 Feb 2017 14:37:33 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849048#comment-15849048
 ]


Haohui Mai commented on FLINK-5668:
-----------------------------------

Please allow me to fill in some of the context here.

The request is to have Flink support alternative filesystems (e.g., S3) in 
Flink-on-YARN so that our mission-critical jobs can survive HDFS being 
unavailable. Flink-on-YARN would still depend on the underlying distributed 
file system to meet its high-availability and reliability requirements. This 
JIRA does not intend to change the current mechanisms in Flink.

You are right that YARN itself depends on a distributed file system to 
function correctly. It works well with HDFS, but in general it also works with 
any filesystem that implements the `FileSystem` API in Hadoop. There are 
multiple production deployments that run YARN on S3.

Essentially we would like to take the approach of FLINK-5631 in a more 
comprehensive way -- in many places the Flink-on-YARN implementation simply 
takes the default file system from YARN. Since the {{Path}} objects already 
specify the filesystem, it would be great to teach Flink to honor them, just 
as FLINK-5631 has done, so that it becomes possible to run Flink-on-YARN on 
alternative filesystems such as S3.
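
To illustrate the idea, here is a minimal, self-contained sketch of "take the 
filesystem from the path, and fall back to the default only when the path does 
not name one". It uses only java.net.URI; the class name, the method name 
resolveFileSystem, the bucket, and the namenode address are hypothetical and 
are not taken from the Flink code base. In Hadoop terms, the analogous 
distinction is between {{FileSystem.get(conf)}} (always the default file system) 
and {{path.getFileSystem(conf)}} (the file system the path itself names).

```java
import java.net.URI;

public class PathFileSystemSketch {

    /**
     * Returns the filesystem URI (scheme plus authority) that a path belongs to.
     * A fully qualified path carries its own filesystem; an unqualified path
     * falls back to the supplied default filesystem.
     */
    static URI resolveFileSystem(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            // No scheme such as "s3a://" or "hdfs://": use the default filesystem.
            return defaultFs;
        }
        if (uri.getAuthority() == null) {
            // Scheme without authority, e.g. "file:///tmp/x".
            return URI.create(uri.getScheme() + ":///");
        }
        return URI.create(uri.getScheme() + "://" + uri.getAuthority());
    }

    public static void main(String[] args) {
        // Hypothetical default filesystem, as YARN would report it.
        URI defaultFs = URI.create("hdfs://namenode:8020");

        // Qualified path: the filesystem comes from the path itself.
        System.out.println(resolveFileSystem("s3a://my-bucket/flink/conf.yaml", defaultFs));

        // Unqualified path: the default filesystem is used.
        System.out.println(resolveFileSystem("/user/flink/conf.yaml", defaultFs));
    }
}
```

With this behavior, a deployment whose paths are fully qualified with an S3 
scheme would not need to touch the default filesystem for those files, 
while unqualified paths keep working as they do today.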

Does this make sense to you, [~StephanEwen]?

> Reduce dependency on HDFS at job startup time
> ---------------------------------------------
>
>                 Key: FLINK-5668
>                 URL: https://issues.apache.org/jira/browse/FLINK-5668
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Bill Liu
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share 
> taskmanager-conf.yaml with the TaskManagers.
> It would be better to serve taskmanager-conf.yaml from the JobManager web 
> server instead of HDFS, which would reduce the HDFS dependency at job startup.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
