[ https://issues.apache.org/jira/browse/FLINK-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897209#comment-15897209 ]
Steve Loughran commented on FLINK-5706:
---------------------------------------

If you look at where object stores cause the most trouble in the Hadoop code, it's in making them pretend to be a filesystem, with things like rename & recursive delete: operations which we developers expect to have specific behaviours and failure modes. For object stores, I'd consider moving away from an FS abstraction and having something just for them: no directories, just patterns, a limited set of verbs, explicit notions of HTTP headers, etc. The trouble there is that a lowest common denominator becomes a limiting factor: multipart puts are the key to committers in S3, leases are that for Azure, etc. Which is why HADOOP-9965 stalled, that and the lack of enthusiasm for a "new" storage API from downstream projects like Hive. Maybe it's time to revisit that.

> Implement Flink's own S3 filesystem
> -----------------------------------
>
>                 Key: FLINK-5706
>                 URL: https://issues.apache.org/jira/browse/FLINK-5706
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: Stephan Ewen
>
> As part of the effort to make Flink completely independent from Hadoop, Flink needs its own S3 filesystem implementation. Currently Flink relies on Hadoop's S3a and S3n file systems.
> A dedicated S3 file system can be implemented using the AWS SDK. The Hadoop file system implementation can be used as the basis (it is Apache licensed, so it should be okay to reuse some code as long as we do proper attribution).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
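
[Editor's sketch] To make the "limited set of verbs" idea in the comment above concrete, here is a minimal sketch of what an object-store-only abstraction could look like: no directories, prefix listing instead of recursive operations, and headers exposed explicitly. Every name here (ObjectStore, the method signatures) is a hypothetical illustration, not an existing Flink or Hadoop API.

{code:java}
// Hypothetical sketch of a minimal object-store API with a small set of verbs,
// no directory model, and explicit metadata. Not an existing Flink/Hadoop type.
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;

public interface ObjectStore {

    /** Stream the bytes of a single object. */
    InputStream get(String bucket, String key) throws IOException;

    /** Store an object in one request; headers carry content type, encryption, etc. */
    void put(String bucket, String key, InputStream data, long length,
             Map<String, String> headers) throws IOException;

    /** List keys matching a prefix; there is no notion of directories. */
    List<String> list(String bucket, String keyPrefix) throws IOException;

    /** Delete a single object; no recursive-delete semantics. */
    void delete(String bucket, String key) throws IOException;

    /** Expose object metadata/headers directly instead of mapping them to a FileStatus. */
    Map<String, String> head(String bucket, String key) throws IOException;
}
{code}

The point of keeping the verb set this small is that each call maps onto a single HTTP request with well-defined failure modes, instead of emulating rename or recursive delete client-side.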
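
[Editor's sketch] The issue description proposes implementing the file system on top of the AWS SDK. The rough sketch below shows the core AWS SDK for Java (v1) calls such an implementation would likely wrap, including a multipart upload, which the comment identifies as the key to committers in S3. Bucket names, keys, and file names are placeholders; this is not the actual Flink implementation, just an illustration of the underlying verbs.

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class S3VerbsSketch {

    public static void main(String[] args) throws Exception {
        // Credentials and region resolution are left to the SDK's default chain.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Put / get / list / delete: the verbs a filesystem facade would wrap.
        s3.putObject("my-bucket", "checkpoints/chk-1/metadata", new File("metadata"));
        try (InputStream in =
                s3.getObject("my-bucket", "checkpoints/chk-1/metadata").getObjectContent()) {
            // read the object bytes here
        }
        for (S3ObjectSummary summary :
                s3.listObjectsV2("my-bucket", "checkpoints/chk-1/").getObjectSummaries()) {
            System.out.println(summary.getKey());
        }
        s3.deleteObject("my-bucket", "checkpoints/chk-1/metadata");

        // Multipart upload: initiate, upload parts, then complete (or abort).
        InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest("my-bucket", "checkpoints/chk-1/big-file"));
        List<PartETag> parts = new ArrayList<>();
        UploadPartResult part = s3.uploadPart(new UploadPartRequest()
                .withBucketName("my-bucket")
                .withKey("checkpoints/chk-1/big-file")
                .withUploadId(init.getUploadId())
                .withPartNumber(1)
                .withFile(new File("big-file.part1")));
        parts.add(part.getPartETag());
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                "my-bucket", "checkpoints/chk-1/big-file", init.getUploadId(), parts));
    }
}
{code}

The multipart flow matters because an upload only becomes visible when completeMultipartUpload is called, which is what makes it usable as a commit primitive.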