[ https://issues.apache.org/jira/browse/FLINK-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897209#comment-15897209 ]
Steve Loughran commented on FLINK-5706:
---------------------------------------

If you look at where object stores cause the most trouble in the Hadoop code, it's in making them pretend to be a filesystem, with things like rename & recursive delete: operations which we developers expect to have specific behaviours and failure modes. For object stores, I'd consider moving away from an FS abstraction and having something just for them: no directories, just patterns, a limited set of verbs, explicit notions of HTTP headers, etc. The trouble there is that a lowest common denominator becomes a limiting factor: multipart puts are the key to committers in S3, leases are that for Azure, etc. Which is why HADOOP-9965 stalled, that and the lack of enthusiasm for a "new" storage API from downstream projects like Hive. Maybe it's time to revisit that.

> Implement Flink's own S3 filesystem
> -----------------------------------
>
>                 Key: FLINK-5706
>                 URL: https://issues.apache.org/jira/browse/FLINK-5706
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: Stephan Ewen
>
> As part of the effort to make Flink completely independent from Hadoop, Flink needs its own S3 filesystem implementation. Currently Flink relies on Hadoop's S3a and S3n file systems.
> A dedicated S3 file system can be implemented using the AWS SDK. The Hadoop file system implementation can be used as the basis (it is Apache licensed, so it should be okay to reuse some code as long as we do proper attribution).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
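
[Editor's sketch] To make the "limited set of verbs" idea in the comment above concrete, here is a minimal sketch of what an object-store-only abstraction could look like: no directories, prefix listing instead of recursive operations, and headers exposed explicitly. Every name here (ObjectStore, the method signatures) is a hypothetical illustration, not an existing Flink or Hadoop API.

{code:java}
// Hypothetical sketch of a minimal object-store API with a small set of verbs,
// no directory model, and explicit metadata. Not an existing Flink/Hadoop type.
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;

public interface ObjectStore {

    /** Stream the bytes of a single object. */
    InputStream get(String bucket, String key) throws IOException;

    /** Store an object in one request; headers carry content type, encryption, etc. */
    void put(String bucket, String key, InputStream data, long length,
             Map<String, String> headers) throws IOException;

    /** List keys matching a prefix; there is no notion of directories. */
    List<String> list(String bucket, String keyPrefix) throws IOException;

    /** Delete a single object; no recursive-delete semantics. */
    void delete(String bucket, String key) throws IOException;

    /** Expose object metadata/headers directly instead of mapping them to a FileStatus. */
    Map<String, String> head(String bucket, String key) throws IOException;
}
{code}

The point of keeping the verb set this small is that each call maps onto a single HTTP request with well-defined failure modes, instead of emulating rename or recursive delete client-side.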
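
[Editor's sketch] The issue description proposes implementing the file system on top of the AWS SDK. The rough sketch below shows the core AWS SDK for Java (v1) calls such an implementation would likely wrap, including a multipart upload, which the comment identifies as the key to committers in S3. Bucket names, keys, and file names are placeholders; this is not the actual Flink implementation, just an illustration of the underlying verbs.

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class S3VerbsSketch {

    public static void main(String[] args) throws Exception {
        // Credentials and region resolution are left to the SDK's default chain.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Put / get / list / delete: the verbs a filesystem facade would wrap.
        s3.putObject("my-bucket", "checkpoints/chk-1/metadata", new File("metadata"));
        try (InputStream in =
                s3.getObject("my-bucket", "checkpoints/chk-1/metadata").getObjectContent()) {
            // read the object bytes here
        }
        for (S3ObjectSummary summary :
                s3.listObjectsV2("my-bucket", "checkpoints/chk-1/").getObjectSummaries()) {
            System.out.println(summary.getKey());
        }
        s3.deleteObject("my-bucket", "checkpoints/chk-1/metadata");

        // Multipart upload: initiate, upload parts, then complete (or abort).
        InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest("my-bucket", "checkpoints/chk-1/big-file"));
        List<PartETag> parts = new ArrayList<>();
        UploadPartResult part = s3.uploadPart(new UploadPartRequest()
                .withBucketName("my-bucket")
                .withKey("checkpoints/chk-1/big-file")
                .withUploadId(init.getUploadId())
                .withPartNumber(1)
                .withFile(new File("big-file.part1")));
        parts.add(part.getPartETag());
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                "my-bucket", "checkpoints/chk-1/big-file", init.getUploadId(), parts));
    }
}
{code}

The multipart flow matters because an upload only becomes visible when completeMultipartUpload is called, which is what makes it usable as a commit primitive.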