[jira] [Commented] (FLINK-19481) Add support for a flink native GCS FileSystem

Ben Augarten (Jira) Mon, 03 May 2021 09:10:10 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338455#comment-17338455
 ]


Ben Augarten commented on FLINK-19481:
--------------------------------------

Hey Robert and Galen, I appreciate you both weighing in. I got a chance to read 
through Galen's PR briefly. It seems mostly concerned with adding support for 
the RecoverableWriter interface, and their implementation of that interface has 
no explicit Hadoop dependencies. So it should be usable with either a native or 
a Hadoop-based implementation of the Google Cloud Storage file system.

Our native implementation does have support for RecoverableWriter, but I didn't 
work directly on that and I don't believe it's being used in production right 
now. We've primarily been using our implementation for checkpointing, 
savepointing, and job graph storage.

The two paths forward I see are:

* As Galen proposed, keep two separate implementations of the GCS FileSystem, 
one that goes through the Hadoop stack and one that uses the GCS SDKs, both 
using the shared RecoverableWriter implementation.
* Consolidate down to a native GCS FileSystem implementation, using Galen's 
implementation of the RecoverableWriter.

To me, the second option makes the most sense, based on my experience as a user 
of Flink and my general impression of the desire to move away from Hadoop-based 
file systems.

To accomplish that, I think Galen should continue working on their PR. I can 
open another PR once theirs lands on master, or open a PR against their WIP. 
I'd prefer to wait, though, until the outstanding discussions are resolved.

> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>
>                 Key: FLINK-19481
>                 URL: https://issues.apache.org/jira/browse/FLINK-19481
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, FileSystems
>    Affects Versions: 1.12.0
>            Reporter: Ben Augarten
>            Priority: Minor
>              Labels: auto-deprioritized-major
>
> Currently, GCS is supported, but only through the Hadoop connector [1].
>  
> The objective of this improvement is to add support for checkpointing to 
> Google Cloud Storage with the Flink File System.
>  
> This would allow the `gs://` scheme to be used for savepointing and 
> checkpointing. Long term, it would be nice if we could use the GCS FileSystem 
> as a source and sink in Flink jobs as well. 
>  
> Long term, I hope that implementing a Flink-native GCS FileSystem will 
> simplify usage of GCS, because the Hadoop FileSystem ends up bringing in many 
> unshaded dependencies.
>  
> [1] https://github.com/GoogleCloudDataproc/hadoop-connectors



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
