[ https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338418#comment-17338418 ]
Galen Warren commented on FLINK-19481:
--------------------------------------

Hi all, I'm the author of the other [PR|https://github.com/apache/flink/pull/15599] that relates to Google Cloud Storage. [~xintongsong] has been working with me on this.

The main goal of my PR is to add support for the RecoverableWriter interface, so that one can write to GCS via a StreamingFileSink. The file system support goes through the Hadoop stack, as noted above, using Google's [cloud storage connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage]. I have not personally had problems using the GCS connector and the Hadoop stack – it seems to write checkpoints and savepoints properly. I also use it to write job manager HA data to GCS, which seems to work fine.

However, if we do want to support a native implementation in addition to the Hadoop-based one, we could approach it similarly to what has been done for S3, i.e. have a shared base project (flink-gs-fs-base?) and then a project for each implementation (flink-gs-fs-hadoop and flink-gs-fs-native?). The recoverable-writer code could go into the shared project so that both implementations could use it (assuming that the native implementation doesn't already have a recoverable-writer implementation). I'll defer to the Flink experts on whether that's a worthwhile effort or not. At this point, from my perspective, it wouldn't be that much work to rework the project structure to support this.
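For readers following along, the usage described above (checkpoints, savepoints, and job manager HA data on GCS through the Hadoop-based file system) amounts to pointing the relevant Flink directories at a {{gs://}} URI. A minimal flink-conf.yaml sketch, where the bucket name and paths are placeholders, not values from this issue:

{code:yaml}
# flink-conf.yaml -- sketch only; "my-bucket" is a hypothetical bucket name

# Checkpoints and savepoints written to GCS
state.checkpoints.dir: gs://my-bucket/checkpoints
state.savepoints.dir: gs://my-bucket/savepoints

# Job manager HA metadata can also live on GCS
high-availability.storageDir: gs://my-bucket/flink-ha
{code}

This assumes the GCS file system (currently the Hadoop-based connector) is on the classpath or installed as a plugin so that the {{gs://}} scheme resolves.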
> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>
>                 Key: FLINK-19481
>                 URL: https://issues.apache.org/jira/browse/FLINK-19481
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, FileSystems
>    Affects Versions: 1.12.0
>            Reporter: Ben Augarten
>            Priority: Minor
>              Labels: auto-deprioritized-major
>
> Currently, GCS is supported, but only by using the hadoop connector [1].
>
> The objective of this improvement is to add support for checkpointing to
> Google Cloud Storage with the Flink File System. This would allow the
> `gs://` scheme to be used for savepointing and checkpointing. Long term,
> it would be nice if we could use the GCS FileSystem as a source and sink
> in flink jobs as well.
>
> Long term, I hope that implementing a flink native GCS FileSystem will
> simplify usage of GCS, because the hadoop FileSystem ends up bringing in
> many unshaded dependencies.
>
> [1] [https://github.com/GoogleCloudDataproc/hadoop-connectors]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)