[ https://issues.apache.org/jira/browse/FLINK-33694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl resolved FLINK-33694.
-----------------------------------
    Fix Version/s: 1.19.0
                   1.17.3
                   1.18.2
         Assignee: Patrick Lucas
       Resolution: Fixed

master: [a41229b24d82e8c561350c42d8a98dfb865c3f69|https://github.com/apache/flink/commit/a41229b24d82e8c561350c42d8a98dfb865c3f69]
1.18: [846ab49afd20ecf49fe76e18dd3e9b41143bf207|https://github.com/apache/flink/commit/846ab49afd20ecf49fe76e18dd3e9b41143bf207]
1.17: [257c526d6ae404f4598aeb2b9efa85674df2e6cd|https://github.com/apache/flink/commit/257c526d6ae404f4598aeb2b9efa85674df2e6cd]

> GCS filesystem does not respect gs.storage.root.url config option
> -----------------------------------------------------------------
>
>                 Key: FLINK-33694
>                 URL: https://issues.apache.org/jira/browse/FLINK-33694
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.18.0, 1.17.2
>            Reporter: Patrick Lucas
>            Assignee: Patrick Lucas
>            Priority: Major
>              Labels: gcs, pull-request-available
>             Fix For: 1.19.0, 1.17.3, 1.18.2
>
>
> The GCS FileSystem's RecoverableWriter implementation uses the GCS SDK
> directly rather than going through Hadoop. While support has been added to
> configure credentials correctly based on the standard Hadoop implementation's
> configuration, no other options are passed through to the underlying client.
>
> Because this only affects the RecoverableWriter-related codepaths, it can
> result in surprisingly different behavior depending on whether the FileSystem
> is used as a source or a sink: while a {{gs://}}-URI FileSource may work
> fine, a {{gs://}}-URI FileSink may not work at all.
>
> We use [fake-gcs-server|https://github.com/fsouza/fake-gcs-server] in
> testing, and so we override the Hadoop GCS FileSystem config option
> {{gs.storage.root.url}}. However, because this option is not considered
> when creating the GCS client for the RecoverableWriter codepath, in a
> FileSink the GCS FileSystem attempts to write to the real GCS service rather
> than fake-gcs-server. At the same time, a FileSource works as expected,
> reading from fake-gcs-server.
>
> The fix should be fairly straightforward: read the {{gs.storage.root.url}}
> config option from the Hadoop FileSystem config in
> [{{GSFileSystemOptions}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemOptions.java#L30]
> and, if set, pass it to {{storageOptionsBuilder}} in
> [{{GSFileSystemFactory}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemFactory.java].
>
> The only workaround for this is to build a custom flink-gs-fs-hadoop JAR with
> a patch and use it as a plugin.
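
The shape of the fix described in the issue (read the root URL override from the Hadoop config and, only if it is set, hand it to the client options builder) could look roughly like the sketch below. It is not the code from the linked commits. To stay self-contained it uses plain-Java stand-ins: a Map plays the role of the Hadoop config, and ClientOptionsBuilder is a hypothetical substitute for the storageOptionsBuilder mentioned in the issue. The class and method names, the default endpoint constant, and the localhost URL are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Illustration only: plain-Java stand-ins for the Hadoop config and the GCS
 * client options builder. None of these types are part of Flink.
 */
class GcsRootUrlSketch {

    /** Option name as given in the issue. */
    static final String STORAGE_ROOT_URL_KEY = "gs.storage.root.url";

    /** Assumed default endpoint, used when no override is configured. */
    static final String DEFAULT_HOST = "https://storage.googleapis.com";

    /** Hypothetical stand-in for the builder used on the RecoverableWriter path. */
    static final class ClientOptionsBuilder {
        private String host = DEFAULT_HOST;

        ClientOptionsBuilder setHost(String host) {
            this.host = host;
            return this;
        }

        /** Returns the endpoint the client would talk to. */
        String build() {
            return host;
        }
    }

    /** Reads the root URL override, treating a missing or blank value as unset. */
    static Optional<String> storageRootUrl(Map<String, String> hadoopConfig) {
        return Optional.ofNullable(hadoopConfig.get(STORAGE_ROOT_URL_KEY))
                .map(String::trim)
                .filter(url -> !url.isEmpty());
    }

    /** Builds the client endpoint, applying the override only if one is set. */
    static String resolveHost(Map<String, String> hadoopConfig) {
        ClientOptionsBuilder builder = new ClientOptionsBuilder();
        storageRootUrl(hadoopConfig).ifPresent(builder::setHost);
        return builder.build();
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        // No override: the sink-side client keeps the default endpoint.
        System.out.println(resolveHost(config));

        // Override pointing at a local emulator (hypothetical address).
        config.put(STORAGE_ROOT_URL_KEY, "http://localhost:4443");
        System.out.println(resolveHost(config));
    }
}
```

Keeping the override optional means deployments that never set the option see no change in behavior, while a test setup that points the option at fake-gcs-server gets the same endpoint on both the source and the sink side.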