[ 
https://issues.apache.org/jira/browse/FLINK-35232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840938#comment-17840938
 ] 

Vikas M commented on FLINK-35232:
---------------------------------

Thanks for the reply [~galenwarren].

In this case, we see errors from the `RecoverableWriter support leveraging the 
Google Java library directly`.

Sample error stack traces we see in our Flink deployment:


h4. Path 1: FileCommitter.commit -> GSRecoverableWriterCommitter.commitAfterRecovery
```
com.google.cloud.storage.StorageException: Read timed out
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:233)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.compose(HttpStorageRpc.java:625)
    at com.google.cloud.storage.StorageImpl.lambda$compose$20(StorageImpl.java:482)
    at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    at com.google.cloud.storage.Retrying.run(Retrying.java:51)
    at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1374)
    at com.google.cloud.storage.StorageImpl.compose(StorageImpl.java:480)
    at org.apache.flink.fs.gs.storage.GSBlobStorageImpl.compose(GSBlobStorageImpl.java:158)
    at org.apache.flink.fs.gs.writer.GSRecoverableWriterCommitter.composeBlobs(GSRecoverableWriterCommitter.java:158)
    at org.apache.flink.fs.gs.writer.GSRecoverableWriterCommitter.writeFinalBlob(GSRecoverableWriterCommitter.java:189)
    at org.apache.flink.fs.gs.writer.GSRecoverableWriterCommitter.commitAfterRecovery(GSRecoverableWriterCommitter.java:109)
    at org.apache.flink.streaming.api.functions.sink.filesystem.OutputStreamBasedPartFileWriter$OutputStreamBasedPendingFile.commitAfterRecovery(OutputStreamBasedPartFileWriter.java:350)
    at org.apache.flink.connector.file.sink.committer.FileCommitter.commit(FileCommitter.java:62)
```

h4. Path 2: FileWriter.write
```
com.google.cloud.storage.StorageException: Read timed out
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:233)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:949)
    at com.google.cloud.storage.ResumableMedia.lambda$null$0(ResumableMedia.java:37)
    at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    at com.google.cloud.storage.Retrying.run(Retrying.java:51)
    at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$1(ResumableMedia.java:34)
    at com.google.cloud.storage.BlobWriteChannel$Builder.build(BlobWriteChannel.java:315)
    at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:577)
    at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:547)
    at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:95)
    at org.apache.flink.fs.gs.storage.GSBlobStorageImpl.writeBlob(GSBlobStorageImpl.java:64)
    at org.apache.flink.fs.gs.writer.GSRecoverableFsDataOutputStream.createWriteChannel(GSRecoverableFsDataOutputStream.java:229)
    at org.apache.flink.fs.gs.writer.GSRecoverableFsDataOutputStream.write(GSRecoverableFsDataOutputStream.java:152)
    at org.apache.flink.formats.parquet.PositionOutputStreamAdapter.write(PositionOutputStreamAdapter.java:58)
```

Similar to Path 1, we also see these errors in `FileWriter.prepareCommit`.
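
Both traces pass through `RetryHelper.runWithRetries`, so these calls appear to go through the client library's retry machinery already. As a rough illustration of how the backoff parameters exposed by `RetrySettings` (initial retry delay, delay multiplier, max retry delay, max attempts) interact, here is a minimal, stdlib-only sketch. `BackoffSketch` and `retryDelays` are hypothetical names, not part of Flink or the Google client library; the sketch ignores jitter and the total timeout, and the values are chosen only for illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not Flink or Google client code. */
public final class BackoffSketch {
    private BackoffSketch() {}

    /**
     * Returns the wait before each retry (attempts 2..maxAttempts).
     * Each delay is the previous delay times the multiplier, capped at maxDelay.
     * Assumption: no jitter and no total-timeout cutoff are applied.
     */
    public static List<Duration> retryDelays(
            Duration initialDelay, double multiplier, Duration maxDelay, int maxAttempts) {
        List<Duration> delays = new ArrayList<>();
        double nextMillis = initialDelay.toMillis();
        for (int attempt = 2; attempt <= maxAttempts; attempt++) {
            long capped = Math.min(Math.round(nextMillis), maxDelay.toMillis());
            delays.add(Duration.ofMillis(capped));
            nextMillis = capped * multiplier;
        }
        return delays;
    }

    public static void main(String[] args) {
        // Illustrative values only: 1s initial delay, 2.0 multiplier, 32s cap, 6 attempts.
        List<Duration> delays =
                retryDelays(Duration.ofSeconds(1), 2.0, Duration.ofSeconds(32), 6);
        long totalMillis = delays.stream().mapToLong(Duration::toMillis).sum();
        System.out.println(delays + " total wait ms: " + totalMillis);
    }
}
```

With these inputs the sketch yields delays of 1, 2, 4, 8, and 16 seconds, so six attempts allow roughly 31 seconds of waiting before the exception would surface. Being able to raise the attempt count or the delay cap is the kind of tuning this ticket asks for.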

> Support for retry settings on GCS connector
> -------------------------------------------
>
>                 Key: FLINK-35232
>                 URL: https://issues.apache.org/jira/browse/FLINK-35232
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.15.3, 1.16.2, 1.17.1, 1.19.0, 1.18.1
>            Reporter: Vikas M
>            Assignee: Ravi Singh
>            Priority: Major
>
> https://issues.apache.org/jira/browse/FLINK-32877 tracks the ability to 
> specify transport options in the GCS connector. While setting the parameters 
> enabled there reduced read timeouts, we still see 503 errors leading to Flink 
> job restarts.
> Thus, in this ticket, we want to support the additional retry settings noted 
> in 
> [https://cloud.google.com/storage/docs/retry-strategy#customize-retries].
> We want 
> [these|https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings#methods]
>  methods available to Flink users so that they can customize their 
> deployment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
