[ 
https://issues.apache.org/jira/browse/FLINK-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284476#comment-17284476
 ] 

Galen Warren commented on FLINK-11838:
--------------------------------------

{quote}Maybe for the first step, it's good enough to simply do all the 
composing and deleting at the end. We can try to optimize it later if we indeed 
see a performance problem in composing and deleting the temporary blobs.
{quote}
I'm fine going either way here. I've already put something together locally 
that supports composing at both persist and commit, but it would be simple to 
revert to composing only at commit. Maybe you can take a look when we get to 
the code phase and see what you think? If it's not obvious which is better, I 
suppose we could also make that choice ("compose on persist") configurable via 
a Flink option.
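
To make the two variants concrete, here is a rough, self-contained sketch of 
what a "compose on persist" switch might look like. It is only an in-memory 
model and not the code from the PR: the class ComposeSketch, the flag 
composeOnPersist and all method names are hypothetical, and no real GCS or 
Flink API is used.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory model of a writer that uploads temporary blobs
 * and composes them into a final blob. Strings stand in for blob contents.
 */
class ComposeSketch {
    // Assumed option: when true, temporary blobs are also composed on persist.
    private final boolean composeOnPersist;
    // Contents of the temporary blobs that have not been composed yet.
    private final List<String> tempBlobs = new ArrayList<>();
    // Number of temporary blobs deleted so far (after being composed).
    private int deletedTempBlobs = 0;

    ComposeSketch(boolean composeOnPersist) {
        this.composeOnPersist = composeOnPersist;
    }

    /** Each write is modelled as uploading one new temporary blob. */
    void write(String data) {
        tempBlobs.add(data);
    }

    /** Persist point: compose eagerly only if the option is enabled. */
    void persist() {
        if (composeOnPersist) {
            composeTempBlobs();
        }
    }

    /** Commit: always compose whatever is left and return the final content. */
    String commit() {
        composeTempBlobs();
        String result = tempBlobs.isEmpty() ? "" : tempBlobs.get(0);
        tempBlobs.clear();
        return result;
    }

    int tempBlobCount() {
        return tempBlobs.size();
    }

    int deletedTempBlobCount() {
        return deletedTempBlobs;
    }

    /** Replace all outstanding temporary blobs with one composed blob. */
    private void composeTempBlobs() {
        if (tempBlobs.size() <= 1) {
            return; // nothing to compose
        }
        String composed = String.join("", tempBlobs);
        deletedTempBlobs += tempBlobs.size();
        tempBlobs.clear();
        tempBlobs.add(composed);
    }

    public static void main(String[] args) {
        for (boolean eager : new boolean[] {true, false}) {
            ComposeSketch writer = new ComposeSketch(eager);
            writer.write("a");
            writer.write("b");
            writer.persist();
            System.out.println("composeOnPersist=" + eager
                    + ", temp blobs after persist: " + writer.tempBlobCount());
            writer.write("c");
            System.out.println("committed content: " + writer.commit());
        }
    }
}
```

In this model both settings commit the same content. The only difference is 
how many temporary blobs are outstanding between persist and commit, which is 
the trade-off discussed above.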

Are you comfortable with the approach now? If so, I'll work on getting the code 
together in order to update the PR.

Thanks!

> Create RecoverableWriter for GCS
> --------------------------------
>
>                 Key: FLINK-11838
>                 URL: https://issues.apache.org/jira/browse/FLINK-11838
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / FileSystem
>    Affects Versions: 1.8.0
>            Reporter: Fokko Driesprong
>            Assignee: Galen Warren
>            Priority: Major
>              Labels: pull-request-available, usability
>             Fix For: 1.13.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> GCS supports resumable uploads, which we can use to create a 
> RecoverableWriter similar to the S3 implementation:
> https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
> After using the Hadoop-compatible interface: 
> https://github.com/apache/flink/pull/7519
> we noticed that the current implementation relies heavily on renaming 
> files on commit: 
> https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L233-L259
> This is suboptimal on an object store such as GCS. Therefore, we would like 
> to implement a more GCS-native RecoverableWriter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
