Hi Piotr,

Thanks for the proposal. Speeding up state download is valuable. I have a few questions:
1. What are the semantics of `canCopyPath`? Should it be associated with a specific destination path? For example, a path might be copyable to the local FS but not to a remote FS.
2. Is the existing interface `DuplicatingFileSystem` sufficient for this case?
3. Will extracting the interface introduce a breaking change?

Best,
Zakelly

On Thu, May 2, 2024 at 6:50 PM Aleksandr Pilipenko <z3d...@gmail.com> wrote:

> Hi Piotr,
>
> Thanks for the proposal.
> How adding a s5cmd will affect memory footprint? Since this is a native
> binary, memory consumption will not be controlled by JVM or Flink.
>
> Thanks,
> Aleksandr
>
> On Thu, 2 May 2024 at 11:12, Hong Liang <h...@apache.org> wrote:
>
> > Hi Piotr,
> >
> > Thanks for the FLIP! Nice to see work to improve the filesystem
> > performance. +1 to future work to improve the upload speed as well. This
> > would be useful for jobs with large state and high Async checkpointing
> > times.
> >
> > Some thoughts on the configuration, it might be good for us to introduce 2x
> > points of configurability for future proofing:
> > 1/ Configure the implementation of PathsCopyingFileSystem used, maybe by
> > config, or by ServiceResources (this would allow us to use this for
> > alternative clouds/Implement S3 SDKv2 support if we want this in the
> > future). Also this could be used as a feature flag to determine if we
> > should be using this new native file copy support.
> > 2/ Configure the location of the s5cmd binary (version control etc.), as
> > you have mentioned in the FLIP.
> >
> > Regards,
> > Hong
> >
> >
> > On Thu, May 2, 2024 at 9:40 AM Muhammet Orazov
> > <mor+fl...@morazow.com.invalid> wrote:
> >
> > > Hey Piotr,
> > >
> > > Thanks for the proposal! It would be great improvement!
> > >
> > > Some questions from my side:
> > >
> > > > In order to configure s5cmd Flink’s user would need
> > > > to specify path to the s5cmd binary.
> > >
> > > Could you please also add the configuration property
> > > for this?
> > > An example showing how users would set this
> > > parameter would be helpful.
> > >
> > > Would this affect any filesystem connectors that use
> > > FileSystem[1][2] dependencies?
> > >
> > > Best,
> > > Muhammet
> > >
> > > [1]:
> > > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/s3/
> > > [2]:
> > > https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
> > >
> > > On 2024-04-30 13:15, Piotr Nowojski wrote:
> > > > Hi all!
> > > >
> > > > I would like to put under discussion:
> > > >
> > > > FLIP-444: Native file copy support
> > > > https://cwiki.apache.org/confluence/x/rAn9EQ
> > > >
> > > > This proposal aims to speed up Flink recovery times, by speeding up
> > > > state download times. However in the future, the same mechanism could be
> > > > also used to speed up state uploading (checkpointing/savepointing).
> > > >
> > > > I'm curious to hear your thoughts.
> > > >
> > > > Best,
> > > > Piotrek