Hi Tamir,

Thanks for providing the information. I don't know of a current solution; perhaps another user has an idea. Your input is valuable for future improvements to the S3 client support in Hadoop.
Best regards,

Martijn

On Fri, 19 Nov 2021 at 09:21, Tamir Sagi <tamir.s...@niceactimize.com> wrote:

> Hey Martijn,
>
> Sorry for the late response.
>
> We wanted to replace the default client with our custom S3 client and
> not use the AmazonS3Client provided by the plugin.
>
> We used flink-s3-fs-hadoop v1.12.2, and for our needs we had to upgrade
> to v1.14.0 [1].
>
> The AmazonS3 client factory is initialized in S3AFileSystem [2]; if the
> property "fs.s3a.s3.client.factory.impl" [3] is not provided, the default
> factory is created [4], which provides an AmazonS3Client - and that does
> not support what we need.
> I know that both the property and the factory interface have been
> annotated with
>
> @InterfaceAudience.Private
> @InterfaceStability.Unstable
>
> since a very early version, but we found this solution cleaner than
> extending the whole class and overriding the #setAmazonS3Client method.
>
> Bottom line, all we had to do was create our own implementation of the
> S3ClientFactory interface [5] and add to flink-conf.yaml:
>
> s3.s3.client.factory.impl: <our factory canonical name>
>
> then place both the plugin and our artifact (with the factory and client
> implementation) under ${FLINK_HOME}/plugins/s3.
>
> One important note: the flink-s3-fs-hadoop plugin bundles the whole
> com.amazonaws.s3 code. To avoid plugin class loader issues, we needed to
> remove the aws-java-sdk-s3 dependency and declare the plugin dependency
> with scope "provided". If the job needs to do some work with S3, then
> shading com.amazonaws was also necessary.
>
> [1]
> https://mvnrepository.com/artifact/org.apache.flink/flink-s3-fs-hadoop/1.14.0
>
> [2]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L264-L266
>
> [3]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L366-L369
>
> [4]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java#L66
>
> [5]
> https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ClientFactory.java
>
> Best,
> Tamir.
>
> ------------------------------
> *From:* Martijn Visser <mart...@ververica.com>
> *Sent:* Wednesday, October 13, 2021 8:28 PM
> *To:* Tamir Sagi <tamir.s...@niceactimize.com>; user@flink.apache.org
> *Subject:* Re: Replacing S3 Client in Hadoop plugin
>
> Hi,
>
> Could you elaborate on why you would like to replace the S3 client?
>
> Best regards,
>
> Martijn
>
> On Wed, 13 Oct 2021 at 17:18, Tamir Sagi <tamir.s...@niceactimize.com>
> wrote:
>
> I found the dependency
>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-aws</artifactId>
>     <version>3.3.1</version>
> </dependency>
>
> Apparently it's possible; there is a method setAmazonS3Client.
>
> I think I found the solution.
>
> Thanks,
> Tamir.
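>
> For illustration, a minimal sketch of the factory approach described
> above. All package/class names and the KMS key are placeholders, and the
> exact createS3Client signature differs between Hadoop versions; check
> S3ClientFactory [5] in the Hadoop version bundled by the plugin:
>
> package com.example.s3;
>
> import java.io.IOException;
> import java.net.URI;
>
> import com.amazonaws.auth.AWSCredentialsProvider;
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3EncryptionClientV2Builder;
> import com.amazonaws.services.s3.model.CryptoConfigurationV2;
> import com.amazonaws.services.s3.model.CryptoMode;
> import com.amazonaws.services.s3.model.KMSEncryptionMaterialsProvider;
>
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.s3a.S3ClientFactory;
>
> // Returns an AmazonS3EncryptionV2 client instead of the AmazonS3Client
> // that DefaultS3ClientFactory [4] would create.
> public class EncryptionS3ClientFactory extends Configured
>     implements S3ClientFactory {
>
>   @Override
>   public AmazonS3 createS3Client(URI name, String bucket,
>       AWSCredentialsProvider credentials) throws IOException {
>     return AmazonS3EncryptionClientV2Builder.standard()
>         .withCredentials(credentials)
>         // Placeholder KMS key; supply real encryption materials here.
>         .withEncryptionMaterialsProvider(
>             new KMSEncryptionMaterialsProvider("alias/my-cse-key"))
>         .withCryptoConfiguration(new CryptoConfigurationV2(
>             CryptoMode.StrictAuthenticatedEncryption))
>         .build();
>   }
> }
>
> wired up in flink-conf.yaml as described above:
>
> s3.s3.client.factory.impl: com.example.s3.EncryptionS3ClientFactory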
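>
> The packaging note above might translate to roughly the following pom
> fragments (versions and the relocation pattern are examples only):
>
> <!-- depend on the plugin without re-bundling its classes -->
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-s3-fs-hadoop</artifactId>
>     <version>1.14.0</version>
>     <scope>provided</scope>
> </dependency>
>
> <!-- in the job jar, relocate com.amazonaws if the job uses the SDK too -->
> <plugin>
>     <groupId>org.apache.maven.plugins</groupId>
>     <artifactId>maven-shade-plugin</artifactId>
>     <configuration>
>         <relocations>
>             <relocation>
>                 <pattern>com.amazonaws</pattern>
>                 <shadedPattern>shaded.com.amazonaws</shadedPattern>
>             </relocation>
>         </relocations>
>     </configuration>
> </plugin>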
> ------------------------------
> *From:* Tamir Sagi <tamir.s...@niceactimize.com>
> *Sent:* Wednesday, October 13, 2021 5:44 PM
> *To:* user@flink.apache.org
> *Subject:* Replacing S3 Client in Hadoop plugin
>
> Hey community,
>
> I would like to know if there is any way to replace the S3 client in the
> Hadoop plugin [1] with a custom client (AmazonS3).
>
> I did notice that the Hadoop plugin supports replacing the implementation
> of S3AFileSystem using "fs.s3a.impl" (in flink-conf.yaml it will be
> "s3.impl"), but not the client itself [2] (a sketch follows below):
>
> <property>
>     <name>fs.s3a.impl</name>
>     <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
>     <description>The implementation class of the S3A Filesystem</description>
> </property>
>
> I delved into the Hadoop plugin source code [3]; the client itself is of
> type AmazonS3Client and cannot be replaced (for example) with a client of
> type AmazonS3EncryptionV2.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins
>
> [2]
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
>
> [3]
> https://github.com/apache/hadoop/blob/master/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
>
> Thank you,
>
> Best,
> Tamir.
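>
> A minimal sketch of that fs.s3a.impl route, combined with the
> setAmazonS3Client alternative mentioned earlier in the thread. Class
> names are placeholders, and setAmazonS3Client is @VisibleForTesting, so
> it may not exist or be accessible in every Hadoop version:
>
> package com.example.s3;
>
> import java.io.IOException;
> import java.net.URI;
>
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.s3a.S3AFileSystem;
>
> // Registered via fs.s3a.impl ("s3.impl" in flink-conf.yaml).
> public class CustomS3AFileSystem extends S3AFileSystem {
>
>   @Override
>   public void initialize(URI name, Configuration conf) throws IOException {
>     super.initialize(name, conf);
>     // Placeholder: build whichever AmazonS3 implementation is needed
>     // and swap it in after the default client has been created.
>     AmazonS3 custom = AmazonS3ClientBuilder.standard().build();
>     setAmazonS3Client(custom);
>   }
> }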