Dockerised Flink 1.8 with Hadoop S3 FS support

Lorenzo Nicora Thu, 02 Jul 2020 03:06:09 -0700

Hi

I need to set up a dockerized *session cluster* using Flink *1.8.2* for
development and troubleshooting. We are bound to 1.8.2 as we are deploying
to AWS Kinesis Data Analytics for Flink.


I am using an image based on the semi-official flink:1.8-scala_2.11.
I need to add support for the S3 Hadoop File System (s3a://) to my
dockerized cluster, which we get out of the box on KDA.

Note that I do not want to add dependencies to the job directly, as I want
to deploy locally exactly the same JAR I deploy to KDA.

The Flink 1.8 docs [1] say this is supported out of the box, but that does
not appear to be the case for the dockerised version.
I am getting "Could not find a file system implementation for scheme 's3a'"
and "Hadoop is not in the classpath/dependencies".
I assume I need to create a customised docker image,
extending flink:1.8-scala_2.11, but I do not understand how to add support
for S3 Hadoop FS.

Can someone please point me in the right direction? Docs or examples?


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/filesystems.html


Lorenzo
