Re: Correct way to reference Hadoop dependencies in Flink 1.4.0

lrao Wed, 14 Mar 2018 18:26:38 -0700

Hi,

I am running this on a Hadoop-free cluster (i.e. no YARN etc.). I have the 
following dependencies packaged in my user application JAR:


aws-java-sdk 1.7.4
flink-hadoop-fs 1.4.0
flink-shaded-hadoop2 1.4.0
flink-connector-filesystem_2.11 1.4.0
hadoop-common 2.7.4
hadoop-aws 2.7.4

I have also tried the following conf:
classloader.resolve-order: parent-first
fs.hdfs.hadoopconf: /srv/hadoop/hadoop-2.7.5/etc/hadoop

But no luck. Anything else I could be missing?

On 2018/03/14 18:57:47, Francesco Ciuci <francesco.ci...@gmail.com> wrote: 
> Hi,
> 
> You do not just need the hadoop dependencies in the jar but you need to
> have the hadoop file system running in your machine/cluster.
> 
> Regards
> 
> On 14 March 2018 at 18:38, l...@lyft.com <l...@lyft.com> wrote:
> 
> > I'm trying to use a BucketingSink to write files to S3 in my Flink job.
> >
> > I have the Hadoop dependencies I need packaged in my user application jar.
> > However, on running the job I get the following error (from the
> > taskmanager):
> >
> > java.lang.RuntimeException: Error while creating FileSystem when
> > initializing the state of the BucketingSink.
> >         at org.apache.flink.streaming.connectors.fs.bucketing.
> > BucketingSink.initializeState(BucketingSink.java:358)
> >         at org.apache.flink.streaming.util.functions.
> > StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
> >         at org.apache.flink.streaming.util.functions.
> > StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:
> > 160)
> >         at org.apache.flink.streaming.api.operators.
> > AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.
> > java:96)
> >         at org.apache.flink.streaming.api.operators.
> > AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
> >         at org.apache.flink.streaming.runtime.tasks.StreamTask.
> > initializeOperators(StreamTask.java:694)
> >         at org.apache.flink.streaming.runtime.tasks.StreamTask.
> > initializeState(StreamTask.java:682)
> >         at org.apache.flink.streaming.runtime.tasks.StreamTask.
> > invoke(StreamTask.java:253)
> >         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> >         at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException:
> > Could not find a file system implementation for scheme 's3a'. The scheme is
> > not directly supported by Flink and no Hadoop file system to support this
> > scheme could be loaded.
> >         at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(
> > FileSystem.java:405)
> >         at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
> >         at org.apache.flink.streaming.connectors.fs.bucketing.
> > BucketingSink.createHadoopFileSystem(BucketingSink.java:1125)
> >         at org.apache.flink.streaming.connectors.fs.bucketing.
> > BucketingSink.initFileSystem(BucketingSink.java:411)
> >         at org.apache.flink.streaming.connectors.fs.bucketing.
> > BucketingSink.initializeState(BucketingSink.java:355)
> >         ... 9 common frames omitted
> > Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException:
> > Hadoop is not in the classpath/dependencies.
> >         at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(
> > UnsupportedSchemeFactory.java:64)
> >         at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(
> > FileSystem.java:401)
> >         ... 13 common frames omitted
> >
> > What's the right way to do this?
> >
>

Re: Correct way to reference Hadoop dependencies in Flink 1.4.0

Reply via email to