Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

Senthil Kumar Thu, 23 Jan 2020 11:10:31 -0800

Could you tell us how to turn on debug level logs?

We attempted this (on driver)


sudo stop hadoop-yarn-resourcemanager

followed the instructions here
https://stackoverflow.com/questions/27853974/how-to-set-debug-log-level-for-resourcemanager

and

sudo start hadoop-yarn-resourcemanager

but we still don’t see any debug level logs

Any further info is much appreciated!


From: Aaron Langford <aaron.langfor...@gmail.com>
Date: Tuesday, January 21, 2020 at 10:54 AM
To: Senthil Kumar <senthi...@vmware.com>
Cc: Yang Wang <danrtsey...@gmail.com>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

Senthil,

One of the key steps in debugging this for me was enabling debug level logs on 
my cluster, and then looking at the logs in the resource manager. The failure 
you are after happens before the exceptions you have reported here. When your 
Flink application is starting, it will attempt to load various file system 
implementations. You can see which ones it successfully loaded when you have 
the debug level of logs configured. You will have to do some digging, but this 
is a good place to start. Try to discover if your application is indeed loading 
the s3 file system, or if that is not happening. You should be able to find the 
file system implementations that were loaded by searching for the string "Added 
file system".

Also, do you mind sharing the bootstrap action script that you are using to get 
the s3 file system in place?

Aaron

On Tue, Jan 21, 2020 at 8:39 AM Senthil Kumar 
<senthi...@vmware.com<mailto:senthi...@vmware.com>> wrote:
Yang, I appreciate your help! Please let me know if I can provide with any 
other info.

I resubmitted my executable jar file as a step to the flink EMR and here’s are 
all the  exceptions. I see two of them.

I fished them out of /var/log/Hadoop/<STEP-ID>/syslog


2020-01-21 16:31:37,587 ERROR 
org.apache.flink.streaming.runtime.tasks.StreamTask (Split Reader: Custom File 
Source -> Sink: Unnamed (11/16)): Error during di

sposal of stream operator.

java.lang.NullPointerException

        at 
org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator.dispose(ContinuousFileReaderOperator.java:165)

        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:582)

        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:481)

        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)

        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)

        at java.lang.Thread.run(Thread.java:748)



2020-01-21 16:31:37,591 INFO org.apache.flink.runtime.taskmanager.Task (Split 
Reader: Custom File Source -> Sink: Unnamed (8/16)): Split Reader: Custom File 
Source -> Sink: Unnamed (8/16) (865a5a078c3e40cbe2583afeaa0c601e) switched from 
RUNNING to FAILED.

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only 
supported for HDFS and for Hadoop version 2.7 or newer

        at 
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)

        at 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)

        at 
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)

        at 
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)

        at 
org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$RowFormatBuilder.createBuckets(StreamingFileSink.java:242)

        at 
org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:327)

        at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)

        at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)

        at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)

        at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)

        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:878)

        at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:392)

        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)

        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)

        at java.lang.Thread.run(Thread.java:748)


From: Yang Wang <danrtsey...@gmail.com<mailto:danrtsey...@gmail.com>>
Date: Saturday, January 18, 2020 at 7:58 PM
To: Senthil Kumar <senthi...@vmware.com<mailto:senthi...@vmware.com>>
Cc: "user@flink.apache.org<mailto:user@flink.apache.org>" 
<user@flink.apache.org<mailto:user@flink.apache.org>>
Subject: Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

I think this exception is not because the hadoop version isn't high enough.
It seems that the "s3" URI scheme could not be recognized by 
`S3FileSystemFactory`. So it fallbacks to
the `HadoopFsFactory`.

Could you share the debug level jobmanager/taskmanger logs so that we could 
confirm whether the
classpath and FileSystem are loaded correctly.



Best,
Yang

Senthil Kumar <senthi...@vmware.com<mailto:senthi...@vmware.com>> 于2020年1月17日周五 
下午10:57写道：

Hello all,



Newbie here!



We are running in Amazon EMR with the following installed in the EMR Software 
Configuration

Hadoop 2.8.5

JupyterHub 1.0.0

Ganglia 3.7.2

Hive 2.3.6

Flink 1.9.0



I am trying to get a Streaming job from one S3 bucket into an another S3 bucket 
using the StreamingFileSink



I got the infamous exception:

Caused by: java.lang.UnsupportedOperationException: Recoverable writers on 
Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer



According to this, I needed to install flink-s3-fs-hadoop-1.9.0.jar in 
/usr/lib/flink/lib

https://stackoverflow.com/questions/55517566/amazon-emr-while-submitting-job-for-apache-flink-getting-error-with-hadoop-recov<https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstackoverflow.com%2Fquestions%2F55517566%2Famazon-emr-while-submitting-job-for-apache-flink-getting-error-with-hadoop-recov&data=02%7C01%7Csenthilku%40vmware.com%7Cabaac5f372264cf958d508d79e9af63a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637152260726359794&sdata=JVyu9uX4wJDY9COlE%2Bqk3ZIJ0qy6HPxCr68YfMMKglU%3D&reserved=0>



That did not work.



Further googling, revealed for Flink 1.9.0 and above:  (according to this)

https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/<https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-stable%2Fops%2Ffilesystems%2F&data=02%7C01%7Csenthilku%40vmware.com%7Cabaac5f372264cf958d508d79e9af63a%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637152260726359794&sdata=2KkpnnV8LpLXSflyaoP7AbL%2BE8gCNvjIxtI1BoI7d%2Fk%3D&reserved=0>



it seems that I need to install the jar file in the plugins directory 
(/usr/lib/flink/plugins/s3-fs-hadoop)



That did not work either.



At this point, I am not sure what to do and would appreciate some help!



Cheers

Kumar

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

Reply via email to