I've found one more filesystem integration failure. My job on a toy dataset
succeeds, but the Flink log contains the following message:

2016-04-07 18:10:01,339 ERROR org.apache.flink.api.common.io.DelimitedInputFormat           - Unexpected problen while getting the file statistics for file 's3://...': java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2227)
        at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.getHadoopWrapperClassNameForFileSystem(HadoopFileSystem.java:460)
        at org.apache.flink.core.fs.FileSystem.getHadoopWrapperClassNameForFileSystem(FileSystem.java:352)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:280)
        at org.apache.flink.api.common.io.DelimitedInputFormat.getStatistics(DelimitedInputFormat.java:293)
        at org.apache.flink.api.common.io.DelimitedInputFormat.getStatistics(DelimitedInputFormat.java:45)
        at org.apache.flink.optimizer.dag.DataSourceNode.computeOperatorSpecificDefaultEstimates(DataSourceNode.java:166)
        at org.apache.flink.optimizer.dag.OptimizerNode.computeOutputEstimates(OptimizerNode.java:588)
        at org.apache.flink.optimizer.traversals.IdAndEstimatesVisitor.postVisit(IdAndEstimatesVisitor.java:61)
        at org.apache.flink.optimizer.traversals.IdAndEstimatesVisitor.postVisit(IdAndEstimatesVisitor.java:32)
        at org.apache.flink.optimizer.dag.DataSourceNode.accept(DataSourceNode.java:250)
        at org.apache.flink.optimizer.dag.SingleInputNode.accept(SingleInputNode.java:515)
        at org.apache.flink.optimizer.dag.DataSinkNode.accept(DataSinkNode.java:248)
        at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:477)
        at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:398)
        at org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:228)
        at org.apache.flink.client.program.Client.getOptimizedPlan(Client.java:567)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:314)
        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:60)
        at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:855)
        at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:638)
        at com.whitepages.data.flink.FaithResolution$.pipeline(FaithResolution.scala:100)
        at com.whitepages.data.flink.FaithResolution$$anonfun$main$1.apply(FaithResolution.scala:39)
        at com.whitepages.data.flink.FaithResolution$$anonfun$main$1.apply(FaithResolution.scala:39)
        at scala.Option.foreach(Option.scala:257)
        at com.whitepages.data.flink.FaithResolution$.main(FaithResolution.scala:39)
        at com.whitepages.data.flink.FaithResolution.main(FaithResolution.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
        at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1189)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1239)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2219)
        ... 37 more
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 38 more

I assume this could become a significant problem on large datasets, since
the optimizer will have no file statistics to work with. I tried switching
from EMRFS to the NativeS3 driver but got the same error, which is
surprising: I expected NativeS3FileSystem to be on the classpath, since it
ships with the Flink runtime.
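Because the trace above still names EmrFileSystem after the switch, one way
to narrow this down might be to check which of the two implementation
classes is actually loadable from the client's classpath. The sketch below
is a hypothetical diagnostic (the class `S3ClassProbe` is made up for
illustration; it is not part of Flink or Hadoop). It assumes the NativeS3
driver is `org.apache.hadoop.fs.s3native.NativeS3FileSystem`, and note that
Hadoop's `Configuration` may resolve classes through a different
classloader than the one used here.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical diagnostic (not part of Flink or Hadoop): reports which S3
 * FileSystem implementation classes can be loaded by a given classloader.
 */
public class S3ClassProbe {

    // Class names taken from the error above and from Hadoop's s3native package.
    static final List<String> CANDIDATES = Arrays.asList(
            "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
            "org.apache.hadoop.fs.s3native.NativeS3FileSystem");

    /** Returns true if the named class can be found and loaded by the given classloader. */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: load the class without running its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or present but unloadable (e.g. its superclass is missing)
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class
            loader = S3ClassProbe.class.getClassLoader();
        }
        for (String name : CANDIDATES) {
            System.out.println(name + " -> " + (isLoadable(name, loader) ? "found" : "NOT found"));
        }
    }
}
```

To be meaningful it would have to run with the same classpath as the Flink
client (for example, the jars in Flink's `lib/` directory plus the job jar).
If NativeS3FileSystem shows up as found, the remaining suspect would be the
Hadoop configuration Flink reads, which may still map the `s3` scheme to
EMRFS.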

Thanks,
Timur


On Wed, Apr 6, 2016 at 2:10 AM, Ufuk Celebi <u...@apache.org> wrote:

> Yes, for sure.
>
> I added some documentation for AWS here:
> https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html
>
> It would be nice to update that page with your pull request. :-)
>
> – Ufuk
>
>
> On Wed, Apr 6, 2016 at 4:58 AM, Chiwan Park <chiwanp...@apache.org> wrote:
> > Hi Timur,
> >
> > Great! A bootstrap action for Flink would be useful for AWS users. I think
> > the bootstrap action scripts could be placed in the `flink-contrib` directory.
> >
> > If you want, one of the people on the Flink PMC will assign FLINK-1337
> > to you.
> >
> > Regards,
> > Chiwan Park
> >
> >> On Apr 6, 2016, at 3:36 AM, Timur Fayruzov <timur.fairu...@gmail.com> wrote:
> >>
> >> I had a guide like that.
> >>
> >
>
