Hi,

I am a bit confused by the explanation: does the exception you mention happen in the first code snippet (the one with TypeInformation.of(...)) or in the second one? From looking into the code, I would expect the exception can only happen in the second snippet (without TypeInformation), and I am also wondering what the exception for the first snippet is then, because from the code I think it cannot be the same one but something different. Compare:

https://github.com/apache/flink/blob/70b2029f8a3d4ca2d3cb7bd7fddac9bb5b3e8f07/flink-java/src/main/java/org/apache/flink/api/java/ExecutionEnvironment.java#L551

vs.

https://github.com/apache/flink/blob/70b2029f8a3d4ca2d3cb7bd7fddac9bb5b3e8f07/flink-java/src/main/java/org/apache/flink/api/java/ExecutionEnvironment.java#L577

Can you please clarify? I would expect it to work once you call the two-argument method and provide the type info; if it does not, what exactly is the exception there?
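For what it's worth, here is a minimal, self-contained sketch of the two-argument variant that I would expect to work (the class name and the input path are placeholders, not taken from your job):

    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.operators.DataSource;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.hadoopcompatibility.HadoopInputs;
    import org.apache.hadoop.io.Text;

    public class SequenceFileReadJob {

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Placeholder path, standing in for ravenDataDir from the original mail.
            String inputPath = "hdfs:///path/to/sequence/files";

            // The two-argument createInput (second link above) takes the produced
            // type directly, so the "could not be automatically determined"
            // exception thrown by the one-argument overload (first link above)
            // should not be reachable on this code path.
            DataSource<Tuple2<Text, Text>> input = env.createInput(
                    HadoopInputs.readSequenceFile(Text.class, Text.class, inputPath),
                    TypeInformation.of(new TypeHint<Tuple2<Text, Text>>() {}));

            input.first(10).print();
        }
    }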
Best,
Stefan

> On 10. Dec 2018, at 13:35, Akshay Mendole <akshaymend...@gmail.com> wrote:
>
> Hi,
> I have been facing issues while trying to read from an HDFS sequence file.
>
> This is my code snippet:
>
> DataSource<Tuple2<Text, Text>> input = env
>     .createInput(HadoopInputs.readSequenceFile(Text.class, Text.class, ravenDataDir),
>         TypeInformation.of(new TypeHint<Tuple2<Text, Text>>() {
>         }));
>
> Upon executing this in yarn cluster mode, I am getting the following error:
>
> The type returned by the input format could not be automatically determined.
> Please specify the TypeInformation of the produced type explicitly by using
> the 'createInput(InputFormat, TypeInformation)' method instead.
>
> org.apache.flink.api.java.ExecutionEnvironment.createInput(ExecutionEnvironment.java:551)
> flipkart.EnrichementFlink.main(EnrichementFlink.java:31)
>
> When I add the TypeInformation myself as follows, I run into the same issue:
>
> DataSource<Tuple2<Text, Text>> input = env
>     .createInput(HadoopInputs.readSequenceFile(Text.class, Text.class, ravenDataDir));
>
> When I add this library to the lib folder:
>
> flink-hadoop-compatibility_2.11-1.7.0.jar
>
> the error changes to this:
>
> java.lang.NoClassDefFoundError: org/apache/flink/api/common/typeutils/TypeSerializerSnapshot
>     at org.apache.flink.api.java.typeutils.WritableTypeInfo.createSerializer(WritableTypeInfo.java:111)
>     at org.apache.flink.api.java.typeutils.TupleTypeInfo.createSerializer(TupleTypeInfo.java:107)
>     at org.apache.flink.api.java.typeutils.TupleTypeInfo.createSerializer(TupleTypeInfo.java:52)
>     at org.apache.flink.optimizer.postpass.JavaApiPostPass.createSerializer(JavaApiPostPass.java:283)
>     at org.apache.flink.optimizer.postpass.JavaApiPostPass.traverseChannel(JavaApiPostPass.java:252)
>     at org.apache.flink.optimizer.postpass.JavaApiPostPass.traverse(JavaApiPostPass.java:97)
>     at org.apache.flink.optimizer.postpass.JavaApiPostPass.postPass(JavaApiPostPass.java:81)
>     at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:527)
>     at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:399)
>     at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:379)
>     at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:906)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:473)
>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>
> Can someone help me resolve this issue?
>
> Thanks,
> Akshay
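PS regarding the NoClassDefFoundError: this is just a guess, but TypeSerializerSnapshot only exists since Flink 1.7, so dropping flink-hadoop-compatibility_2.11-1.7.0.jar into the lib folder of a cluster that runs an older Flink version would fail in exactly this way. A quick hypothetical check you could run on the cluster (the class name below is the one from your stack trace):

    public class ClasspathCheck {

        public static void main(String[] args) {
            try {
                // Present from Flink 1.7 on; flink-hadoop-compatibility 1.7.0
                // links against it.
                Class.forName("org.apache.flink.api.common.typeutils.TypeSerializerSnapshot");
                System.out.println("TypeSerializerSnapshot found: runtime looks like Flink 1.7+");
            } catch (ClassNotFoundException e) {
                System.out.println("TypeSerializerSnapshot missing: runtime is older than 1.7,"
                        + " which would explain the NoClassDefFoundError");
            }
        }
    }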