[ https://issues.apache.org/jira/browse/FLINK-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938866#comment-15938866 ]

Fabian Hueske commented on FLINK-6115:
--------------------------------------

Hi Luke, I agree that this is not expected behavior.

However, it cannot easily be changed at the moment because we cannot touch the 
serialization format: Flink's savepoint feature requires that serialized data 
can be deserialized across versions. There is ongoing work to support 
serializer upgrades.
In addition, tuples are used internally in many places because they are very 
efficient.

Serializing POJOs is expensive because fields are read and written using Java 
reflection. Some time ago there was some work on code-generated POJO 
serializers, which would have improved the situation.
{{Row}} is an alternative to tuples. It supports null values (by serializing a 
bitmask) and is not restricted to 25 fields (or 22 in the case of Scala). 
However, rows are not typed with generics, so you have to manually provide a 
{{RowTypeInfo}}.
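
For illustration, a minimal (untested) sketch of that manual type declaration 
in the Java DataSet API; the mapping logic is made up for the example, but 
{{Row}}, {{RowTypeInfo}}, and {{returns()}} are the regular API:

{code:java}
import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class RowInsteadOfTuple {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Row carries no generic type information, so the field types
        // must be declared explicitly via RowTypeInfo.
        RowTypeInfo rowType = new RowTypeInfo(
                BasicTypeInfo.STRING_TYPE_INFO,
                BasicArrayTypeInfo.STRING_ARRAY_TYPE_INFO);

        DataSet<Row> rows = env.fromElements("a", "b")
                .map(s -> {
                    Row row = new Row(2);
                    row.setField(0, s);
                    // Unlike a tuple field, a null field is fine here:
                    // the null is recorded in the serialized bitmask.
                    row.setField(1, null);
                    return row;
                })
                .returns(rowType); // attach the manually defined type info

        rows.print();
    }
}
{code}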

Regarding the exception, I agree that the error message could be improved by 
pointing out that a null String array cannot be serialized.
Apart from the message, the stack trace does show that the serialization of a 
String array inside a tuple failed.
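
To make that concrete, a rough sketch of what the check in 
{{StringArraySerializer#serialize}} could say instead (not an actual patch; 
the wrapper class is hypothetical, and the write logic mirrors what the 
serializer does today):

{code:java}
import java.io.IOException;

import org.apache.flink.core.memory.DataOutputView;
import org.apache.flink.types.StringValue;

// Hypothetical helper, mirroring StringArraySerializer#serialize from the
// stack trace above; only the exception message differs.
public final class NullAwareStringArraySerialization {

    public static void serialize(String[] record, DataOutputView target) throws IOException {
        if (record == null) {
            // Say *what* was null instead of the generic "The record must not be null."
            throw new IllegalArgumentException(
                "Cannot serialize a null String[] field. Tuple fields must not be null; "
                    + "consider using Row, which supports null values.");
        }
        target.writeInt(record.length);
        for (String element : record) {
            StringValue.writeString(element, target);
        }
    }
}
{code}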

Best, Fabian

> Need more helpful error message when trying to serialize a tuple with a null 
> field
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-6115
>                 URL: https://issues.apache.org/jira/browse/FLINK-6115
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0
>            Reporter: Luke Hutchison
>
> When Flink tries to serialize a tuple with a null field, you get the 
> following, which has no information about where in the program the problem 
> occurred (all the stack trace lines are in Flink, not in user code).
> {noformat}
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>       at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
>       at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>       at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>       at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>       at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>       at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalArgumentException: The record must not be null.
>       at 
> org.apache.flink.api.common.typeutils.base.array.StringArraySerializer.serialize(StringArraySerializer.java:73)
>       at 
> org.apache.flink.api.common.typeutils.base.array.StringArraySerializer.serialize(StringArraySerializer.java:33)
>       at 
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:124)
>       at 
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30)
>       at 
> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>       at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:77)
>       at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:113)
>       at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:88)
>       at 
> org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
>       at 
> org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
>       at 
> org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:79)
>       at 
> org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
>       at 
> org.apache.flink.api.java.operators.translation.PlanFilterOperator$FlatMapFilter.flatMap(PlanFilterOperator.java:51)
>       at 
> org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:108)
>       at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:490)
>       at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The only thing I can tell from this is that it happened somewhere in a 
> flatMap (but I have dozens of them in my code). Surely there's a way to pull 
> out the source file name and line number from the program DAG node when 
> errors like this occur?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
