[ https://issues.apache.org/jira/browse/FLINK-6115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948412#comment-15948412 ]
Luke Hutchison commented on FLINK-6115:
---------------------------------------

[~greghogan]: fang yong's comment exactly illustrates the point: whether or not null values are efficient, and whether or not they were a good idea to add to Java in the first place, they are a fundamental part of the language _that is used, today, by millions of programmers, for reasons that go well beyond cases where something went wrong_ -- and not supporting this usage will only ever be a major point of surprise and pain for users of Flink. This is made worse by the fact that code that doesn't crash on small test datasets will start to crash on larger datasets, or in production, once things start to get serialized, which sounds like what fang yong experienced.

It's already an error in Flink to use non-serializable types in parts of the computation graph where objects can be serialized. This is statically checked, precisely so that Flink does not discover the need to serialize a non-serializable object only at runtime. Similarly, Flink should not be discovering non-serializable values (nulls) within a serializable object only at runtime.

If you strongly believe that tuples should never support nulls, then you should not allow tuple types to be used anywhere a serializable type is required, and you should enforce this in the computation graph builder/analyzer before graph execution even begins. Of course this will never happen, because tuples are too fundamental and useful -- but so is {{null}}. Ergo, tuples must support nulls.

Sorry to belabor the point.
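To make "tuples must support nulls" concrete, here is a minimal sketch of one common way a field-by-field serializer can tolerate nulls: write a one-byte presence flag before each field. This is not Flink's TupleSerializer and makes no claim about how Flink would implement it; the class name NullableFieldCodec and its methods are hypothetical, and a plain String[] stands in for a tuple.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only: a String[] stands in for a tuple.
public class NullableFieldCodec {

    // Writes the field count, then for each field a presence flag
    // followed by the value if (and only if) the field is non-null.
    public static byte[] write(String[] fields) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(fields.length);
            for (String field : fields) {
                out.writeBoolean(field != null);
                if (field != null) {
                    out.writeUTF(field);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads the fields back, restoring null wherever the flag was false.
    public static String[] read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String[] fields = new String[in.readInt()];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = in.readBoolean() ? in.readUTF() : null;
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] roundTripped = read(write(new String[] {"a", null, "c"}));
        System.out.println(Arrays.toString(roundTripped)); // prints [a, null, c]
    }
}
```

The cost is one extra byte per field, which is the efficiency trade-off at issue in this thread; a bitmask over all fields would shrink that to one bit per field.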
> Need more helpful error message when trying to serialize a tuple with a null field
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-6115
>                 URL: https://issues.apache.org/jira/browse/FLINK-6115
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0
>            Reporter: Luke Hutchison
>
> When Flink tries to serialize a tuple with a null field, you get the
> following, which has no information about where in the program the problem
> occurred (all the stack trace lines are in Flink, not in user code).
> {noformat}
> Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalArgumentException: The record must not be null.
>     at org.apache.flink.api.common.typeutils.base.array.StringArraySerializer.serialize(StringArraySerializer.java:73)
>     at org.apache.flink.api.common.typeutils.base.array.StringArraySerializer.serialize(StringArraySerializer.java:33)
>     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:124)
>     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30)
>     at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>     at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:77)
>     at org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:113)
>     at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:88)
>     at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
>     at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
>     at org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:79)
>     at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
>     at org.apache.flink.api.java.operators.translation.PlanFilterOperator$FlatMapFilter.flatMap(PlanFilterOperator.java:51)
>     at org.apache.flink.runtime.operators.FlatMapDriver.run(FlatMapDriver.java:108)
>     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:490)
>     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:355)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The only thing I can tell from this is that it happened somewhere in a
> flatMap (but I have dozens of them in my code). Surely there's a way to pull
> out the source file name and line number from the program DAG node when
> errors like this occur?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)