That looks pretty much like a bug. As you said, fwd fields annotations are optional and may improve the performance of a program, but never change its semantics (if set correctly).
I'll have a look at it later. Would be great if you could provide some data to reproduce the bug. On Apr 3, 2015 12:48 PM, "Vasiliki Kalavri" <vasilikikala...@gmail.com> wrote: > Hello to my squirrels, > > I've been getting a NullPointerException for a DeltaIteration program I'm > trying to implement and I could really use your help :-) > It seems that some of the input Tuples of the Join operator that I'm using > to create the next workset / solution set delta are null. > It also seems that adding ForwardedFields annotations solves the issue. > > I managed to reproduce the behavior using the ConnectedComponents example, > by removing the "@ForwardedFieldsFirst("*")" annotation from > the ComponentIdFilter join. > The exception message is the following: > > Caused by: java.lang.NullPointerException > at > > org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186) > at > > org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1) > at > > org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198) > at > > org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) > at > > org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139) > at > > org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92) > at > > org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) > at > > org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217) > at java.lang.Thread.run(Thread.java:745) > > I get this error locally with any sufficiently big dataset (~10000 nodes). > When the annotation is in place, it works without problem. > I also generated the optimizer plans for the two cases: > - with annotation (working): > https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b > - without annotation (failing): > https://gist.github.com/vasia/086faa45b980bf7f4c09 > > After visualizing the plans, the main difference I see is that in the > working case, the next workset node and the solution set delta nodes are > merged, while in the failing case they are separate. > > Shouldn't this work with and without annotation (but be more efficient with > the annotation in place)? Or am I missing something here? > > Thanks in advance for any help :)) > > Cheers, > - Vasia. >