Hi Fabian,

I am using the DBLP co-authorship dataset from SNAP:
http://snap.stanford.edu/data/com-DBLP.html
I also pushed my slightly modified version of ConnectedComponents, here:
https://github.com/vasia/flink/tree/cc-test. It basically generates the
vertex dataset from the edges, so that you don't need to create it
separately.
The annotation in question is on line #172.

Thanks a lot :))

-Vasia.


On 3 April 2015 at 13:09, Fabian Hueske <fhue...@gmail.com> wrote:

> That looks pretty much like a bug.
>
> As you said, forwarded fields annotations are optional and may improve the
> performance of a program, but never change its semantics (if set
> correctly).
>
> I'll have a look at it later.
> It would be great if you could provide some data to reproduce the bug.
>
> On Apr 3, 2015 12:48 PM, "Vasiliki Kalavri" <vasilikikala...@gmail.com> wrote:
>
> > Hello to my squirrels,
> >
> > I've been getting a NullPointerException for a DeltaIteration program I'm
> > trying to implement and I could really use your help :-)
> > It seems that some of the input Tuples of the Join operator that I'm using
> > to create the next workset / solution set delta are null.
> > It also seems that adding ForwardedFields annotations solves the issue.
> >
> > I managed to reproduce the behavior using the ConnectedComponents example,
> > by removing the "@ForwardedFieldsFirst("*")" annotation from
> > the ComponentIdFilter join.
> > The exception message is the following:
> >
> > Caused by: java.lang.NullPointerException
> >     at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
> >     at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
> >     at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
> >     at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
> >     at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
> >     at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> >     at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
> >     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > I get this error locally with any sufficiently big dataset (~10000 nodes).
> > When the annotation is in place, it works without problems.
> > I also generated the optimizer plans for the two cases:
> > - with annotation (working):
> > https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b
> > - without annotation (failing):
> > https://gist.github.com/vasia/086faa45b980bf7f4c09
> >
> > After visualizing the plans, the main difference I see is that in the
> > working case, the next workset node and the solution set delta nodes are
> > merged, while in the failing case they are separate.
> >
> > Shouldn't this work with and without the annotation (but be more efficient
> > with the annotation in place)? Or am I missing something here?
> >
> > Thanks in advance for any help :))
> >
> > Cheers,
> > - Vasia.
> >
>
