Hi,

I actually ran into this problem again with a different algorithm :/
Same exception, and it looks like getMatchFor() in CompactingHashTable
returns a null record.
I'm not sure why this happens, or why the annotation prevents it. Any
insight is highly welcome :-)
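
For what it's worth, here is my current reading of the failure. This is just
a sketch of my understanding, not the actual driver code, and the variable
names are made up:

    // Hypothetical sketch (made-up names): the solution-set join driver
    // probes the CompactingHashTable for the record matching the
    // probe-side key. If getMatchFor() finds no match, it seems to return null...
    Tuple2<Long, Long> match = prober.getMatchFor(probeRecord, reuseRecord);
    // ...and the null is handed straight to the user join function, where the
    // first field access throws the NullPointerException we are seeing.
    joinFunction.join(probeRecord, match, collector);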

Shall I open an issue so that we don't forget about this?

-Vasia.


On 4 April 2015 at 14:44, Vasiliki Kalavri <vasilikikala...@gmail.com>
wrote:

> Hi Fabian,
>
> thanks for looking into this.
> Let me know if there's anything I can do to help!
>
> Cheers,
> V.
>
> On 3 April 2015 at 22:31, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Thanks for the nice setup!
>> I could easily reproduce the exception you are facing.
>> But that's the only good news so far :-(
>>
>> I checked the plans and both are valid and should compute the correct
>> result for the program.
>> The split-off solution set delta is required because it needs to be
>> repartitioned (without the annotation, the optimizer does not know that it
>> is in fact already correctly partitioned). One thing that made me a bit
>> suspicious is that the solution set delta partitioning is marked with a
>> Pipeline-Breaker. The pipeline breaker shouldn't make a semantic
>> difference, but I am not sure it is really required, and that part of the
>> codebase was also recently worked on.
>>
>> So, a closer look and more debugging are necessary to figure out what is
>> not working correctly here...
>>
>>
>> 2015-04-03 14:14 GMT+02:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:
>>
>> > Hi Fabian,
>> >
>> > I am using the DBLP co-authorship dataset from SNAP:
>> > http://snap.stanford.edu/data/com-DBLP.html
>> > I also pushed my slightly modified version of ConnectedComponents, here:
>> > https://github.com/vasia/flink/tree/cc-test. It basically generates the
>> > vertex dataset from the edges, so that you don't need to create it
>> > separately.
>> > The annotation that causes the error is on line #172.
>> >
>> > Thanks a lot :))
>> >
>> > -Vasia.
>> >
>> >
>> > On 3 April 2015 at 13:09, Fabian Hueske <fhue...@gmail.com> wrote:
>> >
>> > > That looks pretty much like a bug.
>> > >
>> > > As you said, forwarded-fields annotations are optional and may improve the
>> > > performance of a program, but never change its semantics (if set
>> > > correctly).
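>> > >
>> > > To illustrate (a minimal sketch, not your program): "*" below declares
>> > > that every field of the first input is forwarded unchanged to the
>> > > output. It is purely an optimizer hint, so the program must compute the
>> > > same result whether or not the annotation is present.
>> > >
>> > > import org.apache.flink.api.common.functions.JoinFunction;
>> > > import org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFieldsFirst;
>> > > import org.apache.flink.api.java.tuple.Tuple2;
>> > >
>> > > // Forwarding all fields of the first input is correct here because the
>> > > // join returns the first input record unchanged.
>> > > @ForwardedFieldsFirst("*")
>> > > public static class KeepFirst implements
>> > >     JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
>> > >   @Override
>> > >   public Tuple2<Long, Long> join(Tuple2<Long, Long> first, Tuple2<Long, Long> second) {
>> > >     return first;
>> > >   }
>> > > }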
>> > >
>> > > I'll have a look at it later.
>> > > Would be great if you could provide some data to reproduce the bug.
>> > > On Apr 3, 2015 12:48 PM, "Vasiliki Kalavri" <vasilikikala...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hello to my squirrels,
>> > > >
>> > > > I've been getting a NullPointerException for a DeltaIteration program
>> > > > I'm trying to implement, and I could really use your help :-)
>> > > > It seems that some of the input Tuples of the Join operator that I'm
>> > > > using to create the next workset / solution set delta are null.
>> > > > It also seems that adding ForwardedFields annotations solves the issue.
>> > > >
>> > > > I managed to reproduce the behavior using the ConnectedComponents
>> > > > example, by removing the "@ForwardedFieldsFirst("*")" annotation from
>> > > > the ComponentIdFilter join.
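>> > > >
>> > > > For reference, the join in question looks roughly like this (quoting
>> > > > from memory, so possibly not verbatim); the annotation on the class is
>> > > > the one I removed:
>> > > >
>> > > > @ForwardedFieldsFirst("*")
>> > > > public static final class ComponentIdFilter implements
>> > > >     FlatJoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
>> > > >
>> > > >   @Override
>> > > >   public void join(Tuple2<Long, Long> candidate, Tuple2<Long, Long> old,
>> > > >       Collector<Tuple2<Long, Long>> out) {
>> > > >     // this comparison is where the NPE is thrown: 'old' arrives as null
>> > > >     if (candidate.f1 < old.f1) {
>> > > >       out.collect(candidate);
>> > > >     }
>> > > >   }
>> > > > }
>> > > >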
>> > > > The exception message is the following:
>> > > >
>> > > > Caused by: java.lang.NullPointerException
>> > > > at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
>> > > > at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
>> > > > at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
>> > > > at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
>> > > > at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
>> > > > at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
>> > > > at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>> > > > at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
>> > > > at java.lang.Thread.run(Thread.java:745)
>> > > >
>> > > > I get this error locally with any sufficiently big dataset (~10000
>> > > > nodes).
>> > > > When the annotation is in place, it works without problem.
>> > > > I also generated the optimizer plans for the two cases:
>> > > > - with annotation (working):
>> > > > https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b
>> > > > - without annotation (failing):
>> > > > https://gist.github.com/vasia/086faa45b980bf7f4c09
>> > > >
>> > > > After visualizing the plans, the main difference I see is that in the
>> > > > working case, the next workset node and the solution set delta nodes
>> > > > are merged, while in the failing case they are separate.
>> > > >
>> > > > Shouldn't this work both with and without the annotation (just more
>> > > > efficiently with the annotation in place)? Or am I missing something
>> > > > here?
>> > > >
>> > > > Thanks in advance for any help :))
>> > > >
>> > > > Cheers,
>> > > > - Vasia.
>> > > >
>> > >
>> >
>>
>
>
