Will do, thanks!

On 27 April 2015 at 11:06, Fabian Hueske <fhue...@gmail.com> wrote:
> No, haven't looked at it since my last mail :-(
> Both plans (with and without forward fields annotation) look good except
> for the suspicious pipeline breaker.
>
> @Vasia Could you open a JIRA and assign it to me?
> I'll have a closer look and try to figure out what's going on.
>
>
> 2015-04-27 10:34 GMT+02:00 Stephan Ewen <se...@apache.org>:
>
> > I think Fabian looked into this a while back...
> >
> > @Fabian, do you have any insights into what causes this?
> >
> >
> > On Sat, Apr 25, 2015 at 7:46 PM, Vasiliki Kalavri <
> > vasilikikala...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I actually ran into this problem again with a different algorithm :/
> > > Same exception and it looks like getMatchFor() in CompactingHashTable
> > > returns a null record.
> > > Not sure why, or why the annotation prevents this from happening. Any
> > > insight is highly welcome :-)
> > >
> > > Shall I open an issue so that we don't forget about this?
> > >
> > > -Vasia.
> > >
> > >
> > > On 4 April 2015 at 14:44, Vasiliki Kalavri <vasilikikala...@gmail.com>
> > > wrote:
> > >
> > > > Hi Fabian,
> > > >
> > > > thanks for looking into this.
> > > > Let me know if there's anything I can do to help!
> > > >
> > > > Cheers,
> > > > V.
> > > >
> > > > On 3 April 2015 at 22:31, Fabian Hueske <fhue...@gmail.com> wrote:
> > > >
> > > >> Thanks for the nice setup!
> > > >> I could easily reproduce the exception you are facing.
> > > >> But that's the only good news so far :-(
> > > >>
> > > >> I checked the plans and both are valid and should compute the correct
> > > >> result for the program.
> > > >> The split-off solution set delta is required because it needs to be
> > > >> repartitioned (without the annotation, the optimizer does not know that it
> > > >> is in fact already correctly partitioned). One thing that made me a bit
> > > >> suspicious is that the solution set delta partitioning is marked with a
> > > >> Pipeline-Breaker.
> > > >> The pipeline breaker shouldn't make a semantic
> > > >> difference, but I am not sure if it is really required, and that part
> > > >> of the codebase was also recently worked on.
> > > >>
> > > >> So, a closer look and more debugging is necessary to figure out what is
> > > >> not working correctly here...
> > > >>
> > > >>
> > > >> 2015-04-03 14:14 GMT+02:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:
> > > >>
> > > >> > Hi Fabian,
> > > >> >
> > > >> > I am using the dblp co-authorship dataset from SNAP:
> > > >> > http://snap.stanford.edu/data/com-DBLP.html
> > > >> > I also pushed my slightly modified version of ConnectedComponents, here:
> > > >> > https://github.com/vasia/flink/tree/cc-test. It basically generates the
> > > >> > vertex dataset from the edges, so that you don't need to create it
> > > >> > separately.
> > > >> > The annotation that creates the error is in line #172.
> > > >> >
> > > >> > Thanks a lot :))
> > > >> >
> > > >> > -Vasia.
> > > >> >
> > > >> >
> > > >> > On 3 April 2015 at 13:09, Fabian Hueske <fhue...@gmail.com> wrote:
> > > >> >
> > > >> > > That looks pretty much like a bug.
> > > >> > >
> > > >> > > As you said, fwd fields annotations are optional and may improve the
> > > >> > > performance of a program, but never change its semantics (if set
> > > >> > > correctly).
> > > >> > >
> > > >> > > I'll have a look at it later.
> > > >> > > Would be great if you could provide some data to reproduce the bug.
> > > >> > > On Apr 3, 2015 12:48 PM, "Vasiliki Kalavri" <vasilikikala...@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hello to my squirrels,
> > > >> > > >
> > > >> > > > I've been getting a NullPointerException for a DeltaIteration program I'm
> > > >> > > > trying to implement and I could really use your help :-)
> > > >> > > > It seems that some of the input Tuples of the Join operator that I'm using
> > > >> > > > to create the next workset / solution set delta are null.
> > > >> > > > It also seems that adding ForwardedFields annotations solves the issue.
> > > >> > > >
> > > >> > > > I managed to reproduce the behavior using the ConnectedComponents example,
> > > >> > > > by removing the "@ForwardedFieldsFirst("*")" annotation from
> > > >> > > > the ComponentIdFilter join.
> > > >> > > > The exception message is the following:
> > > >> > > >
> > > >> > > > Caused by: java.lang.NullPointerException
> > > >> > > > at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
> > > >> > > > at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
> > > >> > > > at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
> > > >> > > > at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
> > > >> > > > at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
> > > >> > > > at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> > > >> > > > at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
> > > >> > > > at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
> > > >> > > > at java.lang.Thread.run(Thread.java:745)
> > > >> > > >
> > > >> > > > I get this error locally with any sufficiently big dataset (~10000 nodes).
> > > >> > > > When the annotation is in place, it works without problem.
> > > >> > > > I also generated the optimizer plans for the two cases:
> > > >> > > > - with annotation (working):
> > > >> > > > https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b
> > > >> > > > - without annotation (failing):
> > > >> > > > https://gist.github.com/vasia/086faa45b980bf7f4c09
> > > >> > > >
> > > >> > > > After visualizing the plans, the main difference I see is that in the
> > > >> > > > working case, the next workset node and the solution set delta nodes are
> > > >> > > > merged, while in the failing case they are separate.
> > > >> > > >
> > > >> > > > Shouldn't this work with and without annotation (but be more efficient with
> > > >> > > > the annotation in place)? Or am I missing something here?
> > > >> > > >
> > > >> > > > Thanks in advance for any help :))
> > > >> > > >
> > > >> > > > Cheers,
> > > >> > > > - Vasia.
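
For anyone trying to narrow this down, the failure mode described in the thread (the solution-set lookup handing a null record to the join) can be sketched in plain Java without Flink. Everything below is a hypothetical stand-in: `Pair` plays the role of a (vertex id, component id) tuple, a `HashMap` plays the role of the solution set, and `join` only assumes the filter keeps a candidate when it lowers the component id. The null guard does not fix the underlying problem; it only turns a bare NullPointerException into a message that names the vertex whose match was missing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch, NOT Flink code: all names here are hypothetical.
final class ComponentIdFilterSketch {

    // Stand-in for a (vertex id, component id) tuple.
    static final class Pair {
        final long vertexId;
        final long componentId;

        Pair(long vertexId, long componentId) {
            this.vertexId = vertexId;
            this.componentId = componentId;
        }
    }

    // Assumed shape of the filter: emit the candidate only if it lowers the
    // component id. `current` is the solution-set match and may be null,
    // which is the situation reported in the thread.
    static void join(Pair candidate, Pair current, List<Pair> out) {
        if (current == null) {
            // Fail with context instead of a bare NullPointerException.
            throw new IllegalStateException(
                "no solution-set match for vertex " + candidate.vertexId);
        }
        if (candidate.componentId < current.componentId) {
            out.add(candidate);
        }
    }

    // One simulated superstep: look up each candidate's current entry by
    // vertex id and collect the improving candidates as the delta.
    static List<Pair> step(List<Pair> candidates, Map<Long, Pair> solutionSet) {
        List<Pair> delta = new ArrayList<>();
        for (Pair candidate : candidates) {
            join(candidate, solutionSet.get(candidate.vertexId), delta);
        }
        return delta;
    }

    public static void main(String[] args) {
        Map<Long, Pair> solutionSet = new HashMap<>();
        solutionSet.put(1L, new Pair(1L, 1L));
        solutionSet.put(2L, new Pair(2L, 2L));

        List<Pair> candidates = new ArrayList<>();
        candidates.add(new Pair(2L, 1L)); // lowers vertex 2's component id
        candidates.add(new Pair(1L, 1L)); // no improvement, filtered out

        for (Pair p : step(candidates, solutionSet)) {
            System.out.println(p.vertexId + " -> " + p.componentId);
        }
    }
}
```

A guard like this in the real join function would at least show which keys come back without a match, which might help tell a partitioning problem apart from a hash table problem.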