subject:"Issue with parallelism when CoGrouping custom nested data tpye"

Re: Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Fabian Hueske

Cool! Always happy to help :-) 2015-09-16 14:41 GMT+02:00 Pieter Hameete : > Fantastic Fabian, that was it :-)! I'm glad it wasn't a more severe/tricky > programming error though I already spent quite some time wondering about > this one. > > Have a nice day! > > - Pieter > > > > 2015-09-16 14:2

Re: Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Pieter Hameete

Fantastic Fabian, that was it :-)! I'm glad it wasn't a more severe/tricky programming error though I already spent quite some time wondering about this one. Have a nice day! - Pieter 2015-09-16 14:27 GMT+02:00 Fabian Hueske : > Sorry, I was thinking too complicated. Forget about the methods

Re: Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Fabian Hueske

Sorry, I was thinking too complicated. Forget about the methods I mentioned. If you are implementing WritableComparable types, you need to override the hashcode() method. Flink treats WritableComparable types just like Hadoop [1]. DawnData does not implement hashcode() which causes inconsistent ha

Re: Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Pieter Hameete

Hi, I havent been able to find the problem yet, and I dont know exactly how to check the methods you suggested to check earlier (extractKeys, getFlatComparators, duplicate) for the Scala API. Do you have some pointers for me on how I can check these myself? In my earlier mail I stated that maps,

Re: Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Pieter Hameete

Cheers Till and Fabian for your fast replies, it's much appreciated! I figured something should be wrong with my data type. I have no doubt the CoGroup works just fine :-) Its pointers what to investigate about my datatype what I am looking for. Initially I had problems with serialization causing

Re: Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Fabian Hueske

This sound like a problem with your custom type and its (presumably) custom serializers and comparators. I assume it is not an issue of partitioning or sorting because Reduce is working fine, as you reported. CoGroup does also partition and sort data, but compares the elements of two sorted stream

Re: Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Till Rohrmann

Hi Pieter, your code doesn't look suspicious at the first glance. Would it be possible for you to post a complete example with data (also possible to include it in the code) to reproduce your problem? Cheers, Till On Wed, Sep 16, 2015 at 10:31 AM, Pieter Hameete wrote: > Dear fellow Flinkers,

Issue with parallelism when CoGrouping custom nested data tpye

2015-09-16 Thread Pieter Hameete

Dear fellow Flinkers, I am implementing queries from the XMark ( http://www.ins.cwi.nl/projects/xmark/) benchmark on Flink using a custom nested data type. Reading the XML data generated by the XMark generator into my custom nested datatype works perfectly, and the queries that I have implemented

Re: Issue with parallelism when CoGrouping custom nested data tpye

Re: Issue with parallelism when CoGrouping custom nested data tpye

Re: Issue with parallelism when CoGrouping custom nested data tpye

Re: Issue with parallelism when CoGrouping custom nested data tpye

Re: Issue with parallelism when CoGrouping custom nested data tpye

Re: Issue with parallelism when CoGrouping custom nested data tpye

Re: Issue with parallelism when CoGrouping custom nested data tpye

Issue with parallelism when CoGrouping custom nested data tpye

8 matches

Site Navigation

Mail list logo

Footer information