Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Sorry, I mistyped the code; it should be timeWindow(Time.minutes(10)) instead of window(Time.minutes(10)). On Wed, Aug 24, 2016 at 9:29 PM, Yassine Marzougui wrote: > Hi subash, > > A stream is infinite, hence it has no notion of "final" count. To get > distinct counts you need to define

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread Yassine Marzougui
Hi subash, A stream is infinite, hence it has no notion of a "final" count. To get distinct counts you need to define a period (= a window [1]) over which you count elements and emit a result, by adding a window operator before the reduce. For example the following will emit distinct counts every 10
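
To illustrate the point about windows, here is a minimal plain-Java sketch. It does not use the Flink API; it only simulates the two behaviours discussed in this thread. The class name WindowedCountSketch, both method names, and the sample timestamps are hypothetical, chosen for illustration. The first method mimics a reduce with no window, which emits an updated count for every incoming record. The second assigns each record to a fixed-size (tumbling) window by its timestamp and emits one count per id per window.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowedCountSketch {

    // No window: every incoming id produces a new (id,count) row,
    // so the same id shows up repeatedly with a growing count.
    static List<String> runningCounts(int[] ids) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            int c = counts.merge(id, 1, Integer::sum);
            out.add("(" + id + "," + c + ")");
        }
        return out;
    }

    // Tumbling window: each record is {id, timestampMillis}.
    // Records are grouped by window start, then counted per id,
    // and a single row per id is emitted for each window.
    static List<String> windowedCounts(long[][] records, long windowMillis) {
        Map<Long, Map<Long, Integer>> windows = new TreeMap<>();
        for (long[] r : records) {
            long windowStart = r[1] - (r[1] % windowMillis);
            windows.computeIfAbsent(windowStart, k -> new LinkedHashMap<>())
                   .merge(r[0], 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map<Long, Integer> perId : windows.values()) {
            for (Map.Entry<Long, Integer> e : perId.entrySet()) {
                out.add("(" + e.getKey() + "," + e.getValue() + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same arrival order as in the original question.
        System.out.println(runningCounts(new int[] {1, 1, 2, 1, 2}));
        // prints [(1,1), (1,2), (2,1), (1,3), (2,2)]

        // All five records fall into one 10-minute window (timestamps are made up).
        long[][] records = {{1, 0}, {1, 1000}, {2, 2000}, {1, 3000}, {2, 4000}};
        System.out.println(windowedCounts(records, 10 * 60 * 1000L));
        // prints [(1,3), (2,2)]
    }
}
```

In a real Flink job the window assignment and emission are handled by the window operator, and results are emitted when each window closes rather than after all input has been read as in this simulation.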

Re: how to get rid of duplicate rows group by in DataStream

2016-08-24 Thread subash basnet
Hello Kostas, Sorry for the late reply. I couldn't understand how to apply the split to a DataStream, as in the code below, to get distinct output stream elements with their counts after applying group by and reduce. DataStream> gridWithDensity = pointsWithGridCoordinates.map(new AddCountAppender()) .keyBy(

Re: how to get rid of duplicate rows group by in DataStream

2016-08-22 Thread Kostas Kloudas
Hi Subash, You should also split your elements into windows. If not, Flink emits an element for each incoming record. That is why you have: (1,1) (1,2) (1,3) … Kostas > On Aug 22, 2016, at 5:58 PM, subash basnet wrote: > > Hello all, > > I grouped by the input based on its id to count the n

how to get rid of duplicate rows group by in DataStream

2016-08-22 Thread subash basnet
Hello all, I grouped the input by its id to count the number of elements in each group. DataStream> gridWithCount; Upon printing the above datastream, it shows duplicate rows: Output: (1,1) (1,2) (2,1) (1,3) (2,2)... Whereas I wanted the distinct rows with final count: Needed O