Re: DataSet: combineGroup/reduceGroup with large number of groups

2017-06-19 Thread Fabian Hueske
> >> Hi, > >> > >> I'm working on a batch job (roughly 10 billion records of input, 10 > >> million groups) that is essentially a 'fold' over each group, that is, I > >> have a function > >> > >> AggregateT addToAggrate(Agg

Re: DataSet: combineGroup/reduceGroup with large number of groups

2017-06-16 Thread Urs Schoenenberger
{...} >> >> and want to fold this over each group in my DataSet. >> >> My understanding is that I cannot use .groupBy(0).reduce(...) since the >> ReduceFunction only supports the case where AggregateT is the same as >> RecordT. >> >> A simple solution

Re: DataSet: combineGroup/reduceGroup with large number of groups

2017-06-16 Thread Fabian Hueske
RecordT record) {...} > > and want to fold this over each group in my DataSet. > > My understanding is that I cannot use .groupBy(0).reduce(...) since the > ReduceFunction only supports the case where AggregateT is the same as > RecordT. > > A simple solution using .reduceGroup(..

DataSet: combineGroup/reduceGroup with large number of groups

2017-06-16 Thread Urs Schoenenberger
y understanding is that I cannot use .groupBy(0).reduce(...) since the ReduceFunction only supports the case where AggregateT is the same as RecordT. A simple solution using .reduceGroup(...) works, but spills all input data in the reduce step, which produces a lot of slow & expensive Disk IO.

Re: reduceGroup

2017-04-20 Thread Till Rohrmann
Hi Mary, the groupBy + reduceGroup works across all partitions of a DataSet. This means that elements from each partition are grouped (creating potentially a new partitioning) and then for each group the reduceGroup function is executed. Cheers, Till On Thu, Apr 20, 2017 at 5:14 PM, Mary m

reduceGroup

2017-04-20 Thread Mary m
Hi If groupeby+reduceGroup is used, does each groupeby+reduceGroup take place on a single partition? If yes, if we have more groups than the partitions, what happens? Cheers,Mary

RE: Multiple keys in reduceGroup ?

2015-10-22 Thread LINZ, Arnaud
Rohrmann [mailto:trohrm...@apache.org] Envoyé : jeudi 22 octobre 2015 13:45 À : user@flink.apache.org Objet : Re: Multiple keys in reduceGroup ? You don’t modify the objects, however, the ReusingKeyGroupedIterator, which is the iterator you have in your reduce function, does. Internally it uses

Re: Multiple keys in reduceGroup ?

2015-10-22 Thread Till Rohrmann
t > move was to turn it off, and it did solved the problem. > > It also increased execution time by 10%, but it’s hard to say if this > overhead is due to the copy or to the change of behavior of the reduceGroup > algorithm once it get the right data. > > > > Since I never

Re: Multiple keys in reduceGroup ?

2015-10-22 Thread Stephan Ewen
With object reuse activated, Flink heavily reuses objects. Each call to the Iterator in the reduceGroup function gives back one of the same two objects, with has been filled with different contents. Your list of all values will effectively only contain two different objects. Further more, the

RE: Multiple keys in reduceGroup ?

2015-10-22 Thread LINZ, Arnaud
Hi, I was using primitive types, and EnableObjectReuse was turned on. My next move was to turn it off, and it did solved the problem. It also increased execution time by 10%, but it’s hard to say if this overhead is due to the copy or to the change of behavior of the reduceGroup algorithm

Re: Multiple keys in reduceGroup ?

2015-10-22 Thread Till Rohrmann
naud > wrote: > > Hello, > > > > > > > > Trying to understand why my code was giving strange results, I’ve ended > up adding “useless” controls in my code and came with what seems to me a > bug. I group my dataset according to a key, but in the reduceGroup funct

Re: Multiple keys in reduceGroup ?

2015-10-22 Thread Aljoscha Krettek
> On Thu, Oct 22, 2015 at 12:20 PM, LINZ, Arnaud > wrote: > Hello, > > > > Trying to understand why my code was giving strange results, I’ve ended up > adding “useless” controls in my code and came with what seems to me a bug. I > group my dataset according to a key, but

Re: Multiple keys in reduceGroup ?

2015-10-22 Thread Stephan Ewen
aud wrote: > Hello, > > > > Trying to understand why my code was giving strange results, I’ve ended up > adding “useless” controls in my code and came with what seems to me a bug. > I group my dataset according to a key, but in the reduceGroup function I am > passed values

Multiple keys in reduceGroup ?

2015-10-22 Thread LINZ, Arnaud
Hello, Trying to understand why my code was giving strange results, I’ve ended up adding “useless” controls in my code and came with what seems to me a bug. I group my dataset according to a key, but in the reduceGroup function I am passed values with different keys. My code has the following

Re: error in reduceGroup operator when changing the Flink version from 0.7 to 0.8

2015-02-24 Thread HungChang
Thank you!This is complete solving the problem. -- View this message in context: http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Error-in-reduceGroup-operator-when-changing-the-Flink-version-from-0-7-to-0-8-tp785p793.html Sent from the Apache Flink (Incubator

Re: error in reduceGroup operator when changing the Flink version from 0.7 to 0.8

2015-02-24 Thread Aljoscha Krettek
The problem is that someone changed how project() works. Sorry for the inconvenience. To make it work, you have to add the type parameter manually, so that the result of project() has the correct type: DataSet numVertices edges.>project(1)).distinct().reduceGroup(new CountVertices()) On Tue,

Re: error in reduceGroup operator when changing the Flink version from 0.7 to 0.8

2015-02-24 Thread HungChang
-reduceGroup-operator-when-changing-the-Flink-version-from-0-7-to-0-8-tp785p789.html Sent from the Apache Flink (Incubator) User Mailing List archive. mailing list archive at Nabble.com.

Re: error in reduceGroup operator when changing the Flink version from 0.7 to 0.8

2015-02-24 Thread Stephan Ewen
0.7 to 0.8, reduceGroup operator gets > the > following error: > > "The method reduceGroup(GroupReduceFunction) in the type > DataSet is not applicable for the arguments > (InDegreeDistribution.CountVertices)" > > Tried to figure out the error but failed to

error in reduceGroup operator when changing the Flink version from 0.7 to 0.8

2015-02-24 Thread HungChang
Hi, when changing the version from 0.7 to 0.8, reduceGroup operator gets the following error: "The method reduceGroup(GroupReduceFunction) in the type DataSet is not applicable for the arguments (InDegreeDistribution.CountVertices)" Tried to figure out the error but failed to fix it.