> >> Hi,
> >>
> >> I'm working on a batch job (roughly 10 billion records of input, 10
> >> million groups) that is essentially a 'fold' over each group, that is, I
> >> have a function
> >>
> >> AggregateT addToAggrate(Agg
{...}
>>
>> and want to fold this over each group in my DataSet.
>>
>> My understanding is that I cannot use .groupBy(0).reduce(...) since the
>> ReduceFunction only supports the case where AggregateT is the same as
>> RecordT.
>>
>> A simple solution
RecordT record) {...}
>
> and want to fold this over each group in my DataSet.
>
> My understanding is that I cannot use .groupBy(0).reduce(...) since the
> ReduceFunction only supports the case where AggregateT is the same as
> RecordT.
>
> A simple solution using .reduceGroup(..
y understanding is that I cannot use .groupBy(0).reduce(...) since the
ReduceFunction only supports the case where AggregateT is the same as
RecordT.
A simple solution using .reduceGroup(...) works, but spills all input
data in the reduce step, which produces a lot of slow & expensive Disk IO.
Hi Mary,
the groupBy + reduceGroup works across all partitions of a DataSet. This
means that elements from each partition are grouped (creating potentially a
new partitioning) and then for each group the reduceGroup function is
executed.
Cheers,
Till
On Thu, Apr 20, 2017 at 5:14 PM, Mary m
Hi If groupeby+reduceGroup is used, does each groupeby+reduceGroup take place
on a single partition? If yes, if we have more groups than the partitions, what
happens?
Cheers,Mary
Rohrmann [mailto:trohrm...@apache.org]
Envoyé : jeudi 22 octobre 2015 13:45
À : user@flink.apache.org
Objet : Re: Multiple keys in reduceGroup ?
You don’t modify the objects, however, the ReusingKeyGroupedIterator, which is
the iterator you have in your reduce function, does. Internally it uses
t
> move was to turn it off, and it did solved the problem.
>
> It also increased execution time by 10%, but it’s hard to say if this
> overhead is due to the copy or to the change of behavior of the reduceGroup
> algorithm once it get the right data.
>
>
>
> Since I never
With object reuse activated, Flink heavily reuses objects. Each call to the
Iterator in the reduceGroup function gives back one of the same two
objects, with has been filled with different contents.
Your list of all values will effectively only contain two different objects.
Further more, the
Hi,
I was using primitive types, and EnableObjectReuse was turned on. My next move
was to turn it off, and it did solved the problem.
It also increased execution time by 10%, but it’s hard to say if this overhead
is due to the copy or to the change of behavior of the reduceGroup algorithm
naud
> wrote:
> > Hello,
> >
> >
> >
> > Trying to understand why my code was giving strange results, I’ve ended
> up adding “useless” controls in my code and came with what seems to me a
> bug. I group my dataset according to a key, but in the reduceGroup funct
> On Thu, Oct 22, 2015 at 12:20 PM, LINZ, Arnaud
> wrote:
> Hello,
>
>
>
> Trying to understand why my code was giving strange results, I’ve ended up
> adding “useless” controls in my code and came with what seems to me a bug. I
> group my dataset according to a key, but
aud
wrote:
> Hello,
>
>
>
> Trying to understand why my code was giving strange results, I’ve ended up
> adding “useless” controls in my code and came with what seems to me a bug.
> I group my dataset according to a key, but in the reduceGroup function I am
> passed values
Hello,
Trying to understand why my code was giving strange results, I’ve ended up
adding “useless” controls in my code and came with what seems to me a bug. I
group my dataset according to a key, but in the reduceGroup function I am
passed values with different keys.
My code has the following
Thank you!This is complete solving the problem.
--
View this message in context:
http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Error-in-reduceGroup-operator-when-changing-the-Flink-version-from-0-7-to-0-8-tp785p793.html
Sent from the Apache Flink (Incubator
The problem is that someone changed how project() works. Sorry for the
inconvenience. To make it work, you have to add the type parameter
manually, so that the result of project() has the correct type:
DataSet numVertices
edges.>project(1)).distinct().reduceGroup(new
CountVertices())
On Tue,
-reduceGroup-operator-when-changing-the-Flink-version-from-0-7-to-0-8-tp785p789.html
Sent from the Apache Flink (Incubator) User Mailing List archive. mailing list
archive at Nabble.com.
0.7 to 0.8, reduceGroup operator gets
> the
> following error:
>
> "The method reduceGroup(GroupReduceFunction) in the type
> DataSet is not applicable for the arguments
> (InDegreeDistribution.CountVertices)"
>
> Tried to figure out the error but failed to
Hi, when changing the version from 0.7 to 0.8, reduceGroup operator gets the
following error:
"The method reduceGroup(GroupReduceFunction) in the type
DataSet is not applicable for the arguments
(InDegreeDistribution.CountVertices)"
Tried to figure out the error but failed to fix it.
19 matches
Mail list logo