Count of Grouped DataSet

nsengupta Sat, 30 Apr 2016 08:46:50 -0700

Hello Flinksters,

What is the most idiomatic way in Flink to get the count of records grouped 
by a Key (the Key can have multiple fields)?


I have referred to this ticket
<https://issues.apache.org/jira/browse/FLINK-1269>, but because it is still
open, I can't make out what the final decision was.

Let's say that we have the following records (case class or tuple, whatever):

f1,  f2,  f3,  f4
------------------
1,   1,   2,   "A"
1,   1,   2,   "B"
2,   1,   3,   "A"
3,   1,   4,   "C"

I group this DataSet on a composite key of (f2, f3), and then I need the
count per key:
([1,2], 2)
([1,3], 1)
([1,4], 1)

I could have gone the way of accepted wisdom of /mapping/ with an extra '1'
for every key and then /reducing/ with a /sum/ operation, but I think that is
somewhat lower-level than what one is expected to do. Spark has the
/countByKey/ operator for such a purpose.
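
For concreteness, the map-with-'1'-then-sum logic I mean is roughly the
following. This is only a plain-Java sketch of the idea on an in-memory list,
not Flink API code; the names Rec and countByKey are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedCount {

    // Hypothetical record type mirroring the f1..f4 columns of the sample data.
    static final class Rec {
        final int f1, f2, f3;
        final String f4;

        Rec(int f1, int f2, int f3, String f4) {
            this.f1 = f1;
            this.f2 = f2;
            this.f3 = f3;
            this.f4 = f4;
        }
    }

    // "Map" each record to (key, 1), then "reduce" by summing the 1s per key.
    static Map<List<Integer>, Integer> countByKey(List<Rec> records) {
        // LinkedHashMap keeps keys in first-seen order.
        Map<List<Integer>, Integer> counts = new LinkedHashMap<>();
        for (Rec r : records) {
            List<Integer> key = Arrays.asList(r.f2, r.f3); // composite key (f2, f3)
            counts.merge(key, 1, Integer::sum);            // add 1 to this key's running sum
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
            new Rec(1, 1, 2, "A"),
            new Rec(1, 1, 2, "B"),
            new Rec(2, 1, 3, "A"),
            new Rec(3, 1, 4, "C"));
        // Prints one "(key, count)" line per distinct (f2, f3) pair.
        countByKey(records).forEach((k, v) -> System.out.println("(" + k + ", " + v + ")"));
    }
}
```

What I am hoping for is a built-in operator that hides this bookkeeping.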

Could someone please nudge me in the right direction?

-- Nirmalya





