Hi,

As far as I can tell, there is no direct equivalent, which is probably due
to the differences in the underlying execution models.

I think the desired behaviour can be expressed by something along the lines
of:

    stream.groupBy(0).window(Count.of(<size>))

where stream is a DataStream<Tuple2<K, V>> and <size> would be the batch
size of your Spark Streaming job.

The window can also be expressed in terms of time, which would look
something like this: .window(Time.of(<time>, <time_unit>))
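To illustrate what the count window does per key (buffer values until <size> of them have arrived for that key, then emit them as one group), here is a plain-Java sketch of that semantics. This is only an illustration, not Flink code — the class and method names are hypothetical:

```java
import java.util.*;

// Hypothetical helper sketching the semantics of
// stream.groupBy(0).window(Count.of(size)): values are buffered per key,
// and every time a key has received `size` values, that buffer is emitted
// as one (key, List<V>) group and cleared.
public class CountWindowSketch {
    public static <K, V> List<Map.Entry<K, List<V>>> process(
            List<Map.Entry<K, V>> stream, int size) {
        Map<K, List<V>> buffers = new HashMap<>();
        List<Map.Entry<K, List<V>>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : stream) {
            // Append the value to this key's buffer, creating it on first use.
            List<V> buf = buffers.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            buf.add(e.getValue());
            // When the buffer reaches the window size, emit a copy and reset.
            if (buf.size() == size) {
                out.add(new AbstractMap.SimpleEntry<>(e.getKey(), new ArrayList<>(buf)));
                buf.clear();
            }
        }
        return out;
    }
}
```

Note that, unlike Spark's batch-wide groupByKey, the grouping here is per key: each key's window fills and fires independently, which is closer to how the streaming runtime behaves.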

You can find slides on the streaming API at [1], and there are a number of
examples at [2].

best regards,
martin

[1] http://dataartisans.github.io/flink-training/dataStreamBasics/intro.html
[2]
https://github.com/dataArtisans/flink-training-exercises/tree/master/src/main/java/com/dataArtisans/flinkTraining/exercises/dataStreamJava


Liang Chen <chenliang...@huawei.com> schrieb am Sa., 12. Sep. 2015 um
05:53 Uhr:

> Hi
>
> I am now considering migrating a Spark Streaming case to Flink to compare
> performance.
>
> Does Flink support groupByKey([numTasks]), which, when called on a dataset
> of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs?
> If it does not exist, how can groupBy() be used to implement the same
> function?
>
>
>
> --
> View this message in context:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Does-flink-support-groupByKey-numTasks-tp7973.html
> Sent from the Apache Flink Mailing List archive at Nabble.com.
>
