Hi Liang, Martin gave a very good answer.
I'd like to add a few words regarding the different stream processing models of Spark and Flink. Spark Streaming is based on mini-batches and each mini-batch is processed like a batch program. Hence a Spark Streaming program is implemented like a batch program. In contrast, Flink pipelines data streams and processes them an infinite sequence of data. Because data streams are infinite, you cannot simply group their data (a group would never be closed). This issue is addressed by using windows which discretize a stream by time or count. Although Spark's mini-batches can be interpreted as a global time windows, it can be tricky to implement exactly the same semantics in Flink because the internal implementations of Spark and Flink differ. Best, Fabian 2015-09-12 12:33 GMT+02:00 Martin Liesenberg <martin.liesenb...@gmail.com>: > Hi, > > as far as I can tell there is no direct equivalent, which is probably due > to the underlying execution models. > > I think the desired behaviour can be expressed by something along the lines > of: > stream.groupBy(0).window(Count.of(<size>)) > where: > stream is a DataStream<Tuple2<K, V>> and <size> would be the batch size of > your SparkStreaming job. > > The window can also be expressed in terms of time which would look > something like this: .window(Time.of(<time>, <time_unit>)) > > You can find slides on the streaming API at [1] and there is a number of > examples at [2] > > best regards, > martin > > [1] > http://dataartisans.github.io/flink-training/dataStreamBasics/intro.html > [2] > > https://github.com/dataArtisans/flink-training-exercises/tree/master/src/main/java/com/dataArtisans/flinkTraining/exercises/dataStreamJava > > > Liang Chen <chenliang...@huawei.com> schrieb am Sa., 12. Sep. 2015 um > 05:53 Uhr: > > > Hi > > > > Now i am considering migrate Sparkstreaming case to Flink for comparing > > performance. > > > > Does flink support groupByKey([numTasks]) ,When called on a dataset of > (K, > > V) pairs, returns a dataset of (K, Iterable<V>) pairs. > > If it is not exist, how to use groupBy() to implement the same function? > > > > > > > > -- > > View this message in context: > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Does-flink-support-groupByKey-numTasks-tp7973.html > > Sent from the Apache Flink Mailing List archive. mailing list archive at > > Nabble.com. > > >