Re: What is the equivalent of Spark RDD in Flink

2015-12-30 Thread Chiwan Park
About question 1: scheduling an iterative job only once (rather than once per iteration) is one of the factors causing the performance difference. Dongwon’s slides [1] would be helpful for understanding the other performance factors. [1] http://flink-forward.org/?session=a-comparative-performance-evaluation-of-flink > On Dec 31, 2015, at 9:37 AM, Stephan E

Re: 2015: A Year in Review for Apache Flink

2015-12-30 Thread Chiwan Park
Happy New Year 2016 :) > On Dec 31, 2015, at 11:22 AM, Henry Saputra wrote: > > Dear All, > > It is almost the end of 2015, and it has been a busy and great year for Apache Flink =) > > Robert Metzger has posted a great blog post summarizing Apache Flink's growth this year: > > https://flink.apache.o

2015: A Year in Review for Apache Flink

2015-12-30 Thread Henry Saputra
Dear All, It is almost the end of 2015, and it has been a busy and great year for Apache Flink =) Robert Metzger has posted a great blog post summarizing Apache Flink's growth this year: https://flink.apache.org/news/2015/12/18/a-year-in-review.html Happy New Year, everyone, and thanks for being part of this

Re: What is the equivalent of Spark RDD in Flink

2015-12-30 Thread Stephan Ewen
Concerning question (2): DataSets in Flink are, in most cases, not materialized at all; instead, they represent in-flight data as it is streamed from one operation to the next (remember, Flink is a streaming engine at its core). So even in a MapReduce-style program, the DataSet produced by the Map Function
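The idea of intermediate data that is never materialized can be loosely illustrated without Flink. The sketch below uses plain `java.util.stream` purely as an analogy (it is not Flink's DataSet API, and `PipelinedDemo` is a hypothetical name): each element passes through the map step and straight into the reduce step, so no collection holding all mapped values is ever built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Analogy only: java.util.stream, not Flink. Records the order in which
// the "map" and "reduce" steps see each element.
class PipelinedDemo {
    static final List<String> trace = new ArrayList<>();

    static int run() {
        trace.clear();
        return IntStream.rangeClosed(1, 3)
                // "Map Function": transforms one element at a time
                .map(x -> { trace.add("map " + x); return x * 10; })
                // "Reduce": consumes each mapped element as soon as it is produced
                .reduce(0, (acc, y) -> { trace.add("reduce " + y); return acc + y; });
    }

    public static void main(String[] args) {
        System.out.println(run()); // 60
        trace.forEach(System.out::println);
    }
}
```

On OpenJDK the printed trace interleaves `map 1`, `reduce 10`, `map 2`, `reduce 20`, and so on, because a sequential stream pushes one element at a time through the fused steps. Flink applies a similar pipelining idea between operators (and across machines), though, as the message says, only "in most cases".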

Re: Flink application with HBase

2015-12-30 Thread Stephan Ewen
The OutputFormats (such as the HBaseOutputFormat) originally come from the DataSet API. They work with the DataStream API, but the main difference from a SinkFunction is that they give you no way to implement custom checkpointing hooks. Since sinks interact with the outside world (side effects), they are by
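To make the point about checkpointing hooks concrete, here is a minimal sketch, assuming a hypothetical `SnapshotHook` interface and a toy `BufferingSink` (neither is Flink API; the `external` list merely stands in for an outside system such as HBase). A sink that batches writes for throughput can use such a hook to flush its buffer when a checkpoint is taken. Assuming recovery replays only the input after the last completed checkpoint, records still sitting in the buffer at checkpoint time would otherwise be lost on failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook called when a checkpoint is taken (NOT a real Flink interface).
interface SnapshotHook {
    void snapshot(long checkpointId);
}

// Toy sink that batches writes; `external` stands in for HBase or similar.
class BufferingSink implements SnapshotHook {
    final List<String> external = new ArrayList<>();
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;

    BufferingSink(int batchSize) { this.batchSize = batchSize; }

    // Called once per record; writes go out only when a batch is full.
    void write(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) flush();
    }

    private void flush() {
        external.addAll(buffer);
        buffer.clear();
    }

    // The checkpointing hook: push out everything received before this checkpoint.
    @Override
    public void snapshot(long checkpointId) { flush(); }

    int buffered() { return buffer.size(); }
}

class SinkDemo {
    public static void main(String[] args) {
        BufferingSink sink = new BufferingSink(10);
        sink.write("a"); sink.write("b"); sink.write("c");
        System.out.println(sink.external.size() + " written, " + sink.buffered() + " buffered"); // 0 written, 3 buffered
        sink.snapshot(1L);
        System.out.println(sink.external.size() + " written, " + sink.buffered() + " buffered"); // 3 written, 0 buffered
    }
}
```

An OutputFormat-style writer with only open/write/close methods has no equivalent of `snapshot`, which is the gap the message describes.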

Re: Fold vs Reduce in DataStream API

2015-12-30 Thread Brian Chhun
Thanks for the clarification. Is there a resource besides the code that documents these kinds of things? It's understandable if there isn't much out there yet and these things are still in flux. On Wed, Dec 30, 2015 at 11:14 AM, Aljoscha Krettek wrote: > Yes, this is correct right now. It s

Re: Fold vs Reduce in DataStream API

2015-12-30 Thread Aljoscha Krettek
Yes, this is correct right now. It should not be too hard to add the pre-aggregation behavior for fold, however. > On 30 Dec 2015, at 17:31, Brian Chhun wrote: > > Hi All, > > Are there certain considerations when using these functions on windowed streams? > > From reading the code, it looks like using

Re: Fold vs Reduce in DataStream API

2015-12-30 Thread Brian Chhun
Hi All, Are there certain considerations when using these functions on windowed streams? From reading the code, it looks like using reduce (or another aggregation function) on a windowed stream will pre-aggregate the result value as elements are added to the window, keeping the size of the window's state constant. On
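The difference discussed in this thread can be modeled in a few lines of plain Java. This is a minimal sketch under stated assumptions: `ReducingWindow` and `BufferingFoldWindow` are hypothetical classes, not Flink's window operator internals. The reducing window keeps one accumulated value no matter how many elements arrive, while the fold window, matching the behavior confirmed for fold in the reply above, stores every element and only applies the fold function when the window fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

// Hypothetical pre-aggregating window: state is a single running value.
class ReducingWindow<T> {
    private final BinaryOperator<T> reducer;
    private T acc; // null until the first element arrives

    ReducingWindow(BinaryOperator<T> reducer) { this.reducer = reducer; }

    // Combine each new element into the running value immediately.
    void add(T value) { acc = (acc == null) ? value : reducer.apply(acc, value); }

    T fire() { return acc; }

    int storedElements() { return acc == null ? 0 : 1; }
}

// Hypothetical buffering fold window: state holds every element.
class BufferingFoldWindow<T, R> {
    private final List<T> elements = new ArrayList<>();
    private final R initial;
    private final BiFunction<R, T, R> folder;

    BufferingFoldWindow(R initial, BiFunction<R, T, R> folder) {
        this.initial = initial;
        this.folder = folder;
    }

    // Only remember the element; no aggregation happens yet.
    void add(T value) { elements.add(value); }

    // Fold over all buffered elements when the window fires.
    R fire() {
        R acc = initial;
        for (T e : elements) acc = folder.apply(acc, e);
        return acc;
    }

    int storedElements() { return elements.size(); }
}

class WindowStateDemo {
    public static void main(String[] args) {
        ReducingWindow<Integer> reduce = new ReducingWindow<>(Integer::sum);
        BufferingFoldWindow<Integer, Long> fold =
                new BufferingFoldWindow<>(0L, (acc, x) -> acc + x);
        for (int i = 1; i <= 1000; i++) {
            reduce.add(i);
            fold.add(i);
        }
        System.out.println(reduce.fire() + " stored=" + reduce.storedElements()); // 500500 stored=1
        System.out.println(fold.fire() + " stored=" + fold.storedElements());     // 500500 stored=1000
    }
}
```

Both produce the same sum, but the buffering window's state grows with the number of elements. Fold also takes an initial value and may produce a different result type (here `Long` from `Integer` inputs), which is one reason it is not simply an alias for reduce.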