Hi Slim Baltagi,

Thank you for the list you shared. It will be really helpful. I have gone through a few of the materials you mentioned, namely:
1. Benchmarking Streaming Computation Engines at Yahoo!
2. The Capital One slides on SlideShare.
3. The data Artisans article.

Based on these, I have identified a few metrics:
1. The number of tuples processed per second.
2. Throughput, measured while keeping the number of tuples per second constant.

I'm thinking of comparing read/write throughput. I still have to figure out a way to compare storm::spout with flink::env.getstream, and storm::ReportBolt with flink::sink; I'm not sure about this yet.

During the seven-week Insight Data Engineering Fellows program, we aim to build a data platform that handles large, real-time datasets. Given the short time we spend on a project at Insight, I don't consider this a full-blown benchmark study. Still, I want to be careful, and I would be willing to keep working along those lines.

I have signed up for the meetup in NYC, as I consider it a great place to learn about Flink. I'm looking forward to your talk, and to meeting you and discussing my questions.

Thank you,
Vinaya M S

On Sat, Jan 23, 2016 at 3:14 PM, Slim Baltagi <sbalt...@gmail.com> wrote:
> Hi Vinaya
>
> 1. Comparing streaming tools ( in this case Storm and Flink) should not be
> based on performance benchmarks only! For example, slides 16-36 list over 96
> criteria, that we identified at Capital One, to compare two streaming tools
> http://www.slideshare.net/sbaltagi/flink-vs-spark/17
>
> 2. Now, if you are focusing on performance only, I'll suggest a few related
> resources:
>
> - Benchmarking Streaming Computation Engines at Yahoo!
>
> http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
> December 16, 2015 Code at github:
> https://github.com/yahoo/streaming-benchmarks
>
> - There is some work started by some Flink contributors to create some
> performance scripts for Flink, Spark, and MapReduce here: There is Apache
> Flink: Performance and Testing
> https://github.com/project-flink/flink-perf
>
> - Some first numbers on performance of streaming jobs with Apache Flink are
> here:
>
> http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
> under the section: 'Show me the numbers'. Code used is at:
> https://github.com/dataArtisans/performance
>
> - Yangjun Wang is currently working on his Master thesis at Aalto university
> in Helsinki, Finland. The topic of his thesis is about building a standard
> benchmark system for streaming processing systems like Apache Storm, Spark
> and Flink. Code at github
> https://github.com/wangyangjun/StreamBench/tree/master/StreamBench
>
> 3. I am giving a talk in NYC on Tuesday February 2nd, 2016 on Apache Flink
> and I will be touching a bit on benchmarks
>
> http://www.meetup.com/New-York-City-NYC-Apache-Flink-Meetup/events/228113118/
> You are welcome to attend.
>
> Thanks
>
> Slim Baltagi
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Comparison-of-storm-and-flink-tp4468p4469.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>
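For the first metric listed in the reply above (tuples processed per second), here is a minimal sketch of how it could be computed after a run. It assumes, hypothetically, that the sink or bolt logs each finished tuple's completion time as a Unix timestamp in seconds; the helper name `tuples_per_second` is illustrative only and is not part of the Storm or Flink APIs.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def tuples_per_second(completion_times: Iterable[float]) -> Dict[int, int]:
    """Count how many tuples finished in each whole-second bucket.

    Hypothetical helper: `completion_times` are assumed to be timestamps in
    seconds (e.g. from time.time()) recorded when a sink/bolt finished a tuple.
    Seconds in which no tuple finished are absent from the result.
    """
    # Bucket each timestamp by the whole second it falls in (floor, so that
    # fractional timestamps map to the second that contains them).
    counts = Counter(math.floor(t) for t in completion_times)
    # Return buckets in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [0.1, 0.5, 0.9, 1.2, 2.7, 2.8]
    print(tuples_per_second(sample))
```

With the sample timestamps this reports three tuples in second 0, one in second 1 and two in second 2. Because empty seconds are omitted, a caller plotting a time series may want to fill those gaps with zero.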