Re: Creating a representative streaming workload

2015-11-24 Andra Lungu
Hi,

Sorry for the ultra-late reply. Another real-life streaming scenario would be the one I am working on:
- collecting data from telecom cells in real-time
- and filtering out certain information or enriching/correlating (adding additional info based on the parameters received) events - this is

Re: Creating a representative streaming workload

2015-11-18 Robert Metzger
Hey Vasia,

I think a very common workload would be an event stream from web servers of an online shop. Usually, these shops have multiple servers, so events arrive out of order. I think there are plenty of different use cases that you can build around that data:
- Users perform different actions t

Re: Creating a representative streaming workload

2015-11-16 Nick Dimiduk
All those should apply for streaming too...

On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
> Hi,
>
> thanks Nick and Ovidiu for the links!
>
> Just to clarify, we're not looking into creating a generic streaming
> benchmark. We have quite limited time and r

Re: Creating a representative streaming workload

2015-11-16 Vasiliki Kalavri
Hi,

thanks Nick and Ovidiu for the links!

Just to clarify, we're not looking into creating a generic streaming benchmark. We have quite limited time and resources for this project. What we want is to decide on a set of 3-4 _common_ streaming applications. To give you an idea, for the batch worklo

Re: Creating a representative streaming workload

2015-11-16 Ovidiu-Cristian MARCU
Regarding Flink vs. Spark/Storm, you can check here: http://www.sparkbigdata.com/102-spark-blog-slim-baltagi/14-results-of-a-benchmark-between-apache-flink-and-apache-spark

Re: Creating a representative streaming workload

2015-11-16 Nick Dimiduk
Why not use an existing benchmarking tool -- is there one? Perhaps you'd like to build something like YCSB [0] but for streaming workloads? Apache Storm is the OSS framework that's been around the longest. Search for "apache storm benchmark" and you'll get some promising hits. Looks like IBMStream

Creating a representative streaming workload

2015-11-16 Vasiliki Kalavri
Hello squirrels,

with some colleagues and students here at KTH, we have started 2 projects to evaluate (1) performance and (2) behavior in the presence of memory interference in cloud environments, for Flink and other systems. We want to provide our students with a workload of representative appli