Hey Vasia,

I think a very common workload would be an event stream from the web servers of an online shop. These shops usually run multiple servers, so events arrive out of order.

I think there are plenty of different use cases that you can build around that data:
- Users perform different actions that a streaming system could track (analysis of click paths).
- Some simple statistics using windows (items sold in the last 10 minutes, ...).
- Maybe fraud detection would be another use case.
- Often, there also needs to be a sink to HDFS or another file system for a long-term archive.
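To illustrate the kind of data meant here, below is a minimal sketch in plain Python (not Flink code) of an event generator whose output arrives out of order, plus a simple count of items sold per 10-minute window. Everything in it is an assumption made for illustration: the names ShopEvent, generate_events and items_sold_per_window, the event fields, and the model where each event is delayed by a random amount before it arrives. The window is a tumbling event-time window, which is a simplification of "items sold in the last 10 minutes".

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ShopEvent:
    event_time: int  # seconds since the start of the simulation
    server: str      # web server that produced the event
    user: str
    action: str      # "view", "add_to_cart" or "purchase"
    item: str


def generate_events(n, servers=3, max_delay=30.0, seed=42):
    """Return n events in arrival order (hypothetical delay model).

    One event is created per second of event time. Each event is then
    delayed by a random amount of up to max_delay seconds, so the
    arrival order differs from the event-time order.
    """
    rng = random.Random(seed)
    stamped = []
    for t in range(n):
        event = ShopEvent(
            event_time=t,
            server=f"web-{rng.randrange(servers)}",
            user=f"user-{rng.randrange(10)}",
            action=rng.choice(["view", "add_to_cart", "purchase"]),
            item=f"item-{rng.randrange(5)}",
        )
        arrival = t + rng.uniform(0, max_delay)
        stamped.append((arrival, event))
    # Sort by arrival time to simulate what a consumer would see.
    stamped.sort(key=lambda pair: pair[0])
    return [event for _, event in stamped]


def items_sold_per_window(events, window_seconds=600):
    """Count purchases per tumbling event-time window.

    Events are assigned to a window by their own timestamp, so the
    result does not depend on the order in which they arrive.
    """
    sold = Counter()
    for e in events:
        if e.action == "purchase":
            window_start = (e.event_time // window_seconds) * window_seconds
            sold[window_start] += 1
    return dict(sorted(sold.items()))


if __name__ == "__main__":
    stream = generate_events(2000)
    print([e.event_time for e in stream[:10]])  # arrival order, not sorted
    print(items_sold_per_window(stream))
```

Because the count is keyed on event time rather than arrival time, it gives the same answer for the shuffled stream and for a sorted one. A real streaming job would additionally have to decide how long to wait for late events before emitting a window.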
I would love to see such an event generator in Flink's contrib module. I think that's something the entire streaming space could use.

On Mon, Nov 16, 2015 at 8:22 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:

> All those should apply for streaming too...
>
> On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>
>> Hi,
>>
>> thanks Nick and Ovidiu for the links!
>>
>> Just to clarify, we're not looking into creating a generic streaming
>> benchmark. We have quite limited time and resources for this project. What
>> we want is to decide on a set of 3-4 _common_ streaming applications. To
>> give you an idea, for the batch workload, we will pick something like a
>> grep, one relational application, a graph algorithm, and an ML algorithm.
>>
>> Cheers,
>> -Vasia.
>>
>> On 16 November 2015 at 19:25, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>>
>>> Regarding Flink vs Spark / Storm you can check here:
>>> http://www.sparkbigdata.com/102-spark-blog-slim-baltagi/14-results-of-a-benchmark-between-apache-flink-and-apache-spark
>>>
>>> Best regards,
>>> Ovidiu
>>>
>>> On 16 Nov 2015, at 15:21, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>>>
>>> Hello squirrels,
>>>
>>> with some colleagues and students here at KTH, we have started 2
>>> projects to evaluate (1) performance and (2) behavior in the presence of
>>> memory interference in cloud environments, for Flink and other systems. We
>>> want to provide our students with a workload of representative applications
>>> for testing.
>>>
>>> While for batch applications, it is quite clear to us what classes of
>>> applications are widely used and how to create a workload of different
>>> types of applications, we are not quite sure about the streaming workload.
>>>
>>> That's why, we'd like your opinions! If you're using Flink streaming in
>>> your company or your project, we'd love your input even more :-)
>>>
>>> What kind of applications would you consider as "representative" of a
>>> streaming workload? Have you run any experiments to evaluate Flink versus
>>> Spark, Storm etc.? If yes, would you mind sharing your code with us?
>>>
>>> We will of course be happy to share our results with everyone after we
>>> have completed our study.
>>>
>>> Thanks a lot!
>>> -Vasia.