[DISCUSS] (Meta)data Driven Window Triggers

2016-08-12 Thread Kevin Jacobs
Hi, today I will be giving a presentation about Apache Flink, and for the use cases at my company, Apache Flink performs better than Apache Spark. The only issue I encountered is the lack of support for (meta)data-driven window triggers. I would like to start a discussion
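A content-driven trigger along these lines can be built on Flink's Trigger API. Below is a minimal sketch against recent Flink 1.x versions of that API; the Event type and its isEndOfBatch flag are assumptions for illustration, not from the thread:

    import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
    import org.apache.flink.streaming.api.windowing.windows.GlobalWindow

    case class Event(payload: String, isEndOfBatch: Boolean) // hypothetical record type

    // Fires (and purges) the window as soon as an element carries an
    // end-of-batch marker in its metadata; otherwise keeps collecting.
    class MetadataTrigger extends Trigger[Event, GlobalWindow] {
      override def onElement(e: Event, ts: Long, w: GlobalWindow,
                             ctx: Trigger.TriggerContext): TriggerResult =
        if (e.isEndOfBatch) TriggerResult.FIRE_AND_PURGE else TriggerResult.CONTINUE

      override def onProcessingTime(t: Long, w: GlobalWindow,
                                    ctx: Trigger.TriggerContext): TriggerResult =
        TriggerResult.CONTINUE

      override def onEventTime(t: Long, w: GlobalWindow,
                               ctx: Trigger.TriggerContext): TriggerResult =
        TriggerResult.CONTINUE

      override def clear(w: GlobalWindow, ctx: Trigger.TriggerContext): Unit = ()
    }

Such a trigger would be attached with .window(GlobalWindows.create()).trigger(new MetadataTrigger).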

Conceptual difference Windows and DataSet

2016-08-04 Thread Kevin Jacobs
Hi, I have the following use case: 1. Group by a specific field. 2. Get a list of all messages belonging to the group. 3. Count the number of records in the group. With the use of DataSets, it is fairly easy to do this (see http://stackoverflow.com/questions/38745446/apache-flink
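The three steps map onto the DataSet API roughly as follows. This is a sketch with an assumed Message record type, not the code from the thread:

    import org.apache.flink.api.scala._
    import org.apache.flink.util.Collector

    case class Message(group: String, body: String) // hypothetical record type

    val env = ExecutionEnvironment.getExecutionEnvironment
    val messages = env.fromElements(
      Message("a", "m1"), Message("a", "m2"), Message("b", "m3"))

    // 1. group by a field, 2. collect the group's messages, 3. count them
    val result = messages
      .groupBy("group")
      .reduceGroup { (in: Iterator[Message], out: Collector[(String, List[String], Int)]) =>
        val all = in.toList
        out.collect((all.head.group, all.map(_.body), all.size))
      }

    result.print()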

Re: Introduction

2016-08-01 Thread Kevin Jacobs
Hi! Welcome to the community :-)! On 01.08.2016 09:51, Ufuk Celebi wrote: On Sun, Jul 31, 2016 at 8:07 PM, Neelesh Salian wrote: I am Neelesh Salian; I recently joined the Flink community and I wanted to take this opportunity to formally introduce myself. Thanks and welcome! :-)

Re: Discard out-of-order events

2016-07-29 Thread Kevin Jacobs
or you:
- Can I make this more efficient?
- Is there a way of mixing DataSets and DataStreams? That would be really awesome (at least for this use case).
- Is there a way to ensure checkpoints, since I am using an iterative stream here?
- Can I get rid of the TumblingProcessingTimeWindows? Because in fact,

Discard out-of-order events

2016-07-28 Thread Kevin Jacobs
Is it possible to discard events that are out-of-order (in terms of event time)?
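One way to do this, sketched below: remember the newest event time seen so far and drop anything older. The Event type is an assumption, and the plain var is per-parallel-instance and not checkpointed, so this is an illustration rather than a fault-tolerant solution:

    import org.apache.flink.api.common.functions.FlatMapFunction
    import org.apache.flink.util.Collector

    case class Event(eventTime: Long, payload: String) // hypothetical record type

    // Emits only events whose timestamp is at least as new as the newest
    // timestamp seen so far; older (out-of-order) events are discarded.
    class DropOutOfOrder extends FlatMapFunction[Event, Event] {
      private var maxSeen = Long.MinValue
      override def flatMap(e: Event, out: Collector[Event]): Unit =
        if (e.eventTime >= maxSeen) {
          maxSeen = e.eventTime
          out.collect(e)
        } // else: out of order, silently dropped
    }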

Use case

2016-07-28 Thread Kevin Jacobs
Hi all, I am trying to keep track of the biggest value in a stream. I do this by using the iterative step mechanism of Apache Flink. However, I get an exception that checkpointing is not supported for iterative jobs. Why can't this be enabled? My iterative stream is also quite small: only one

Re: FlinkKafkaConsumer09

2016-07-28 Thread Kevin Jacobs
t", "earliest") to achieve the same behavior. Kafka keeps track of the offsets per group id. If you have already read from a topic with a certain group id and want to restart from the smallest offset available, you need to generate a unique group id. Cheers, Max On Thu, Jul 28,

FlinkKafkaConsumer09

2016-07-28 Thread Kevin Jacobs
Hi, I am currently facing strange behaviour of the FlinkKafkaConsumer09 class. I am using Flink 1.0.3. These are my properties:

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", config.urlKafka)
    properties.setProperty("group.id", COLLECTOR_NAME)
    properties.setPrope
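For completeness, a self-contained version of such a setup might look as follows; the topic name, broker address, and deserialization schema are assumptions:

    import java.util.Properties
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "localhost:9092")
    properties.setProperty("group.id", "collector")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.addSource(new FlinkKafkaConsumer09[String](
        "events", new SimpleStringSchema(), properties))
      .print()
    env.execute("kafka-consumer")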

Re: Evaluating Apache Flink

2016-07-08 Thread Kevin Jacobs
w.slideshare.net/GyulaFra/largescale-stream-processing-in-the-hadoop-ecosystem [6] http://www.slideshare.net/GyulaFra/largescale-stream-processing-in-the-hadoop-ecosystem-hadoop-summit-2016-60887821 On Fri, Jul 8, 2016 at 2:23 PM, Kevin Jacobs wrote: Hi, I am currently working for an organization

Evaluating Apache Flink

2016-07-08 Thread Kevin Jacobs
Hi, I am currently working for an organization which is using Apache Spark as its main data processing framework. Now the organization is wondering whether Apache Flink is better at processing their data than Apache Spark. Therefore, I am evaluating Apache Flink and comparing it to Apache Spark

Re: Contributing

2016-07-08 Thread Kevin Jacobs
rg/how-to-contribute.html If you have a follow-up question, just go for it :) -Matthias On 07/08/2016 10:02 AM, Kevin Jacobs wrote: Hi, I am relatively new to the development process of Apache Flink. Where can I start helping to develop Flink? Kind regards, Kevin

Contributing

2016-07-08 Thread Kevin Jacobs
Hi, I am relatively new to the development process of Apache Flink. Where can I start helping to develop Flink? Kind regards, Kevin

Re: [Discuss] Why different job's tasks can run in the single process.

2016-06-30 Thread Kevin Jacobs
In my opinion, the streaming process can be perfectly simulated on a single node. You can set up a message distribution system like Kafka on a single node, and you can run Spark on a single node; the only thing you need to change when running it on a cluster is the environment
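Concretely, a Flink job can be developed the same way: only how the execution environment is obtained differs between local and cluster runs. A minimal sketch:

    import org.apache.flink.streaming.api.scala._

    // Runs the whole pipeline inside the current JVM, no cluster needed.
    val env = StreamExecutionEnvironment.createLocalEnvironment()
    // On a cluster, StreamExecutionEnvironment.getExecutionEnvironment
    // would be used instead; the rest of the program is unchanged.
    env.fromElements(1, 2, 3).map(_ * 2).print()
    env.execute("local-test")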