Re: Pointers about internal threads and communication in Flink (streaming)

2015-08-17 Thread Vincenzo Gulisano
Thank you very much! I will have a look at the docs Vincenzo On 17 August 2015 at 16:26, Aljoscha Krettek wrote: > Hi Vincenzo, > regarding TaskManagers and how they execute the operations: > > The TaskManager gets a class that is derived from AbstractInvokable. The > TaskManager will create an

Re: Pointers about internal threads and communication in Flink (streaming)

2015-08-17 Thread Aljoscha Krettek
Hi Vincenzo, regarding TaskManagers and how they execute the operations: The TaskManager gets a class that is derived from AbstractInvokable. The TaskManager will create an object from that class and then call methods to facilitate execution. The two main methods are registerInputOutput() and invo

Re: Pointers about internal threads and communication in Flink (streaming)

2015-08-17 Thread Stephan Ewen
Hi! We are working on more docs for that. Here is a start that has a section about the TaskManager task execution. Until then, here is a bit from our wiki: Data Exchange: https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks Serialization for Data Exchange: https://cwiki

Pointers about internal threads and communication in Flink (streaming)

2015-08-17 Thread Vincenzo Gulisano
Hi, is there any document describing how streaming operators are run by the TaskManagers and how communication (intra-node and inter-node) is managed. The closest documention I found is https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/general_arch.html but it is still pretty hi

Self-join with filter

2015-08-17 Thread Ashwin Jayaprakash
Hi, I'm trying to evaluate Flink to see if it can do efficient semi-joins or self-joins with filter. Problem description: I have 1 stream that can contain "near duplicates" records. The records share a "family name" and so, many records can have the same family name. But each record has a unique i

Re: Serialization and kryo

2015-08-17 Thread Robert Metzger
Hi Jay, this is how you can register a custom Kryo serializer, yes. Flink has this project (https://github.com/magro/kryo-serializers) as a dependency. It contains a lot of Kryo Serializers for common types. They also added support for for Guava's ImmutableMap, but the version we are using (0.27)

Re: DataSource vs DataSet

2015-08-17 Thread Stephan Ewen
Hi! They are really the same. The DataSource is the subclass of DataSet which is used when sou do a source operation (env.readTextFile(...) returns a DataSource for example). Greetings, Stephan On Mon, Aug 17, 2015 at 11:23 AM, Flavio Pompermaier wrote: > Hi flinkers, > I have a very simple q

DataSource vs DataSet

2015-08-17 Thread Flavio Pompermaier
Hi flinkers, I have a very simple question for you after the reading of the amazing blog page posted by Nezih Yigitbasi at http://t.co/dCDfFf6DhW (that shows how to integrate Flink and Avro-Parquet through Kite SDK). Is there any code style policy about when to use DataSource or when to use DataSe