Re: Objects deserialization on Jobmanager

2015-04-24 Thread Ventura Del Monte
Hi Stephan! First of all, thank you for your reply! You're right about how I distribute my data. I need this because I have an object that should be shared among tasks. I am currently working on decoupling this object from the CUDA type, and I will follow your suggestions! About my CudaExecut

flink ml - k-means

2015-04-24 Thread Pa Rö
hi flink community, I am currently writing my master's thesis in the field of machine learning. My main task is to evaluate different k-means variants for large data sets (Big Data). I would like to test Flink ML against Apache Mahout and Apache Hadoop MapReduce in terms of scalability and performance (time a

Re: Flink Java 8 problem (no lambda, simple code)

2015-04-24 Thread Aljoscha Krettek
Unfortunately I can't reproduce your error on my machine (OS X, Java 8). I created a fresh Maven project from your pom and source example and it runs. As a workaround you can call cluster.getConfig().disableClosureCleaner(). The closure cleaner normally cleans closures of unneeded stuff because w
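The workaround Aljoscha describes can be sketched as follows. This is a minimal sketch against the Flink Java API of that era; the program body is a placeholder, not the original poster's code:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DisableCleanerExample {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment cluster =
                ExecutionEnvironment.createLocalEnvironment();

        // Workaround: skip the closure cleaner, which normally uses ASM to
        // strip unneeded enclosing state out of user-function closures
        // before they are serialized and shipped.
        cluster.getConfig().disableClosureCleaner();

        DataSet<String> lines = cluster.fromElements("a b", "b c");
        lines.print();
    }
}
```

Disabling the cleaner only sidesteps the ASM bytecode analysis; the functions themselves must then already be serializable without cleaning.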

Re: Flink Java 8 problem (no lambda, simple code)

2015-04-24 Thread Stephan Ewen
One thing I noticed a while back: ASM version 4 had issues with Java 8 - but those were related to Java 8 lambdas. Back then, bumping ASM to version 5 fixed it. Not sure if this is the same problem, though, since you do not seem to use Java 8 lambdas... On Fri, Apr 24, 2015 at 11:32 AM, Aljos

Re: Flink Java 8 problem (no lambda, simple code)

2015-04-24 Thread Aljoscha Krettek
I'm looking into it. On Fri, Apr 24, 2015 at 11:13 AM, LINZ, Arnaud wrote: > Hi, > I have the following simple code that works well in Java 7: > final ExecutionEnvironment cluster = ExecutionEnvironment.createLocalEnvironment(); > final DataSet textFile = cluste

Re: How to make a generic key for groupBy

2015-04-24 Thread Stephan Ewen
Hi Arnaud! Thank you for the warm words! Let's find a good way to get this to work... As a bit of background: in Flink, the API needs to know a bit about the types that go through the functions, because Flink pre-generates and configures serializers, and validates that things fit together. It is

Flink Java 8 problem (no lambda, simple code)

2015-04-24 Thread LINZ, Arnaud
Hi, I have the following simple code that works well in Java 7: final ExecutionEnvironment cluster = ExecutionEnvironment.createLocalEnvironment(); final DataSet<String> textFile = cluster.readTextFile(MiscTools.chercher("jeuDeDonnees.txt")); final DataSet> words = textFile

Re: Tuples serialization

2015-04-24 Thread Stephan Ewen
For the input side: the data set myTuples has its type via "myTuples.getType()". The TypeSerializerOutputFormat implements a special interface that picks up that type automatically. If you want to use the type serializer input format, you can always do it like this: DataSet> myTuples = ...; myTuples.o

Re: Tuples serialization

2015-04-24 Thread Stephan Ewen
I think you need not create any TypeInformation anyway. It is always present in the data set. DataSet> myTuples = ...; myTuples.output(new TypeSerializerOutputFormat>()); On Fri, Apr 24, 2015 at 10:20 AM, Fabian Hueske wrote: > The BLOCK_SIZE_PARAMETER_KEY is used to split a file into process
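Stephan's one-liner (whose generics the archive stripped) can be sketched in full like this. The tuple type and output path are illustrative assumptions, not from the thread:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TypeSerializerOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;

public class WriteBinaryTuplesExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

        DataSet<Tuple2<String, Integer>> myTuples = env.fromElements(
                new Tuple2<String, Integer>("a", 1),
                new Tuple2<String, Integer>("b", 2));

        // The output format picks up the tuple type from the data set
        // automatically; no TypeInformation has to be created by hand.
        myTuples.write(
                new TypeSerializerOutputFormat<Tuple2<String, Integer>>(),
                "file:///tmp/myTuples.bin");

        env.execute("write tuples with the type serializer format");
    }
}
```

The resulting file is in Flink's internal binary serialization format, so it is only readable back through the matching TypeSerializerInputFormat.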

Re: Objects deserialization on Jobmanager

2015-04-24 Thread Stephan Ewen
Hi Ventura! You are distributing your data via something like "env.fromElements(...)" or "env.fromCollection(...)", is that correct? The master node (JobManager) currently takes each InputFormat and checks whether it needs some "master side initialization". For file input formats, this computes f
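For context, the pattern Stephan is asking about looks like the sketch below (the element type is an illustrative assumption). With fromElements/fromCollection, the data is serialized into the job itself before it is sent to the JobManager, which is why non-serializable shared objects such as CUDA handles cause trouble on this path:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class FromCollectionExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

        // The whole collection is serialized into the job graph, so every
        // element must be serializable; large or device-bound objects
        // (e.g. CUDA resources) should not be shipped this way.
        List<Integer> data = Arrays.asList(1, 2, 3);
        DataSet<Integer> elements = env.fromCollection(data);

        elements.print();
    }
}
```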

Re: Tuples serialization

2015-04-24 Thread Fabian Hueske
The BLOCK_SIZE_PARAMETER_KEY is used to split a file into processable blocks. Since this is a binary file format, the InputFormat does not know where a new record starts. When writing such a file, each block starts with a new record and is filled until no more records fit completely in. The remaini
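Reading such a block-structured binary file back can be sketched as follows. This assumes a Flink version in which TypeSerializerInputFormat is constructed from a TypeInformation; the tuple type and file path are illustrative, and it also shows that the TypeInformation can be built directly (answering Flavio's question: no sample tuple is needed):

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TypeSerializerInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.core.fs.Path;

public class ReadBinaryTuplesExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

        // Build the TypeInformation directly instead of extracting it
        // from a sample tuple.
        TupleTypeInfo<Tuple2<String, Integer>> typeInfo =
                new TupleTypeInfo<Tuple2<String, Integer>>(
                        BasicTypeInfo.STRING_TYPE_INFO,
                        BasicTypeInfo.INT_TYPE_INFO);

        TypeSerializerInputFormat<Tuple2<String, Integer>> format =
                new TypeSerializerInputFormat<Tuple2<String, Integer>>(typeInfo);
        format.setFilePath(new Path("file:///tmp/myTuples.bin"));

        DataSet<Tuple2<String, Integer>> myTuples =
                env.createInput(format, typeInfo);
        myTuples.print();
    }
}
```

As Fabian explains, each block starts on a record boundary, so the block size used for reading must match the one the file was written with (configurable via the BLOCK_SIZE_PARAMETER_KEY of the binary format).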

Re: Tuples serialization

2015-04-24 Thread Flavio Pompermaier
I managed to read and write Avro files, but I still have two doubts: which size do I have to use for BLOCK_SIZE_PARAMETER_KEY? And do I really have to create a sample tuple to extract the TypeInformation to instantiate the TypeSerializerInputFormat? On Thu, Apr 23, 2015 at 7:04 PM, Flavio Pompermaier