Re: Tuples serialization

2015-04-23 Thread Flavio Pompermaier
I've searched within Flink for a working example of TypeSerializerOutputFormat usage but I didn't find anything usable. Could you show me a simple snippet of code? Do I have to configure BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY? Which size do I have to use? Will Flink write a single file or a set

Re: How to make a generic key for groupBy

2015-04-23 Thread Soumitra Kumar
Will you elaborate on your use case? It would help to find out where Flink shines. IMO, it's a great project, but needs more differentiation from Spark. On Thu, Apr 23, 2015 at 7:25 AM, LINZ, Arnaud wrote: > Hello, > > > > After a quite successful benchmark yesterday (Flink being about twice > f

How to make a generic key for groupBy

2015-04-23 Thread LINZ, Arnaud
Hello, After a quite successful benchmark yesterday (Flink being about twice as fast as Spark on my use cases), I’ve turned instantly from spark-fan to flink-fan – great job, committers! So I’ve decided to port my existing Spark tools to Flink. Happily, most of the difficulty was renaming clas

Re: Tuples serialization

2015-04-23 Thread Fabian Hueske
Have you tried the TypeSerializerOutputFormat? This will serialize data using Flink's own serializers and write it to binary files. The data can be read back using the TypeSerializerInputFormat. Cheers, Fabian 2015-04-23 11:14 GMT+02:00 Flavio Pompermaier : > Hi to all, > > in my use case I'd li
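
The write-then-read-back idea in the reply above can be illustrated with a minimal sketch. Note the assumptions: this does not use Flink's TypeSerializerOutputFormat or TypeSerializerInputFormat at all. It is a plain java.io stand-in that writes (String, Integer) pairs to one binary file and reads them back. The class name TupleRoundTrip, the use of Map.Entry in place of Tuple2, and the record-count header are all hypothetical choices made for illustration, not Flink's on-disk format.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: NOT Flink's API, only the binary round-trip idea.
public class TupleRoundTrip {

    // Write a record count, then each pair as a UTF string followed by a 4-byte int.
    static void write(Path file, List<Map.Entry<String, Integer>> tuples) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(tuples.size());
            for (Map.Entry<String, Integer> t : tuples) {
                out.writeUTF(t.getKey());
                out.writeInt(t.getValue());
            }
        }
    }

    // Read the pairs back in the same field order they were written.
    static List<Map.Entry<String, Integer>> read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int count = in.readInt();
            List<Map.Entry<String, Integer>> tuples = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                String f0 = in.readUTF();
                int f1 = in.readInt();
                tuples.add(Map.entry(f0, f1));
            }
            return tuples;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("tuples", ".bin");
        List<Map.Entry<String, Integer>> data = List.of(Map.entry("a", 1), Map.entry("b", 2));
        write(file, data);
        System.out.println(read(file)); // prints the pairs that were read back
        Files.delete(file);
    }
}
```

In Flink itself the field layout would be decided by the type's serializer rather than written by hand as here; the sketch only shows why the reader must use the same type description as the writer.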

Tuples serialization

2015-04-23 Thread Flavio Pompermaier
Hi to all, in my use case I'd like to persist batches of Tuple2 in a directory. What is the most efficient way to achieve that in Flink? I was thinking of using Avro but I can't find an example of how to do that. Once the files are written, how can I (re)generate a DataSet from them? Best, Flavio