I don't think you need to create any TypeInformation at all; it is always available from the DataSet itself.
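For instance, the TypeInformation of an existing DataSet can be obtained with `getType()` instead of being built by hand. A minimal, untested sketch against the DataSet API of that era (the placeholder data source is hypothetical):

```java
// Hedged sketch: reuse the DataSet's own TypeInformation.
// "..." stands for whatever source produces your tuples.
DataSet<Tuple2<String, byte[]>> myTuples = ...;
TypeInformation<Tuple2<String, byte[]>> tInfo = myTuples.getType();
```

That TypeInformation (or a serializer created from it) can then be handed to the TypeSerializerInputFormat when reading the data back.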
DataSet<Tuple2<String, Integer>> myTuples = ...;
myTuples.output(new TypeSerializerOutputFormat<Tuple2<String, Integer>>());

On Fri, Apr 24, 2015 at 10:20 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> The BLOCK_SIZE_PARAMETER_KEY is used to split a file into processable
> blocks. Since this is a binary file format, the InputFormat does not know
> where a new record starts. When writing such a file, each block starts with
> a new record and is filled until no more records fit completely. The
> remaining space until the next block border is padded.
>
> As long as the BLOCK_SIZE_PARAMETER_KEY is larger than the maximum record
> size and the Input- and OutputFormats use the same setting, the parameter
> has only performance implications. Smaller settings waste more space but
> allow for higher read parallelism (too much parallelism causes scheduling
> overhead). I'd simply set it to 64 MB and experiment with smaller and
> larger settings if performance is a major concern here.
>
> You don't need to create a sample tuple to create a TypeInformation for it:
>
> private TupleTypeInfo<Tuple2<String, byte[]>> tInfo =
>     new TupleTypeInfo<Tuple2<String, byte[]>>(
>         BasicTypeInfo.STRING_TYPE_INFO,
>         PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO);
>
> 2015-04-24 9:54 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> I managed to read and write Avro files, but I still have two doubts:
>>
>> Which size do I have to use for BLOCK_SIZE_PARAMETER_KEY?
>> Do I really have to create a sample tuple to extract the TypeInformation
>> needed to instantiate the TypeSerializerInputFormat?
>>
>> On Thu, Apr 23, 2015 at 7:04 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>>> I've searched within Flink for a working example of
>>> TypeSerializerOutputFormat usage but I didn't find anything usable.
>>> Could you show me a simple snippet of code?
>>> Do I have to configure BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY?
>>> Which size do I have to use?
>>> Will Flink write a single file or a set of Avro files in a directory?
>>> Is it possible to read all files in a directory at once?
>>>
>>> On Thu, Apr 23, 2015 at 12:16 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>
>>>> Have you tried the TypeSerializerOutputFormat?
>>>> It serializes data using Flink's own serializers and writes it to
>>>> binary files.
>>>> The data can be read back using the TypeSerializerInputFormat.
>>>>
>>>> Cheers, Fabian
>>>>
>>>> 2015-04-23 11:14 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>
>>>>> Hi to all,
>>>>>
>>>>> in my use case I'd like to persist batches of Tuple2<String, byte[]>
>>>>> within a directory.
>>>>> What is the most efficient way to achieve that in Flink?
>>>>> I was thinking of using Avro but I can't find an example of how to do
>>>>> that.
>>>>> Once generated, how can I (re)create a DataSet<Tuple2<String, byte[]>>
>>>>> from it?
>>>>>
>>>>> Best,
>>>>> Flavio
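Pulling the pieces of the thread together, a round trip might look roughly like the sketch below. This is untested and written against the DataSet API of that period, so constructor signatures (in particular whether TypeSerializerInputFormat takes a TypeInformation or a TypeSerializer) should be checked against your Flink version; the output path and the way the block size is passed via a Configuration are assumptions, not something confirmed in the thread.

```java
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Must exceed the largest record and be identical for writing and reading;
// 64 MB is the starting point suggested above.
final long blockSize = 64L * 1024 * 1024;

TupleTypeInfo<Tuple2<String, byte[]>> tInfo =
    new TupleTypeInfo<Tuple2<String, byte[]>>(
        BasicTypeInfo.STRING_TYPE_INFO,
        PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO);

// --- write ---
DataSet<Tuple2<String, byte[]>> tuples = ...; // your data

TypeSerializerOutputFormat<Tuple2<String, byte[]>> out =
    new TypeSerializerOutputFormat<Tuple2<String, byte[]>>();
out.setSerializer(tInfo.createSerializer(env.getConfig()));
out.setOutputFilePath(new Path("hdfs:///tmp/tuples")); // hypothetical path

Configuration outConf = new Configuration();
outConf.setLong(BinaryOutputFormat.BLOCK_SIZE_PARAMETER_KEY, blockSize);
out.configure(outConf);

tuples.output(out); // with parallelism > 1, writes a set of files into the directory

// --- read back ---
TypeSerializerInputFormat<Tuple2<String, byte[]>> in =
    new TypeSerializerInputFormat<Tuple2<String, byte[]>>(tInfo);
in.setFilePath("hdfs:///tmp/tuples"); // pointing at the directory reads all files in it

Configuration inConf = new Configuration();
inConf.setLong(BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY, blockSize);
in.configure(inConf);

DataSet<Tuple2<String, byte[]>> restored = env.createInput(in, tInfo);
```

This also answers the directory questions: with parallelism greater than one the OutputFormat produces one file per parallel writer under the given path, and pointing the InputFormat at that directory reads them all back.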