Hi!

Fabian refers to the "TypeSerializerOutputFormat" [1]. You can save your types in an efficient binary representation by calling
'dataset.write(new TypeSerializerOutputFormat<Type>(), "/your/file/path");'.
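On the first question: the conversion to a common type that Fabian describes in the quoted mail can be sketched in plain Java, without any Flink classes. This is only a rough illustration. The class and method names (CsvSchemaAlign, toRecord, unionHeader, align) and the sample rows are made up, and real CSV parsing (quoting, escaping) is ignored. In a Flink job, something like toRecord would probably sit inside the Map function.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Flink.
public class CsvSchemaAlign {

    /** Pairs each header name with the value at the same position in the row. */
    static Map<String, String> toRecord(String[] header, String[] row) {
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            // Columns missing from a short row are mapped to null.
            record.put(header[i].trim(), i < row.length ? row[i].trim() : null);
        }
        return record;
    }

    /** Union of all header names, in first-seen order. */
    static List<String> unionHeader(List<String[]> headers) {
        Set<String> names = new LinkedHashSet<>();
        for (String[] header : headers) {
            for (String name : header) {
                names.add(name.trim());
            }
        }
        return new ArrayList<>(names);
    }

    /** Projects a record onto the common header; absent fields become null. */
    static List<String> align(Map<String, String> record, List<String> commonHeader) {
        List<String> out = new ArrayList<>();
        for (String name : commonHeader) {
            out.add(record.get(name));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] header1 = "id, name, age".split(",");
        String[] header2 = "id, name, unknown1, unknown2".split(",");

        List<String[]> headers = new ArrayList<>();
        headers.add(header1);
        headers.add(header2);
        List<String> common = unionHeader(headers);

        System.out.println(common);
        // [id, name, age, unknown1, unknown2]
        System.out.println(align(toRecord(header1, "1, Ann, 30".split(",")), common));
        // [1, Ann, 30, null, null]
        System.out.println(align(toRecord(header2, "2, Bob, x, y".split(",")), common));
        // [2, Bob, null, x, y]
    }
}
```

Keeping the header-to-value map per record means each value stays attached to its column name, so files with different columns can be combined before anything is written out.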
Greetings,
Stephan

[1] https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/TypeSerializerOutputFormat.java

On Wed, Feb 4, 2015 at 12:07 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> Flink provides a CsvInputFormat which returns Tuples of the parsed fields.
> The format can be configured in several ways (which fields to read,
> line/field delimiters, comment prefixes, ...) The CsvInputFormat expects
> that a file has a consistent format. You could configure the format for
> each file format that you need to read and convert their output to your
> common type using a Map function.
> If the format of the files is not known ahead, you should implement your
> own format (probably based on the DelimitedInputFormat).
>
> Flink provides a SerializedOutputFormat that writes data in the binary
> representation that is also used during processing, e.g., for network
> transfer and disk spilling. The data can be later read using the
> SerializedInputFormat.
>
> Best regards, Fabian
>
> 2015-02-03 22:31 GMT+01:00 Vinh June <hoangthevinh....@gmail.com>:
>
>> Hi Flinkers,
>> I am totally new to Flink and Scala. I am trying to study Flink in Scala for
>> a project in university and ran into 2 problems, it would be great if you
>> guys can give me any advice.
>>
>> #1 problem is that I want to read CSV files with varied fields (different
>> names and number of fields), for example:
>> file 1: id, name, age
>> file 2: id, name, [unknown1], [unknown2]
>> expected result set: id, name, age, [unknown1], [unknown2]
>>
>> Currently I read each file as Array, then map the array to a common class
>> with Map[header, value] (since I will need to know which value belongs to
>> which header)
>> With this method I ran into #2 problem with output format
>>
>> #2 I would like to store binary info, for example, for class
>> DataSet[MyClass[id: Long, array: Array[String]]] to read them later. I found
>> FileOutputFormat might be the solution, but I can't find any example of how
>> to define one in Scala
>>
>> --
>> View this message in context:
>> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/CSV-input-with-unknown-of-fields-and-Custom-output-format-tp670.html
>> Sent from the Apache Flink (Incubator) User Mailing List archive. mailing
>> list archive at Nabble.com.