Hi!

Fabian is referring to the "TypeSerializerOutputFormat" [1]. You can save your
types in an efficient binary representation by calling
'dataset.write(new TypeSerializerOutputFormat<Type>(), "/your/file/path");'
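
On the first question: the approach Fabian describes in the quoted message
(read each file with its own layout, then map every record to a common type)
can be sketched without Flink in plain Java. This is only a rough
illustration and not Flink API. The class name CsvRecords and the methods
parse and unionOfHeaders are made up for this example, and the parsing is a
naive comma split that ignores quoting and escaping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Flink: turns CSV files with different
// headers into records of a common type, Map<header, value>.
final class CsvRecords {

    // Parses CSV text whose first line is the header row.
    // Assumption: plain comma-separated values, no quoting or escaping.
    static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(",", -1);
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = lines[i].split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int j = 0; j < header.length && j < values.length; j++) {
                record.put(header[j].trim(), values[j].trim());
            }
            records.add(record);
        }
        return records;
    }

    // Collects every header seen in any record, in first-seen order.
    static Set<String> unionOfHeaders(List<Map<String, String>> records) {
        Set<String> headers = new LinkedHashSet<>();
        for (Map<String, String> record : records) {
            headers.addAll(record.keySet());
        }
        return headers;
    }

    public static void main(String[] args) {
        List<Map<String, String>> all = new ArrayList<>();
        all.addAll(parse("id,name,age\n1,Ann,30\n"));
        all.addAll(parse("id,name,unknown1,unknown2\n2,Bob,x,y\n"));

        Set<String> headers = unionOfHeaders(all);
        System.out.println(String.join(",", headers));
        for (Map<String, String> record : all) {
            List<String> row = new ArrayList<>();
            for (String h : headers) {
                row.add(record.getOrDefault(h, "")); // empty if field is missing
            }
            System.out.println(String.join(",", row));
        }
    }
}
```

With the two sample files from the question, the union of headers comes out
in the order id, name, age, unknown1, unknown2, and fields a file does not
have are left empty. In a Flink job, the per-line part of parse would
roughly correspond to what your Map function does for each file.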

Greetings,
Stephan



[1]
https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/TypeSerializerOutputFormat.java

On Wed, Feb 4, 2015 at 12:07 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> Flink provides a CsvInputFormat which returns Tuples of the parsed fields.
> The format can be configured in several ways (which fields to read,
> line/field delimiters, comment prefixes, ...). The CsvInputFormat expects
> each file to have a consistent format. You could configure the format
> separately for each file layout that you need to read and convert the
> output to your common type using a Map function.
> If the format of the files is not known ahead of time, you should implement
> your own format (probably based on the DelimitedInputFormat).
>
> Flink provides a SerializedOutputFormat that writes data in the binary
> representation that is also used during processing, e.g., for network
> transfer and disk spilling. The data can later be read back using the
> SerializedInputFormat.
>
> Best regards, Fabian
>
> 2015-02-03 22:31 GMT+01:00 Vinh June <hoangthevinh....@gmail.com>:
>
>> Hi Flinkers,
>> I am totally new to Flink and Scala. I am trying to study Flink in Scala
>> for a project at university and ran into two problems. It would be great
>> if you could give me some advice.
>>
>> Problem #1: I want to read CSV files with varying fields (different
>> names and numbers of fields), for example:
>> file 1: id, name, age
>> file 2: id, name, [unknown1], [unknown2]
>> expected result set: id, name, age, [unknown1], [unknown2]
>>
>> Currently I read each file as an Array, then map the array to a common
>> class with a Map[header, value] (since I need to know which value belongs
>> to which header).
>> With this method I ran into problem #2, which concerns the output format.
>>
>> Problem #2: I would like to store the data in binary form, for example a
>> DataSet[MyClass[id: Long, array: Array[String]]], so that I can read it
>> back later. I found that FileOutputFormat might be the solution, but I
>> can't find any example of how to define one in Scala.
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/CSV-input-with-unknown-of-fields-and-Custom-output-format-tp670.html
>> Sent from the Apache Flink (Incubator) User Mailing List archive at
>> Nabble.com.
>>
>
>
