[ https://issues.apache.org/jira/browse/FLINK-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731504#comment-15731504 ]
ASF GitHub Bot commented on FLINK-3871:
---------------------------------------

Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/2762

Hi @mushketyk, sorry for the concise description before. The problem is the following: a `TableSource` provides schema information for the Table it produces. However, the methods `TableSource.getFieldNames()` and `TableSource.getFieldTypes()` return flat arrays, which are interpreted as a flat schema (the second field in the array represents the second table attribute) without any nesting.

Avro and many other storage formats (JSON, Parquet, ...) support nested data structures. With the current limitation of the `TableSource` interface, we would need to convert the nested data into a flat schema. However, the Table API and SQL support processing of nested data, and it would be a much better integration to pass Avro objects in their original structure into the Table API / SQL query (see my updated proposal for [FLINK-3871](https://issues.apache.org/jira/browse/FLINK-3871)).

In order to create a Table of nested Avro data, we need to improve the `TableSource` interface first. Once this is done, we can continue with this PR. I'm very sorry that I did not think about this earlier, given the effort you have already put into this issue.

Please let me know if you have any questions.

Best, Fabian

> Add Kafka TableSource with Avro serialization
> ---------------------------------------------
>
> Key: FLINK-3871
> URL: https://issues.apache.org/jira/browse/FLINK-3871
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: Fabian Hueske
>
> Add a Kafka TableSource which supports Avro serialized data.
> The KafkaAvroTableSource should support two modes:
> # SpecificRecord Mode: In this case, the user specifies a class which was code-generated by Avro from a schema. Flink treats these classes as regular POJOs, so they are also natively supported by the Table API and SQL. Classes generated by Avro contain their schema in a static field, which should be used to automatically derive field names and types. Hence, no information other than the name of the class is required (see the schema-introspection sketch below).
> # GenericRecord Mode: In this case, the user specifies an Avro Schema. The schema is used to deserialize the data into a GenericRecord, which must be translated into a possibly nested {{Row}} based on the schema information (see the conversion sketch below). Again, the Avro Schema is used to automatically derive the field names and types. This mode is less efficient than the SpecificRecord Mode because the {{GenericRecord}} needs to be converted into {{Row}}.
> This feature depends on FLINK-5280, i.e., support for nested data in {{TableSource}}.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
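For the SpecificRecord Mode, a minimal sketch of how field names could be derived from the schema that Avro embeds in generated classes (the static {{SCHEMA$}} field / {{getClassSchema()}}), resolved here via {{SpecificData}}. The class {{SpecificSchemaIntrospection}} is a hypothetical name used for illustration, not code from this PR.

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.specific.SpecificRecord;

public class SpecificSchemaIntrospection {

    /** Derives the top-level field names from the schema of an Avro-generated class. */
    public static String[] fieldNamesOf(Class<? extends SpecificRecord> avroClass) {
        // SpecificData resolves the schema embedded in the generated class (SCHEMA$).
        Schema schema = SpecificData.get().getSchema(avroClass);
        return schema.getFields().stream()
                .map(Schema.Field::name)
                .toArray(String[]::new);
    }
}
{code}

Since Flink handles Avro-generated classes as regular POJOs, the corresponding field types could likewise be obtained from the class itself (for example via Flink's {{TypeExtractor}}), so the user only needs to supply the class.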
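For the GenericRecord Mode, a minimal sketch of how a possibly nested {{GenericRecord}} could be translated into a nested {{Row}} by walking the Avro schema recursively. The class {{AvroRecordToRow}} is a hypothetical name, the {{Row}} import assumes a Flink version where {{Row}} lives in {{org.apache.flink.types}}, and handling of unions (nullable fields), arrays, maps, and enums is omitted.

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.types.Row;

import java.util.List;

public class AvroRecordToRow {

    /** Recursively converts a GenericRecord into a (possibly nested) Row. */
    public static Row convert(GenericRecord record, Schema schema) {
        List<Schema.Field> fields = schema.getFields();
        Row row = new Row(fields.size());
        for (int i = 0; i < fields.size(); i++) {
            Schema.Field field = fields.get(i);
            Object value = record.get(field.pos());
            if (field.schema().getType() == Schema.Type.RECORD && value != null) {
                // Nested record -> nested Row.
                row.setField(i, convert((GenericRecord) value, field.schema()));
            } else {
                // Primitives and other types are passed through unchanged here;
                // a complete converter would also handle unions, arrays, maps, enums, ...
                row.setField(i, value);
            }
        }
        return row;
    }
}
{code}

A KafkaAvroTableSource could invoke such a converter from its DeserializationSchema after decoding the Avro bytes, which is also where the extra cost of the GenericRecord Mode compared to the SpecificRecord Mode comes from.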