[ 
https://issues.apache.org/jira/browse/FLINK-29267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther updated FLINK-29267:
---------------------------------
    Description: 
Many connectors and formats require supporting external data types. Postgres 
users request UUID support, Avro users require enum support, etc.

FLINK-19869 implemented support for Postgres UUIDs poorly and even impacts 
performance with regular strings.

The long-term solution should be user-defined types in Flink. This is however a 
bigger effort that requires a FLIP and a bigger amount of resources.

As a mid-term solution, we should offer a consistent approach based on DDL 
options that allow defining a mapping from the Flink type system to the 
external type system. I suggest the following:

{code}
CREATE TABLE MyTable (
...
) WITH(
  'mapping.data-types' = '<Flink field name>: <External field data type>'
)
{code}

The option maps each Flink field to an external data type. The external data 
type must be parsable from a string. This works for most connectors and 
formats (e.g. an Avro schema string).
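
To make the option format concrete, here is a minimal sketch of how a connector 
might parse the {{mapping.data-types}} value into a field-to-type map. The class 
{{DataTypeMappingParser}} and its {{parse}} method are hypothetical and not part 
of Flink. The sketch assumes that external type strings contain no commas, so a 
full Avro schema string would need a more robust syntax or escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not an existing Flink class.
final class DataTypeMappingParser {

    private DataTypeMappingParser() {}

    /** Parses "field: type, field: type" into an ordered field-to-type map. */
    public static Map<String, String> parse(String optionValue) {
        Map<String, String> mapping = new LinkedHashMap<>();
        if (optionValue == null || optionValue.trim().isEmpty()) {
            return mapping;
        }
        // Naive split: assumes external type strings contain no commas.
        for (String entry : optionValue.split(",")) {
            // Split on the first colon only, so the type part may contain colons.
            int colon = entry.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException(
                        "Expected '<field>: <type>' but got: '" + entry.trim() + "'");
            }
            String field = entry.substring(0, colon).trim();
            String externalType = entry.substring(colon + 1).trim();
            if (field.isEmpty() || externalType.isEmpty()) {
                throw new IllegalArgumentException(
                        "Empty field name or type in: '" + entry.trim() + "'");
            }
            if (mapping.putIfAbsent(field, externalType) != null) {
                throw new IllegalArgumentException(
                        "Duplicate mapping for field: " + field);
            }
        }
        return mapping;
    }
}
```

Fields that do not appear in the option, such as a plain {{STRING}} column, are 
simply absent from the map and keep their default handling.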


Examples:

{code}
CREATE TABLE MyTable (
  regular_col STRING,
  uuid_col STRING,
  point_col ARRAY<DOUBLE>,
  box_col ARRAY<ARRAY<DOUBLE>>
) WITH(
  'mapping.data-types' = 'uuid_col: uuid, point_col: point, box_col: box'
)
{code}

We provide a table of supported mapping data types. E.g. the {{point}} type is 
always mapped to {{ARRAY<DOUBLE>}}. In general we choose a data type in Flink 
that comes closest to the required functionality.
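
The table of supported mapping data types could be sketched as a simple lookup 
that checks whether a column's declared Flink type matches the type expected for 
the external type. The class {{ExternalTypeTable}} is hypothetical. Only the 
{{point}} entry is stated above; the {{uuid}} and {{box}} entries are assumptions 
taken from the column declarations in the example.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table for illustration; not an existing Flink class.
final class ExternalTypeTable {

    // External type name -> Flink type that a column must be declared with.
    // "point" is stated in the proposal; "uuid" and "box" are assumed from
    // the example DDL.
    private static final Map<String, String> FLINK_TYPE_BY_EXTERNAL_TYPE = Map.of(
            "uuid", "STRING",
            "point", "ARRAY<DOUBLE>",
            "box", "ARRAY<ARRAY<DOUBLE>>");

    private ExternalTypeTable() {}

    /** Returns true if the declared Flink type matches the external type's entry. */
    public static boolean isCompatible(String declaredFlinkType, String externalType) {
        String expected = FLINK_TYPE_BY_EXTERNAL_TYPE.get(
                externalType.trim().toLowerCase(Locale.ROOT));
        if (expected == null) {
            throw new IllegalArgumentException(
                    "Unsupported external type: " + externalType);
        }
        // Compare case-insensitively and ignore whitespace in the declared type.
        String normalized =
                declaredFlinkType.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
        return expected.equals(normalized);
    }
}
```

A check like this would let a connector reject a table definition early, for 
example when {{point_col}} is declared as {{STRING}} but mapped to {{point}}.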


Future work:

In theory, we can also offer mapping of field names. It might be a requirement 
that Flink's column name is different from the external system's one. 

{code}
CREATE TABLE MyTable (
...
) WITH(
  'mapping.names' = '<Flink field name>: <External field name>'
)
{code}

  was:
Many connectors and formats require supporting external data types. Postgres 
users request UUID support, Avro users require enum support, etc.

FLINK-19869 implemented support for Postgres UUIDs poorly and event impacts 
pipelines with regular strings.

The long-term solution should be user-defined types in Flink. This is however a 
bigger effort that requires a FLIP and a bigger amount of resources.

As a mid-term solution, we should offer a consistent approach based on DDL 
options that allows to define a mapping from Flink type system to the external 
type system. I suggest the following:

{code}
CREATE TABLE MyTable (
...
) WITH(
  'mapping.data-types' = '<Flink field name>: <External field data type>'
)
{code}

The mapping defines a map from Flink data type to external data type. The 
external data type should be string parsable. This works for most connectors 
and formats (e.g. Avro schema string).


Examples:

{code}
CREATE TABLE MyTable (
  regular_col STRING,
  uuid_col STRING,
  point_col ARRAY<DOUBLE>,
  box_col ARRAY<ARRAY<DOUBLE>>
) WITH(
  'mapping.data-types' = 'uuid_col: uuid, point_col: point, box_col: box'
)
{code}

We provide a table of supported mapping data types. E.g. the {{point}} type is 
always maped to {{ARRAY<DOUBLE>}}. In general we choose a data type in Flink 
that comes closest to the required functionality.


Future work:

In theory, we can also offer mapping of field names. It might be a requirement 
that Flink's column name is different from the external system's one. 

{code}
CREATE TABLE MyTable (
...
) WITH(
  'mapping.names' = '<Flink field name>: <External field name>'
)
{code}


> Support external type systems in DDL
> ------------------------------------
>
>                 Key: FLINK-29267
>                 URL: https://issues.apache.org/jira/browse/FLINK-29267
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / JDBC, Formats (JSON, Avro, Parquet, ORC, 
> SequenceFile), Table SQL / Ecosystem
>            Reporter: Timo Walther
>            Assignee: Timo Walther
>            Priority: Major
>
> Many connectors and formats require supporting external data types. Postgres 
> users request UUID support, Avro users require enum support, etc.
> FLINK-19869 implemented support for Postgres UUIDs poorly and even impacts 
> performance with regular strings.
> The long-term solution should be user-defined types in Flink. This is however 
> a bigger effort that requires a FLIP and a bigger amount of resources.
> As a mid-term solution, we should offer a consistent approach based on DDL 
> options that allow defining a mapping from the Flink type system to the 
> external type system. I suggest the following:
> {code}
> CREATE TABLE MyTable (
> ...
> ) WITH(
>   'mapping.data-types' = '<Flink field name>: <External field data type>'
> )
> {code}
> The option maps each Flink field to an external data type. The external data 
> type must be parsable from a string. This works for most connectors and 
> formats (e.g. an Avro schema string).
> Examples:
> {code}
> CREATE TABLE MyTable (
>   regular_col STRING,
>   uuid_col STRING,
>   point_col ARRAY<DOUBLE>,
>   box_col ARRAY<ARRAY<DOUBLE>>
> ) WITH(
>   'mapping.data-types' = 'uuid_col: uuid, point_col: point, box_col: box'
> )
> {code}
> We provide a table of supported mapping data types. E.g. the {{point}} type 
> is always mapped to {{ARRAY<DOUBLE>}}. In general we choose a data type in 
> Flink that comes closest to the required functionality.
> Future work:
> In theory, we can also offer mapping of field names. It might be a 
> requirement that Flink's column name is different from the external system's 
> one. 
> {code}
> CREATE TABLE MyTable (
> ...
> ) WITH(
>   'mapping.names' = '<Flink field name>: <External field name>'
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
