sjwiesman commented on a change in pull request #9161: [FLINK-13262][docs] Add documentation for the new Table & SQL API type system URL: https://github.com/apache/flink/pull/9161#discussion_r305025804
########## File path: docs/dev/table/types.md ########## @@ -0,0 +1,1201 @@ +--- +title: "Data Types" +nav-parent_id: tableapi +nav-pos: 1 +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Due to historical reasons, the data types of Flink's Table & SQL API were closely coupled to Flink's +`TypeInformation` before Flink 1.9. `TypeInformation` is used in the DataSet and DataStream APIs and is +sufficient to describe all information needed to serialize and deserialize JVM-based objects in a +distributed setting. + +However, `TypeInformation` was not designed to properly represent logical types independent of an +actual JVM class. In the past, it was difficult to properly map SQL standard types to this abstraction. +Furthermore, some types were not SQL-compliant and were introduced without a bigger picture in mind. + +Starting with Flink 1.9, the Table & SQL API will receive a new type system that serves as a long-term +solution for API stability and standard compliance. + +Reworking the type system is a major effort that touches almost all user-facing interfaces. Therefore, its introduction +spans multiple releases, and the community aims to finish this effort by Flink 1.10. 
+ +Due to the simultaneous addition of a new planner for table programs (see [FLINK-11439](https://issues.apache.org/jira/browse/FLINK-11439)), +not every combination of planner and data type is supported. Furthermore, planners might not support every +data type with the desired precision or parameter. + +<span class="label label-danger">Attention</span> Please see the planner compatibility table and limitations +section before using a data type. + +* This will be replaced by the TOC +{:toc} + +Data Type +--------- + +A *data type* describes the logical type of a value in the table ecosystem. It can be used to declare input and/or +output types of operations. + +Flink's data types are similar to the SQL standard's *data type* terminology but also contain information +about the nullability of a value for efficient handling of scalar expressions. + +Examples of data types are: +- `INT` +- `INT NOT NULL` +- `INTERVAL DAY TO SECOND(3)` +- `ROW<myField ARRAY<BOOLEAN>, myOtherField TIMESTAMP(3)>` + +A list of all pre-defined data types can be found [below](#list-of-data-types). + +### Data Types in the Table API + +Users of the JVM-based APIs work with instances of `org.apache.flink.table.types.DataType` within the Table API or when +defining connectors, catalogs, or user-defined functions. + +A `DataType` instance has two responsibilities: +- **Declaration of a logical type** which does not imply a concrete physical representation for transmission +or storage but defines the boundaries between JVM-based languages and the table ecosystem. +- *Optional:* **Giving hints about the physical representation of data to the planner** which is useful at the edges to other APIs. + +For JVM-based languages, all pre-defined data types are available in `org.apache.flink.table.api.DataTypes`. 
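As a sketch of how a composite type could be declared programmatically, the `ROW<myField ARRAY<BOOLEAN>, myOtherField TIMESTAMP(3)>` example above might be built with the `DataTypes` factory class; the `ROW`, `FIELD`, `ARRAY`, and `BOOLEAN` method names here are assumptions based on this API's style and should be checked against the Javadoc:

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

// sketch (method names assumed): the ROW<myField ARRAY<BOOLEAN>, myOtherField TIMESTAMP(3)>
// example expressed with the DataTypes factory
DataType t = DataTypes.ROW(
    DataTypes.FIELD("myField", DataTypes.ARRAY(DataTypes.BOOLEAN())),
    DataTypes.FIELD("myOtherField", DataTypes.TIMESTAMP(3)));
```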
+ +It is recommended to add a star import to your table programs for a fluent API: + +<div class="codetabs" markdown="1"> + +<div data-lang="Java" markdown="1"> +{% highlight java %} +import static org.apache.flink.table.api.DataTypes.*; + +DataType t = INTERVAL(DAY(), SECOND(3)); +{% endhighlight %} +</div> + +<div data-lang="Scala" markdown="1"> +{% highlight scala %} +import org.apache.flink.table.api.DataTypes._ + +val t: DataType = INTERVAL(DAY(), SECOND(3)) +{% endhighlight %} +</div> + +</div> + +#### Physical Hints + +Physical hints are required at the edges of the table ecosystem. Hints indicate the data format that an implementation +expects. + +For example, a data source could express that it produces values for logical `TIMESTAMP`s using a `java.sql.Timestamp` class +instead of `java.time.LocalDateTime`, which would be the default. With this information, the runtime is able to convert +the produced class into its internal data format. In return, a data sink can declare the data format it consumes from the runtime. 
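To illustrate the kind of conversion described above with plain JDK calls (the class and method names in this sketch are hypothetical, not Flink API), converting between `java.sql.Timestamp` and the default `java.time.LocalDateTime` is a lossless round trip:

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

// Hypothetical sketch (not Flink API): the conversion a runtime would perform
// when a source hints that it produces java.sql.Timestamp instead of the
// default java.time.LocalDateTime conversion class.
public class BridgingSketch {

    // from the hinted bridging class to the default conversion class
    static LocalDateTime fromBridged(Timestamp value) {
        return value.toLocalDateTime();
    }

    // from the default conversion class back to the hinted bridging class
    static Timestamp toBridged(LocalDateTime value) {
        return Timestamp.valueOf(value);
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2019, 7, 19, 12, 30, 45);
        // the round trip through the bridging class preserves the value
        System.out.println(fromBridged(toBridged(original)).equals(original));
    }
}
```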
+ +Here are some examples of how to declare a bridging conversion class: + +<div class="codetabs" markdown="1"> + +<div data-lang="Java" markdown="1"> +{% highlight java %} +// tell the runtime to not produce or consume java.time.LocalDateTime instances +// but java.sql.Timestamp +DataType t1 = DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class); + +// tell the runtime to not produce or consume boxed integer arrays +// but primitive int arrays +DataType t2 = DataTypes.ARRAY(DataTypes.INT().notNull()).bridgedTo(int[].class); +{% endhighlight %} +</div> + +<div data-lang="Scala" markdown="1"> +{% highlight scala %} +// tell the runtime to not produce or consume java.time.LocalDateTime instances +// but java.sql.Timestamp +val t1: DataType = DataTypes.TIMESTAMP(3).bridgedTo(classOf[java.sql.Timestamp]) + +// tell the runtime to not produce or consume boxed integer arrays +// but primitive int arrays +val t2: DataType = DataTypes.ARRAY(DataTypes.INT().notNull()).bridgedTo(classOf[Array[Int]]) +{% endhighlight %} +</div> + +</div> + +<span class="label label-danger">Attention</span> Please note that physical hints are usually only required if the +API is extended. Users of pre-defined sources/sinks/functions do not need to define such hints. Hints within +a table program (e.g. `field.cast(TIMESTAMP(3).bridgedTo(Timestamp.class))`) are ignored. + +Planner Compatibility +--------------------- + +As mentioned in the introduction, reworking the type system will span multiple releases and the support of each data Review comment: ```suggestion As mentioned in the introduction, reworking the type system will span multiple releases, and the support of each data ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services