Hi all!

+1 for this FLIP. By adding schema information to data files we not only
support schema evolution, which is a very useful feature for data storage
systems, but also make it easier for Table Store to integrate with other
systems.

For example, the timestamp type in Hive does not support precision. With
this extra schema information, however, we can directly deduce the precision
of a column.
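
To make this concrete, here is a minimal sketch of schema-on-read, assuming
each data file records the schema it was written with and each column has a
stable ID that survives renames. All names here (Field, evolve_row,
timestamp_precision) are hypothetical and not the Table Store API; type
changes are left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Field:
    field_id: int  # assumed stable ID, unchanged when the column is renamed
    name: str
    type: str      # e.g. "INT", "STRING", "TIMESTAMP(3)"


def evolve_row(row: Dict[str, Any],
               file_schema: List[Field],
               table_schema: List[Field]) -> Dict[str, Any]:
    """Project a row written with file_schema onto the current table_schema."""
    old_by_id = {f.field_id: f for f in file_schema}
    result = {}
    for field in table_schema:
        old = old_by_id.get(field.field_id)
        # Renamed column: matched by ID, read under its old name.
        # Added column: absent from the file, so it reads as None (NULL).
        # Dropped column: never visited, so it disappears from the result.
        result[field.name] = row.get(old.name) if old is not None else None
    return result


def timestamp_precision(field: Field) -> Optional[int]:
    """Recover the precision from a type string such as "TIMESTAMP(3)"."""
    prefix = "TIMESTAMP("
    if field.type.startswith(prefix) and field.type.endswith(")"):
        return int(field.type[len(prefix):-1])
    return None


# Schema stored with an old data file.
file_schema = [
    Field(0, "id", "INT"),
    Field(1, "name_1", "STRING"),
    Field(2, "ts", "TIMESTAMP(3)"),
    Field(3, "tmp", "INT"),
]
# Current schema: name_1 renamed to name_2, tmp dropped, age added.
table_schema = [
    Field(0, "id", "INT"),
    Field(1, "name_2", "STRING"),
    Field(2, "ts", "TIMESTAMP(3)"),
    Field(4, "age", "INT"),
]

row = {"id": 1, "name_1": "a", "ts": 1000, "tmp": 9}
evolved = evolve_row(row, file_schema, table_schema)
print(evolved)
print(timestamp_precision(table_schema[2]))
```

The old file is never rewritten; the mapping happens only at read time, and
the precision comes from the stored schema rather than from Hive.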

Jingsong Li <jingsongl...@gmail.com> wrote on Fri, Apr 29, 2022 at 17:54:

> Hi devs,
>
> I want to start a discussion about Schema Evolution on the Flink Table
> Store. [1]
>
> In FLINK-21634, we plan to support many schema changes in Flink SQL.
> But for the current Table Store, these changes may result in wrong data
> and unclear evolution behavior.
>
> In general, users perform these schema operations:
> - Add column: Adding a column to a table.
> - Modify column type.
> - Drop column: Dropping a column from a table.
> - Rename column: For example, renaming the "name_1" column to "name_2".
>
> Another schema change is changing the partition keys as the data changes
> over time. For example, for a table partitioned by day, as the business
> continues to grow, each new daily partition becomes larger and the
> business wants to switch to hourly partitions.
>
> A simple approach is to rewrite all the existing data when modifying the
> schema.
> But this is too expensive to be acceptable to users, so we need to
> support schema evolution and define it clearly.
> Modifying the schema should not rewrite the existing data; instead, when
> the original data is read, it is evolved to the current schema.
>
> Looking forward to your feedback!
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
>
> Best,
> Jingsong
>
