Thanks for proposing this exciting feature, Jingsong!

I only have a few questions:

1) Does the table store support evolving the schema multiple times during a
checkpoint?
For example: cp1 -> write 1M rows (may flush to the file store) -> evolve to
schema1 -> write 1M rows (may flush to the file store again) -> evolve to
schema2 -> write 1M rows -> cp2

That means the new data files in this snapshot may have different schemas.
Besides, the schemas may need to be registered before the checkpoint completes.
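
To make this concrete, a rough sketch of the sequence I have in mind
(assuming the ALTER TABLE syntax proposed in FLINK-21634; table and column
names are made up):

  -- checkpoint 1 completes
  INSERT INTO T ...                   -- ~1M rows, may flush to file store
  ALTER TABLE T ADD (col_a INT);      -- evolve to schema1
  INSERT INTO T ...                   -- ~1M rows, may flush again
  ALTER TABLE T ADD (col_b STRING);   -- evolve to schema2
  INSERT INTO T ...                   -- ~1M rows
  -- checkpoint 2 completes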

2) Does ADD COLUMN support adding a NOT NULL column?
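
For example (hypothetical table and column names, syntax as proposed in
FLINK-21634):

  ALTER TABLE T ADD (new_col INT NOT NULL);

Existing rows have no value for the new column, so I wonder what readers
would see for them.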

3) What is the type evolution matrix? Do you support modifying a column to
any other type?
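
For example, which of these would be allowed (hypothetical columns, syntax
as proposed in FLINK-21634)?

  ALTER TABLE T MODIFY (amount DECIMAL(10, 2));  -- widening from INT
  ALTER TABLE T MODIFY (name INT);               -- STRING to INT: allowed?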

Best,
Jark



On Mon, 9 May 2022 at 16:44, Caizhi Weng <tsreape...@gmail.com> wrote:

> Hi all!
>
> +1 for this FLIP. By adding schema information to data files we can not
> only support schema evolution, which is a very useful feature for data
> storage, but also make it easier for the table store to integrate with
> other systems.
>
> For example, the timestamp type in Hive does not support a precision. With
> this extra schema information, however, we can directly deduce the
> precision of a column.
>
> Jingsong Li <jingsongl...@gmail.com> wrote on Fri, Apr 29, 2022 at 17:54:
>
> > Hi devs,
> >
> > I want to start a discussion about Schema Evolution on the Flink Table
> > Store. [1]
> >
> > In FLINK-21634, we plan to support many schema changes in Flink SQL.
> > But for the current Table Store, they may result in wrong data and
> > unclear evolution behavior.
> >
> > In general, the user has these schema operations:
> > - Add column: add a column to a table.
> > - Modify column type: change the type of an existing column.
> > - Drop column: drop a column.
> > - Rename column: for example, rename the "name_1" column to "name_2".
> >
> > Another schema change is to the partition keys. The data changes over
> > time; for example, for a table partitioned by day, as the business
> > continues to grow, each new daily partition becomes larger and larger,
> > and the business wants to change to hourly partitions.
> >
> > A simple approach is to rewrite all the existing data when modifying the
> > schema.
> > But this expensive approach is not acceptable to users, so we need to
> > support schema evolution without rewriting and define it clearly:
> > modifying the schema does not rewrite the existing data; when reading,
> > the original data needs to be evolved to the current schema.
> >
> > Look forward to your feedback!
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
> >
> > Best,
> > Jingsong
> >
>
