Hi Jark,

Thanks for your feedback.

> 1) Does table-store support evolve schemas multiple times during a
checkpoint?

In this case the checkpoint is split into multiple commits, e.g.:
- commit1: write 1 million rows
- commit2: evolve schema 1
- commit3: write 1 million rows
- ...

Some work needs to be done on the connector side.
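
To make the splitting concrete, here is a minimal sketch (illustrative only,
not the real Table Store API; the class and method names are invented): a
writer buffers rows, and a schema change forces the buffered rows into their
own commit, so one checkpoint can produce several commits.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: split one checkpoint into multiple commits. */
public class CheckpointSplitter {

    /** A commit is either a batch of rows or a schema change. */
    public record Commit(String kind, long schemaId, long rowCount) {}

    private final List<Commit> commits = new ArrayList<>();
    private long currentSchemaId = 0;
    private long bufferedRows = 0;

    public void write(long rows) {
        bufferedRows += rows;
    }

    /** A schema change flushes the buffered rows into their own commit. */
    public void evolveSchema(long newSchemaId) {
        flushRows();
        commits.add(new Commit("evolve", newSchemaId, 0));
        currentSchemaId = newSchemaId;
    }

    /** At checkpoint time, flush what remains; the result is N commits. */
    public List<Commit> checkpoint() {
        flushRows();
        return commits;
    }

    private void flushRows() {
        if (bufferedRows > 0) {
            commits.add(new Commit("write", currentSchemaId, bufferedRows));
            bufferedRows = 0;
        }
    }
}
```

With one evolution inside the checkpoint, this yields the three commits
from the example above: write, evolve, write.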

> 2) Does ADD COLUMN support add a NOT-NULL column?

I tend not to support it at this time.
The other strategy is to support it, but report errors when reading old
data with the new schema; this at least ensures that the data can still
be read with the old schema.
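
The problem can be sketched as follows (hypothetical helper, not Table
Store code): old data files were written without the new column, so a read
under the new schema must pad it with null, which violates NOT NULL.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of why ADD COLUMN ... NOT NULL is problematic: old files lack
 * the new column, so reads must pad it with null, breaking the constraint.
 */
public class NotNullEvolution {

    /** Project an old row onto the new schema, enforcing NOT NULL. */
    public static Map<String, Object> readWithNewSchema(
            Map<String, Object> oldRow,
            List<String> newColumns,
            List<String> notNullColumns) {
        Map<String, Object> row = new HashMap<>();
        for (String col : newColumns) {
            Object v = oldRow.get(col); // absent in old files -> null
            if (v == null && notNullColumns.contains(col)) {
                throw new IllegalStateException(
                        "NOT NULL column '" + col + "' is absent in old data files");
            }
            row.put(col, v);
        }
        return row;
    }
}
```

This is exactly the second strategy: the write succeeds, but reading old
data through the new schema reports an error.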

> 3) What's the matrix of type evolution? Do you support modifying a column
to any type?

For type evolution, we currently only support types that can be converted
by implicit conversions (from Flink's LogicalTypeCasts).
Three modes could be supported in the future, letting the user choose:
- Implicit conversions only (the default)
- Implicit and explicit conversions, throwing an exception when a cast fails
- Implicit and explicit conversions, returning null when a cast fails
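
The three modes can be sketched like this (the enum and method names are
illustrative, not the FLIP's actual API), using STRING -> INT as an example
of a conversion that is explicit but not implicit:

```java
/** Sketch of the three proposed cast modes for type evolution. */
public class CastModes {

    public enum CastMode { IMPLICIT_ONLY, EXPLICIT_FAIL, EXPLICIT_NULL }

    /**
     * STRING -> INT is not an implicit conversion, so IMPLICIT_ONLY rejects
     * the evolution itself; the two explicit modes differ only in how a
     * failed cast of an individual value is handled.
     */
    public static Integer evolveStringToInt(String value, CastMode mode) {
        if (mode == CastMode.IMPLICIT_ONLY) {
            throw new UnsupportedOperationException(
                    "STRING to INT is not an implicit conversion");
        }
        try {
            return Integer.valueOf(value);
        } catch (NumberFormatException e) {
            if (mode == CastMode.EXPLICIT_NULL) {
                return null; // lenient explicit mode: null on cast failure
            }
            // strict explicit mode: surface the failure
            throw new IllegalArgumentException("Cast failed for '" + value + "'");
        }
    }
}
```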

I have updated the FLIP.

Best,
Jingsong

On Mon, May 9, 2022 at 8:14 PM Jark Wu <imj...@gmail.com> wrote:

> Thanks for proposing this exciting feature, Jingsong!
>
> I only have a few questions:
>
> 1) Does table-store support evolve schemas multiple times during a
> checkpoint?
> For example, cp1 -> write 1M rows (may flush file store) -> evolve schema1
> ->
> write 1M rows (may flush file store again) -> evolve schema2 -> write 1M
> rows -> cp2
>
> That means the schemas of new data files are different in this snapshot.
> Besides, it may need to register schemas before the checkpoint is complete.
>
> 2) Does ADD COLUMN support add a NOT-NULL column?
>
> 3) What's the matrix of type evolution? Do you support modifying a column
> to any type?
>
> Best,
> Jark
>
>
>
> On Mon, 9 May 2022 at 16:44, Caizhi Weng <tsreape...@gmail.com> wrote:
>
> > Hi all!
> >
> > +1 for this FLIP. By adding schema information into data files we can not
> > only support schema evolution, which is a very useful feature for data
> > storages, but also make it easier for table store to integrate with other
> > systems.
> >
> > For example timestamp type in Hive does not support precision. With this
> > extra schema information however we can directly deduce the precision of
> a
> > schema column.
> >
> > Jingsong Li <jingsongl...@gmail.com> 于2022年4月29日周五 17:54写道:
> >
> > > Hi devs,
> > >
> > > I want to start a discussion about Schema Evolution on the Flink Table
> > > Store. [1]
> > >
> > > In FLINK-21634, We plan to support many schema changes in Flink SQL.
> > > But for the current Table Store, it may result in wrong data, unclear
> > > evolutions.
> > >
> > > In general, the user has these operations for schema:
> > > - Add column: Adding a column to a table.
> > > - Modify column type.
> > > - Drop column: Drop a column.
> > > - Rename column: For example, rename the "name_1" column to "name_2".
> > >
> > > Another schema change is partition keys, the data is changing over
> > > time, for example, a table with day partition, as the business
> > > continues to grow, the new partition of the table by day will become
> > > larger and the business wants to change to hourly partitions.
> > >
> > > A simple approach is to rewrite all the existing data when modifying
> the
> > > schema.
> > > But this expensive way is not acceptable to the user, so we need to
> > > support and define it clearly.
> > > Modifying the schema does not rewrite the existing data, when reading
> > > the original data needs to evolve to the current schema.
> > >
> > > Look forward to your feedback!
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
> > >
> > > Best,
> > > Jingsong
> > >
> >
>