Re: [DISCUSS] FLIP-226: Introduce Schema Evolution on Table Store

Jark Wu Thu, 12 May 2022 01:14:55 -0700

Thanks, Jingsong, for the explanation. I don't have any other concerns.

Best,
Jark


On Thu, 12 May 2022 at 09:53, Jingsong Li <[email protected]> wrote:

> Hi all,
>
> If there are no more comments, I'm going to start a vote.
>
> Best,
> Jingsong
>
> On Tue, May 10, 2022 at 10:37 AM Jingsong Li <[email protected]>
> wrote:
>
> > Hi Jark,
> >
> > Thanks for your feedback.
> >
> > > 1) Does table-store support evolving schemas multiple times during a
> > > checkpoint?
> >
> > In this case the checkpoint is split into multiple commits, e.g.:
> > - commit1: write 1 million rows
> > - commit2: evolve schema 1
> > - commit3: write 1 million rows
> > ....
> >
> > Some work needs to be done on the connector side.
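
A minimal sketch of the splitting described above, assuming a checkpoint can be modeled as an ordered list of write and schema-change events. The names `Write`, `EvolveSchema`, `Commit`, and `split_checkpoint` are hypothetical and are not Table Store APIs; the point is only that each schema change closes the pending write commit, so every data commit is written under exactly one schema.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Write:
    rows: int


@dataclass
class EvolveSchema:
    schema_id: int


@dataclass
class Commit:
    kind: str        # "write" or "schema"
    schema_id: int   # schema in effect for this commit
    rows: int = 0


def split_checkpoint(events: List[Union[Write, EvolveSchema]],
                     start_schema_id: int = 0) -> List[Commit]:
    """Split the events of one checkpoint into commits at schema boundaries."""
    commits: List[Commit] = []
    current = start_schema_id
    pending_rows = 0
    for event in events:
        if isinstance(event, Write):
            # Writes under the same schema accumulate into one commit.
            pending_rows += event.rows
        else:
            # A schema change first flushes the rows written so far.
            if pending_rows:
                commits.append(Commit("write", current, pending_rows))
                pending_rows = 0
            current = event.schema_id
            commits.append(Commit("schema", current))
    if pending_rows:
        commits.append(Commit("write", current, pending_rows))
    return commits
```

For the sequence write, evolve schema 1, write, evolve schema 2, write, this yields five commits, and the three write commits carry schema ids 0, 1, and 2 respectively.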
> >
> > > 2) Does ADD COLUMN support adding a NOT NULL column?
> >
> > I tend not to support it at this time.
> > The other strategy is to support it but report errors when reading data
> > with the new schema, which ensures that the data can still be read with
> > the old schema.
> >
> > > 3) What's the matrix of type evolution? Do you support modifying a
> > > column to any type?
> >
> > For type evolution, we currently only support type changes that are
> > covered by implicit conversions (from Flink's LogicalTypeCasts).
> > Three modes can be supported in the future for the user to choose from:
> > - Default: implicit conversions only.
> > - Allow implicit and explicit conversions:
> >     - Throw an exception when a cast fails.
> >     - Return null when a cast fails.
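
The three modes above can be sketched as follows. This is an illustration under stated assumptions: the `IMPLICIT` and `EXPLICIT` tables are a toy subset invented for the example and are not Flink's actual LogicalTypeCasts matrix, and `CastMode`, `check_type_change`, and `read_value` are hypothetical names.

```python
from enum import Enum


class CastMode(Enum):
    IMPLICIT_ONLY = 1      # default: implicit conversions only
    EXPLICIT_OR_FAIL = 2   # explicit casts allowed, throw on failure
    EXPLICIT_OR_NULL = 3   # explicit casts allowed, null on failure


# Toy cast tables for illustration only (assumption, not Flink's rules).
IMPLICIT = {("INT", "BIGINT"), ("INT", "DOUBLE"), ("BIGINT", "DOUBLE")}
EXPLICIT = IMPLICIT | {("STRING", "INT"), ("DOUBLE", "INT")}

CONVERTERS = {"INT": int, "BIGINT": int, "DOUBLE": float, "STRING": str}


def check_type_change(src: str, dst: str, mode: CastMode) -> bool:
    """Decide at ALTER time whether the column type change is allowed."""
    if src == dst:
        return True
    allowed = IMPLICIT if mode is CastMode.IMPLICIT_ONLY else EXPLICIT
    return (src, dst) in allowed


def read_value(value, src: str, dst: str, mode: CastMode):
    """Convert a value stored with type `src` to the current type `dst`."""
    if not check_type_change(src, dst, mode):
        raise TypeError(f"type change {src} -> {dst} not allowed in {mode.name}")
    if value is None:
        return None
    try:
        return CONVERTERS[dst](value)
    except (ValueError, TypeError, OverflowError):
        if mode is CastMode.EXPLICIT_OR_NULL:
            return None  # third mode: a failed cast becomes null
        raise            # second mode: a failed cast is an error
```

In this sketch the mode matters at two points: when the type change is validated, and when an old value that cannot be converted is read back.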
> >
> > I have updated the FLIP.
> >
> > Best,
> > Jingsong
> >
> > On Mon, May 9, 2022 at 8:14 PM Jark Wu <[email protected]> wrote:
> >
> >> Thanks for proposing this exciting feature, Jingsong!
> >>
> >> I only have a few questions:
> >>
> >> 1) Does table-store support evolving schemas multiple times during a
> >> checkpoint?
> >> For example, cp1 -> write 1M rows (may flush file store) -> evolve schema1
> >> -> write 1M rows (may flush file store again) -> evolve schema2 -> write
> >> 1M rows -> cp2
> >>
> >> That means the schemas of new data files are different in this snapshot.
> >> Besides, it may need to register schemas before the checkpoint is
> >> complete.
> >>
> >> 2) Does ADD COLUMN support adding a NOT NULL column?
> >>
> >> 3) What's the matrix of type evolution? Do you support modifying a
> >> column to any type?
> >>
> >> Best,
> >> Jark
> >>
> >>
> >>
> >> On Mon, 9 May 2022 at 16:44, Caizhi Weng <[email protected]> wrote:
> >>
> >> > Hi all!
> >> >
> >> > +1 for this FLIP. By adding schema information to data files we can not
> >> > only support schema evolution, which is a very useful feature for data
> >> > storage, but also make it easier for table store to integrate with other
> >> > systems.
> >> >
> >> > For example, the timestamp type in Hive does not support precision. With
> >> > this extra schema information, however, we can directly deduce the
> >> > precision of a schema column.
> >> >
> >> > On Fri, Apr 29, 2022 at 17:54, Jingsong Li <[email protected]> wrote:
> >> >
> >> > > Hi devs,
> >> > >
> >> > > I want to start a discussion about Schema Evolution on the Flink Table
> >> > > Store. [1]
> >> > >
> >> > > In FLINK-21634, we plan to support many schema changes in Flink SQL.
> >> > > But for the current Table Store, they may result in wrong data and
> >> > > unclear evolution behavior.
> >> > >
> >> > > In general, users perform these operations on a schema:
> >> > > - Add column: Adding a column to a table.
> >> > > - Modify column type.
> >> > > - Drop column: Drop a column.
> >> > > - Rename column: For example, rename the "name_1" column to "name_2".
> >> > >
> >> > > Another schema change is changing the partition keys. Data changes over
> >> > > time: for example, for a table partitioned by day, as the business
> >> > > continues to grow, each new daily partition becomes larger and the
> >> > > business wants to switch to hourly partitions.
> >> > >
> >> > > A simple approach is to rewrite all the existing data when modifying
> >> > > the schema.
> >> > > But this expensive approach is not acceptable to users, so we need to
> >> > > support schema evolution and define it clearly.
> >> > > Modifying the schema does not rewrite the existing data; when reading,
> >> > > the original data needs to be evolved to the current schema.
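
A minimal sketch of evolving old data to the current schema at read time, without rewriting files. It assumes each column carries a stable field id that survives renames; that id-based mapping, and the names `Field` and `evolve_row`, are assumptions made for this illustration and are not stated in the message above.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Field:
    id: int     # stable identifier, unchanged by renames (assumption)
    name: str


def evolve_row(row: List[Any],
               file_schema: List[Field],
               table_schema: List[Field]) -> List[Any]:
    """Project a row written with file_schema onto the current table_schema."""
    # Index the stored values by field id rather than by name or position.
    by_id = {f.id: row[i] for i, f in enumerate(file_schema)}
    # Renamed columns match by id, dropped columns are simply not requested,
    # and columns added after the file was written read as None.
    return [by_id.get(f.id) for f in table_schema]


old_schema = [Field(0, "id"), Field(1, "name_1"), Field(2, "age")]
new_schema = [Field(0, "id"), Field(1, "name_2"), Field(3, "city")]
evolved = evolve_row([7, "bob", 30], old_schema, new_schema)
```

Here `name_1` was renamed to `name_2`, `age` was dropped, and `city` was added, so the old row `[7, "bob", 30]` is read as `[7, "bob", None]`.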
> >> > >
> >> > > Look forward to your feedback!
> >> > >
> >> > > [1]
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
> >> > >
> >> > > Best,
> >> > > Jingsong
> >> > >
> >> >
> >>
> >
>
