Thanks, Jark!

Hi all,

I just created a vote thread [1]. Feel free to vote -1 if you think there is something wrong with the design.

[1] https://lists.apache.org/thread/lg5txz95mgko4mp6fqcwt1dd1hbjctjy

Best,
Jingsong

On Thu, May 12, 2022 at 4:14 PM Jark Wu <imj...@gmail.com> wrote:
> Thanks, Jingsong, for the explanation. I don't have other concerns.
>
> Best,
> Jark
>
> On Thu, 12 May 2022 at 09:53, Jingsong Li <jingsongl...@gmail.com> wrote:
> > Hi all,
> >
> > If there are no more comments, I'm going to start a vote.
> >
> > Best,
> > Jingsong
> >
> > On Tue, May 10, 2022 at 10:37 AM Jingsong Li <jingsongl...@gmail.com> wrote:
> > > Hi Jark,
> > >
> > > Thanks for your feedback.
> > >
> > > > 1) Does table-store support evolving schemas multiple times during a checkpoint?
> > >
> > > In this case, the checkpoint is split into multiple commits, e.g.:
> > > - commit1: write 1 million rows
> > > - commit2: evolve schema 1
> > > - commit3: write 1 million rows
> > > ...
> > >
> > > Some work needs to be done on the connector side.
> > >
> > > > 2) Does ADD COLUMN support adding a NOT NULL column?
> > >
> > > I tend not to support it at this time.
> > > The other strategy is to support it, but report errors when reading data
> > > with the new schema, which ensures that data can still be read with the old schema.
> > >
> > > > 3) What's the matrix of type evolution? Do you support modifying a column to any type?
> > >
> > > For type evolution, we currently only support types that are supported by
> > > implicit conversions (from Flink LogicalTypeCasts).
> > > Three modes could be supported in the future to let the user choose:
> > > - Default: implicit conversions only
> > > - Allow implicit and explicit conversions, and throw an exception when a cast fails
> > > - Allow implicit and explicit conversions, and return null when a cast fails
> > >
> > > I have updated the FLIP.
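One way to read the three proposed type-evolution modes (implicit conversions only; implicit plus explicit conversions that fail on a bad value; implicit plus explicit conversions that return null on a bad value) is sketched below. It is a hypothetical illustration only: the `IMPLICIT` matrix is a small made-up stand-in, not Flink's `LogicalTypeCasts` rules, and `CastMode`, `check_type_change` and `read_value` are invented names.

```python
from enum import Enum
from typing import Any, Optional


class CastMode(Enum):
    IMPLICIT_ONLY = "implicit"  # default: only implicit conversions are allowed
    EXPLICIT_THROW = "throw"    # explicit casts allowed; a failing cast raises
    EXPLICIT_NULL = "null"      # explicit casts allowed; a failing cast yields null


# Hypothetical, simplified implicit-conversion matrix (not Flink's LogicalTypeCasts).
IMPLICIT = {
    "INT": {"INT", "BIGINT", "DOUBLE"},
    "BIGINT": {"BIGINT", "DOUBLE"},
    "DOUBLE": {"DOUBLE"},
    "STRING": {"STRING"},
}

# Python converters standing in for the target column types.
CONVERTERS = {"INT": int, "BIGINT": int, "DOUBLE": float, "STRING": str}


def check_type_change(old: str, new: str, mode: CastMode) -> None:
    """Reject a MODIFY COLUMN type change that the selected mode does not allow."""
    if mode is CastMode.IMPLICIT_ONLY and new not in IMPLICIT[old]:
        raise ValueError(f"{old} -> {new} is not an implicit conversion")


def read_value(value: Any, old: str, new: str, mode: CastMode) -> Optional[Any]:
    """Convert a value stored with the old column type into the current type."""
    # Re-checked here to keep the sketch self-contained; a real system would
    # validate once, when the ALTER TABLE statement is executed.
    check_type_change(old, new, mode)
    if value is None:
        return None
    try:
        return CONVERTERS[new](value)
    except (TypeError, ValueError):
        if mode is CastMode.EXPLICIT_NULL:
            return None
        raise
```

With `EXPLICIT_NULL`, a stored `"abc"` read as `INT` becomes `None`; with `EXPLICIT_THROW` the same read raises `ValueError`; with `IMPLICIT_ONLY`, the `STRING -> INT` change is rejected outright.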
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, May 9, 2022 at 8:14 PM Jark Wu <imj...@gmail.com> wrote:
> > > > Thanks for proposing this exciting feature, Jingsong!
> > > >
> > > > I only have a few questions:
> > > >
> > > > 1) Does table-store support evolving schemas multiple times during a checkpoint?
> > > > For example, cp1 -> write 1M rows (may flush file store) -> evolve schema1 ->
> > > > write 1M rows (may flush file store again) -> evolve schema2 -> write 1M
> > > > rows -> cp2
> > > >
> > > > That means the schemas of new data files are different in this snapshot.
> > > > Besides, it may need to register schemas before the checkpoint is complete.
> > > >
> > > > 2) Does ADD COLUMN support adding a NOT NULL column?
> > > >
> > > > 3) What's the matrix of type evolution? Do you support modifying a column to any type?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Mon, 9 May 2022 at 16:44, Caizhi Weng <tsreape...@gmail.com> wrote:
> > > > > Hi all!
> > > > >
> > > > > +1 for this FLIP. By adding schema information into data files, we can not
> > > > > only support schema evolution, which is a very useful feature for data
> > > > > storage systems, but also make it easier for the table store to integrate
> > > > > with other systems.
> > > > >
> > > > > For example, the timestamp type in Hive does not support precision. With this
> > > > > extra schema information, however, we can directly deduce the precision of a
> > > > > schema column.
> > > > >
> > > > > On Fri, 29 Apr 2022 at 17:54, Jingsong Li <jingsongl...@gmail.com> wrote:
> > > > > > Hi devs,
> > > > > >
> > > > > > I want to start a discussion about Schema Evolution on the Flink Table Store. [1]
> > > > > >
> > > > > > In FLINK-21634, we plan to support many schema changes in Flink SQL.
> > > > > > But for the current Table Store, these changes may result in wrong data and
> > > > > > unclear evolution semantics.
> > > > > >
> > > > > > In general, the user has these operations for schema:
> > > > > > - Add column: add a column to a table.
> > > > > > - Modify column type: change the type of an existing column.
> > > > > > - Drop column: drop a column.
> > > > > > - Rename column: for example, rename the "name_1" column to "name_2".
> > > > > >
> > > > > > Another kind of schema change concerns partition keys, because data changes
> > > > > > over time. For example, take a table partitioned by day: as the business
> > > > > > continues to grow, each new daily partition becomes larger, and the business
> > > > > > wants to change to hourly partitions.
> > > > > >
> > > > > > A simple approach is to rewrite all the existing data when modifying the schema.
> > > > > > But this is too expensive to be acceptable to users, so we need to support
> > > > > > schema evolution and define its semantics clearly: modifying the schema does
> > > > > > not rewrite the existing data; instead, when the original data is read, it is
> > > > > > evolved to the current schema.
> > > > > >
> > > > > > Looking forward to your feedback!
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
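The core idea of the proposal (existing files are never rewritten; rows are evolved to the current schema when they are read) can be sketched as follows. This is a minimal illustration, not the Table Store implementation: it assumes each data file records the schema it was written with and that every column carries a stable field id that survives renames and is never reused. `Field`, `Schema`, `add_column`, `drop_column`, `rename_column` and `evolve_row` are hypothetical names. ADD COLUMN is restricted to nullable columns, matching the answer given in the thread; type changes and partition-key changes are left out.

```python
from dataclasses import dataclass, replace
from typing import Any, Tuple


@dataclass(frozen=True)
class Field:
    id: int  # stable field id; assumed to survive renames and never be reused
    name: str
    type: str
    nullable: bool = True


@dataclass(frozen=True)
class Schema:
    schema_id: int  # assumed to be recorded in every data file written with this schema
    fields: Tuple[Field, ...]
    highest_field_id: int  # only grows, so ids of dropped columns are never reused


def add_column(schema: Schema, name: str, type_: str, nullable: bool = True) -> Schema:
    # Old files hold no values for the new column, so it must accept null.
    if not nullable:
        raise ValueError("ADD COLUMN only supports nullable columns")
    new_id = schema.highest_field_id + 1
    return Schema(schema.schema_id + 1, schema.fields + (Field(new_id, name, type_),), new_id)


def drop_column(schema: Schema, name: str) -> Schema:
    kept = tuple(f for f in schema.fields if f.name != name)
    return Schema(schema.schema_id + 1, kept, schema.highest_field_id)


def rename_column(schema: Schema, old: str, new: str) -> Schema:
    # Only the name changes; the field id stays, so old files still match.
    renamed = tuple(replace(f, name=new) if f.name == old else f for f in schema.fields)
    return Schema(schema.schema_id + 1, renamed, schema.highest_field_id)


def evolve_row(row: Tuple[Any, ...], file_schema: Schema, current: Schema) -> Tuple[Any, ...]:
    """Project a row written under file_schema onto the current schema at read time."""
    position_by_id = {f.id: i for i, f in enumerate(file_schema.fields)}
    evolved = []
    for field in current.fields:
        pos = position_by_id.get(field.id)
        # Columns added after the file was written read as null;
        # dropped columns are simply not projected.
        evolved.append(row[pos] if pos is not None else None)
    return tuple(evolved)
```

For example, a row `(1, "alice")` written under `(id, name_1)` reads as `("alice", None)` after adding `age`, renaming `name_1` to `name_2`, and dropping `id`, without touching the original file. Matching by id rather than by name or position is what lets a rename and a drop-then-re-add of the same name stay unambiguous.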