Hi Timo, I don't think source should work with `CatalogTableSchema`. So far, a table source doesn't need to know the logic information of computed column and watermark. IMO, we should provide a method to convert from `CatalogTableSchema` into `TableSchema` without computed columns in source factory, and a source should just hold the `TableSchema`.
I agree doing the intersection/diff logic is trivial, but maybe we can provide utilities to do that? So that we can keep the interface clean. Best, Jark On Thu, 2 Apr 2020 at 20:17, Timo Walther <twal...@apache.org> wrote: > Hi Jark, > > if catalogs use `CatalogTableSchema` in the future. The source would > internally also work with `CatalogTableSchema`. I'm fine with cleaning > up the `TableSchema` class but should a source deal with two different > schema classes then? > > Another problem that I see is that connectors usually need to perform > some index arithmetics. Dealing with TableSchema and additionally within > a field with DataType might be a bit inconvenient. A dedicated class > with utilities might be helpful such that not every source needs to > implement the same intersection/diff logic again. > > Regards, > Timo > > > On 02.04.20 14:06, Jark Wu wrote: > > Hi Dawid, > > > >> How to express projections with TableSchema? > > The TableSource holds the original TableSchema (i.e. from DDL) and the > > pushed TableSchema represents the schema after projection. > > Thus the table source can compare them to figure out changed field orders > > or not matched types. > > For most sources who maps physical storage by field names (e.g. jdbc, > > hbase, json) they can just simply apply the pushed TableSchema. > > But sources who maps by field indexes (e.g. csv), they need to figure out > > the projected indexes by comparing the original and projected schema. > > For example, the original schema is [a: String, b: Int, c: Timestamp], > and > > b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the > > source can figure out index=1 is pruned. > > > >> How do we express projection of a nested field with TableSchema? > > This is the same to the above one. For example, the original schema is > [rk: > > String, f1 Row<q1 Int, q2 Double>]. > > If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 > > Double>]. > > > >> TableSchema might be used at too many different places for different > > responsibilities. > > Agree. We have recognized that a structure and builder for pure table > > schema is required in many places. But we mixed many concepts of catalog > > table schema in TableSchema. > > IIRC, in an offline discussion of FLIP-84, we want to introduce a new > > `CatalogTableSchema` to represent the schema part of a DDL, > > and remove all the watermark, computed column information from > TableSchema? > > Then `TableSchema` can continue to serve as a pure table schema and it > > stays in a good package. > > > > Best, > > Jark > > > > > > > > > > On Thu, 2 Apr 2020 at 19:39, Timo Walther <twal...@apache.org> wrote: > > > >> Hi Dawid, > >> > >> thanks for your feedback. I agree with your concerns. I also observed > >> that TableSchema might be used at too many different places for > >> different responsibilities. > >> > >> How about we introduce a helper class for `SupportsProjectionPushDown` > >> and also `LookupTableSource#Context#getKeys()` to represent nested > >> structure of names. Data types, constraints, or computed columns are not > >> necessary at those locations. > >> > >> We can also add utility methods for connectors to this helper class > >> there to quickly figuring out differences between the original table > >> schema and the new one. > >> > >> SelectedFields { > >> > >> private LogicalType orignalRowType; // set by the planner > >> > >> private int[][] indices; > >> > >> getNames(int... at): String[] > >> > >> getNames(String... at): String[] > >> > >> getIndices(int... at): int[] > >> > >> getNames(String... at): String[] > >> > >> toTableSchema(): TableSchema > >> } > >> > >> What do others think? > >> > >> Thanks, > >> Timo > >> > >> > >> > >> On 02.04.20 12:28, Dawid Wysakowicz wrote: > >>> Generally +1 > >>> > >>> One slight concern I have is about the |SupportsProjectionPushDown.|I > >>> don't necessarily understand how can we express projections with > >>> TableSchema. It's unclear for me what happens when a type of a field > >>> changes, fields are in a different order, when types do not match. How > >>> do we express projection of a nested field with TableSchema? > >>> > >>> I don't think this changes the core design presented in the FLIP, > >>> therefore I'm fine with accepting the FLIP. I wanted to mention my > >>> concerns, so that maybe we can adjust the passed around structures > >> slightly. > >>> > >>> Best, > >>> > >>> Dawid > >>> || > >>> > >>> On 30/03/2020 14:42, Leonard Xu wrote: > >>>> +1(non-binding) > >>>> > >>>> Best, > >>>> Leonard Xu > >>>> > >>>>> 在 2020年3月30日,16:43,Jingsong Li<jingsongl...@gmail.com> 写道: > >>>>> > >>>>> +1 > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<k...@apache.org> wrote: > >>>>> > >>>>>> +1 > >>>>>> > >>>>>> Best, > >>>>>> Kurt > >>>>>> > >>>>>> > >>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<libenc...@gmail.com> > >> wrote: > >>>>>> > >>>>>>> +1 (non-binding) > >>>>>>> > >>>>>>> Jark Wu<imj...@gmail.com> 于2020年3月30日周一 下午3:57写道: > >>>>>>> > >>>>>>>> +1 from my side. > >>>>>>>> > >>>>>>>> Thanks Timo for driving this. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jark > >>>>>>>> > >>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<twal...@apache.org> > >> wrote: > >>>>>>>> > >>>>>>>>> Hi all, > >>>>>>>>> > >>>>>>>>> I would like to start the vote for FLIP-95 [1], which is > discussed > >>>>>> and > >>>>>>>>> reached a consensus in the discussion thread [2]. > >>>>>>>>> > >>>>>>>>> The vote will be open until April 2nd (72h), unless there is an > >>>>>>>>> objection or not enough votes. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> [1] > >>>>>>>>> > >>>>>>>>> > >>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > >>>>>>>>> [2] > >>>>>>>>> > >>>>>>>>> > >>>>>> > >> > https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E > >>>>>>> -- > >>>>>>> > >>>>>>> Benchao Li > >>>>>>> School of Electronics Engineering and Computer Science, Peking > >> University > >>>>>>> Tel:+86-15650713730 > >>>>>>> Email:libenc...@gmail.com;libenc...@pku.edu.cn > >>>>>>> > >>>>> -- > >>>>> Best, Jingsong Lee > >> > >> > > > >