Hi Timo,

I don't think source should work with `CatalogTableSchema`. So far, a table
source doesn't need to know the logic information of computed column and
watermark.
IMO, we should provide a method to convert from `CatalogTableSchema` into
`TableSchema` without computed columns in source factory,
and a source should just hold the `TableSchema`.

I agree doing the intersection/diff logic is trivial, but maybe we can
provide utilities to do that? So that we can keep the interface clean.


Best,
Jark


On Thu, 2 Apr 2020 at 20:17, Timo Walther <twal...@apache.org> wrote:

> Hi Jark,
>
> if catalogs use `CatalogTableSchema` in the future. The source would
> internally also work with `CatalogTableSchema`. I'm fine with cleaning
> up the `TableSchema` class but should a source deal with two different
> schema classes then?
>
> Another problem that I see is that connectors usually need to perform
> some index arithmetics. Dealing with TableSchema and additionally within
> a field with DataType might be a bit inconvenient. A dedicated class
> with utilities might be helpful such that not every source needs to
> implement the same intersection/diff logic again.
>
> Regards,
> Timo
>
>
> On 02.04.20 14:06, Jark Wu wrote:
> > Hi Dawid,
> >
> >> How to express projections with TableSchema?
> > The TableSource holds the original TableSchema (i.e. from DDL) and the
> > pushed TableSchema represents the schema after projection.
> > Thus the table source can compare them to figure out changed field orders
> > or not matched types.
> > For most sources who maps physical storage by field names (e.g. jdbc,
> > hbase, json) they can just simply apply the pushed TableSchema.
> > But sources who maps by field indexes (e.g. csv), they need to figure out
> > the projected indexes by comparing the original and projected schema.
> > For example, the original schema is [a: String, b: Int, c: Timestamp],
> and
> > b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
> > source can figure out index=1 is pruned.
> >
> >> How do we express projection of a nested field with TableSchema?
> > This is the same to the above one. For example, the original schema is
> [rk:
> > String, f1 Row<q1 Int, q2 Double>].
> > If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
> > Double>].
> >
> >> TableSchema might be used at too many different places for different
> > responsibilities.
> > Agree. We have recognized that a structure and builder for pure table
> > schema is required in many places. But we mixed many concepts of catalog
> > table schema in TableSchema.
> > IIRC, in an offline discussion of FLIP-84, we want to introduce a new
> > `CatalogTableSchema` to represent the schema part of a DDL,
> > and remove all the watermark, computed column information from
> TableSchema?
> > Then `TableSchema` can continue to serve as a pure table schema and it
> > stays in a good package.
> >
> > Best,
> > Jark
> >
> >
> >
> >
> > On Thu, 2 Apr 2020 at 19:39, Timo Walther <twal...@apache.org> wrote:
> >
> >> Hi Dawid,
> >>
> >> thanks for your feedback. I agree with your concerns. I also observed
> >> that TableSchema might be used at too many different places for
> >> different responsibilities.
> >>
> >> How about we introduce a helper class for `SupportsProjectionPushDown`
> >> and also `LookupTableSource#Context#getKeys()` to represent nested
> >> structure of names. Data types, constraints, or computed columns are not
> >> necessary at those locations.
> >>
> >> We can also add utility methods for connectors to this helper class
> >> there to quickly figuring out differences between the original table
> >> schema and the new one.
> >>
> >> SelectedFields {
> >>
> >>          private LogicalType orignalRowType; // set by the planner
> >>
> >>          private int[][] indices;
> >>
> >>          getNames(int... at): String[]
> >>
> >>          getNames(String... at): String[]
> >>
> >>          getIndices(int... at): int[]
> >>
> >>          getNames(String... at): String[]
> >>
> >>          toTableSchema(): TableSchema
> >> }
> >>
> >> What do others think?
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >>
> >> On 02.04.20 12:28, Dawid Wysakowicz wrote:
> >>> Generally +1
> >>>
> >>> One slight concern I have is about the |SupportsProjectionPushDown.|I
> >>> don't necessarily understand how can we express projections with
> >>> TableSchema. It's unclear for me what happens when a type of a field
> >>> changes, fields are in a different order, when types do not match. How
> >>> do we express projection of a nested field with TableSchema?
> >>>
> >>> I don't think this changes the core design presented in the FLIP,
> >>> therefore I'm fine with accepting the FLIP. I wanted to mention my
> >>> concerns, so that maybe we can adjust the passed around structures
> >> slightly.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>> ||
> >>>
> >>> On 30/03/2020 14:42, Leonard Xu wrote:
> >>>> +1(non-binding)
> >>>>
> >>>> Best,
> >>>> Leonard Xu
> >>>>
> >>>>> 在 2020年3月30日,16:43,Jingsong Li<jingsongl...@gmail.com>  写道:
> >>>>>
> >>>>> +1
> >>>>>
> >>>>> Best,
> >>>>> Jingsong Lee
> >>>>>
> >>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<k...@apache.org>  wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Best,
> >>>>>> Kurt
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<libenc...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>>> +1 (non-binding)
> >>>>>>>
> >>>>>>> Jark Wu<imj...@gmail.com>  于2020年3月30日周一 下午3:57写道:
> >>>>>>>
> >>>>>>>> +1 from my side.
> >>>>>>>>
> >>>>>>>> Thanks Timo for driving this.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<twal...@apache.org>
> >> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I would like to start the vote for FLIP-95 [1], which is
> discussed
> >>>>>> and
> >>>>>>>>> reached a consensus in the discussion thread [2].
> >>>>>>>>>
> >>>>>>>>> The vote will be open until April 2nd (72h), unless there is an
> >>>>>>>>> objection or not enough votes.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> >>>>>>>>> [2]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> >>>>>>> --
> >>>>>>>
> >>>>>>> Benchao Li
> >>>>>>> School of Electronics Engineering and Computer Science, Peking
> >> University
> >>>>>>> Tel:+86-15650713730
> >>>>>>> Email:libenc...@gmail.com;libenc...@pku.edu.cn
> >>>>>>>
> >>>>> --
> >>>>> Best, Jingsong Lee
> >>
> >>
> >
>
>

Reply via email to