Regarding to ConnectTableDescriptor, if in the end it becomes a builder of CatalogTable, I would be ok with that. But it doesn't look like a builder now, it's more like another form of TableSource/TableSink.
Best, Kurt On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <twal...@apache.org> wrote: > Hi everyone, > > thanks for the great discussion with a good outcome. > > I have one last comment about: > > void createTemporaryFunction(String path, UserDefinedFunction function); > > We should encourage users to register a class instead of an instance. We > should enforce a default constructor in the future. If we need metadata > or type inference, the planner can simply instantiate the class and call > the corresponding getters. If people would like to parameterize a > function, they can use global job parameters instead - which are > available in the open() method. > > I suggest: > > void createTemporaryFunction(String path, Class<? extends > UserDefinedFunction> functionClass); > > This is also closer to the SQL DDL that is about to be implemented in: > https://issues.apache.org/jira/browse/FLINK-7151 > > Thanks, > Timo > > > On 09.10.19 17:07, Bowen Li wrote: > > Hi Dawid, > > > > +1 for proposed changes > > > > On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <dwysakow...@apache.org > > > > wrote: > > > >> Sorry for a very delayed response. > >> > >> @Kurt Yes, this is the goal to have a function created like new > >> Function(...) also be wrapped into CatalogFunction. This would have to > >> be though a temporary function as we cannot represent that as a set of > >> properties. Similar to the createTemporaryView(DataStream stream). > >> > >> As for the ConnectTableDescriptor I agree this is very similar to > >> CatalogTable. I am not sure though if we should get rid of it. In the > >> end I see it as a builder for a CatalogTable, which is a slightly more > >> internal API, but we might revisit that some time in the future if we > >> find that it makes more sense. > >> > >> @All I updated the FLIP page with some more details from the outcome of > >> the discussions around FLIP-57. Please take a look. I would like to > >> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > >> > >> Best, > >> > >> Dawid > >> > >> > >> On 19/09/2019 09:24, Kurt Young wrote: > >>> IIUC it's good to see that both serializable (tables description from > >> DDL) > >>> and unserializable (tables with DataStream underneath) tables are > treated > >>> unify with CatalogTable. > >>> > >>> Can I also assume functions that either come from a function class > (from > >>> DDL) > >>> or function objects (newed by user) will also treated unify with > >>> CatalogFunction? > >>> > >>> This will greatly simplify and unify current API level concepts and > >> design. > >>> And it seems only one thing left, how do we deal with > >>> ConnectTableDescriptor? > >>> It's actually very similar with serializable CatalogTable, both carry > >> some > >>> text > >>> properties which even are the same. Is there any chance we can further > >> unify > >>> this to CatalogTable? > >>> > >>> object > >>> Best, > >>> Kurt > >>> > >>> > >>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> wrote: > >>> > >>>> Thanks Dawid for the design doc. > >>>> > >>>> In general, I’m +1 to the FLIP. > >>>> > >>>> > >>>> +1 to the single-string and parse way to express object path. > >>>> > >>>> +1 to deprecate registerTableSink & registerTableSource. > >>>> But I would suggest to provide an easy way to register a custom > >>>> source/sink before we drop them (this is another story). > >>>> Currently, it’s not easy to implement a custom connector descriptor. > >>>> > >>>> Best, > >>>> Jark > >>>> > >>>> > >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <wysakowicz.da...@gmail.com> 写道: > >>>>> > >>>>> Hi JingsongLee, > >>>>> From my understanding they can. Underneath they will be > CatalogTables. > >>>> The > >>>>> difference is the lifetime of the tables. Plus some of the user > facing > >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must > have > >> a > >>>>> separate methods for that. In the end the temporary tables are held > in > >>>>> memory as CatalogTables. > >>>>> Best, > >>>>> Dawid > >>>>> > >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <lzljs3620...@aliyun.com > >>>> .invalid> > >>>>> wrote: > >>>>> > >>>>>> Hi dawid: > >>>>>> Can temporary tables achieve the same capabilities as catalog table? > >>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>>>> PartitionStatistics > >>>>>> like partition support: we have added some catalog equivalent > >> interfaces > >>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>>>> Maybe it's not a good idea to add these interfaces to > >>>>>> TableSource/TableSink. What do you think? > >>>>>> > >>>>>> Best, > >>>>>> Jingsong Lee > >>>>>> > >>>>>> > >>>>>> ------------------------------------------------------------------ > >>>>>> From:Kurt Young <ykt...@gmail.com> > >>>>>> Send Time:2019年9月18日(星期三) 17:54 > >>>>>> To:dev <dev@flink.apache.org> > >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>>> module > >>>>>> > >>>>>> Hi all, > >>>>>> > >>>>>> Sorry to join this party late. Big +1 to this flip, especially for > the > >>>>>> dropping > >>>>>> "registerTableSink & registerTableSource" part. These are indeed > >> legacy > >>>>>> and we should try to unify them through CatalogTable after we > >> introduce > >>>>>> the concept of Catalog. > >>>>>> > >>>>>> From my understanding, what we can registered should all be > metadata, > >>>>>> TableSource/TableSink should only be the one who is responsible to > do > >>>>>> the real work, i.e. reading and writing data according to the schema > >> and > >>>>>> other information like computed column, partition, .e.g. > >>>>>> > >>>>>> Best, > >>>>>> Kurt > >>>>>> > >>>>>> > >>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > lzljs3620...@aliyun.com > >>>>>> .invalid> > >>>>>> wrote: > >>>>>> > >>>>>>> After some development and thinking, I have a general > understanding. > >>>>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to > >> compatible > >>>>>>> with old ways. > >>>>>>> > >>>>>>> Best, > >>>>>>> Jingsong Lee > >>>>>>> > >>>>>>> > >>>>>>> ------------------------------------------------------------------ > >>>>>>> From:Timo Walther <twal...@apache.org> > >>>>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>>>> To:dev <dev@flink.apache.org> > >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>>>> module > >>>>>>> > >>>>>>> Hi Dawid, > >>>>>>> > >>>>>>> thanks for the design document. It fixes big concept gaps due to > >>>>>>> historical reasons with proper support for serializability and > >> catalog > >>>>>>> support in mind. > >>>>>>> > >>>>>>> I would not mind a registerTemporarySource/Sink, but the problem > >> that I > >>>>>>> see is that many people think that this is the recommended way of > >>>>>>> registering a table source/sink which is not true. We should guide > >>>> users > >>>>>>> to either use connect() or DDL API which can be validated and > stored > >> in > >>>>>>> catalog. > >>>>>>> > >>>>>>> Also from a concept perspective, registering a source/sink does not > >> fit > >>>>>>> into the SQL world. SQL does not know about source/sinks but only > >> about > >>>>>>> tables. If the responsibility of a TableSource/TableSink is just a > >> pure > >>>>>>> physical data consumer/producer that is not connected to the actual > >>>>>>> logical table schema, we would need a possibility of defining time > >>>>>>> attributes and interpreting/converting a changelog. This should be > >> done > >>>>>>> by the framework with information from the DDL/connect() and not be > >>>>>>> defined in every table source. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Timo > >>>>>>> > >>>>>>> > >>>>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>>>>> Hi dawid: > >>>>>>>> > >>>>>>>> It is difficult to describe specific examples. > >>>>>>>> Sometimes users will generate some java converters through some > >>>>>>>> Java code, or generate some Java classes through third-party > >>>>>>>> libraries. Of course, these can be best done through properties. > >>>>>>>> But this requires additional work from users.My suggestion is to > >>>>>>>> keep this Java instance class way that is user-friendly. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jingsong Lee > >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> > >>>>>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>>>>> To:dev <dev@flink.apache.org> > >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >> Table > >>>>>>> module > >>>>>>>> Hi all, > >>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>>>>> "some Connectors are difficult to convert all states to > properties" > >>>>>>>> All the Flink provided connectors will definitely be expressible > >> with > >>>>>>> properties (In the end you should be able to use them from DDL). I > >>>> think > >>>>>> if > >>>>>>> a TableSource is complex enough that it handles filter push down, > >>>>>> partition > >>>>>>> support etc. should rather be made available both from DDL & > >> java/scala > >>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String > >>>> path, > >>>>>>> TableSource source) if you have some concrete examples in mind. > >>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>>>> introducing > >>>>>>> a new identifier representation to differentiate between resolved > and > >>>>>>> unresolved identifiers) with the same concerns. We decided to > suggest > >>>> the > >>>>>>> string & parsing logic because of usability. > >>>>>>>> tEnv.from("cat.db.table") > >>>>>>>> is shorter and easier to write than > >>>>>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>>>>> And also implicitly solves the problem what happens if a user > (e.g. > >>>>>> used > >>>>>>> to other systems) uses that API in a following manner: > >>>>>>>> tEnv.from(Identifier.for("db.name") > >>>>>>>> I'm happy to revisit it if the general consensus is that it's > better > >>>> to > >>>>>>> use the OO aproach. > >>>>>>>> Best, > >>>>>>>> Dawid > >>>>>>>> > >>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>>>>> > >>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks > >>>>>> pretty > >>>>>>>> good to me except that I'm a little concerned about the object > >>>>>> reference > >>>>>>>> and string parsing in the code, which seems to an anti-pattern to > >> OOP. > >>>>>>> Have > >>>>>>>> we considered using ObjectIdenitifier with optional catalog and db > >>>>>> parts, > >>>>>>>> esp. if we are worried about arguments of variable length or > method > >>>>>>>> overloading? It's quite likely that the result of string parsing > is > >> an > >>>>>>>> ObjectIdentifier instance any way. > >>>>>>>> > >>>>>>>> Having string parsing logic in the code is a little dangerous as > it > >>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get > out > >> of > >>>>>>> sync. > >>>>>>>> Thanks, > >>>>>>>> Xuefu > >>>>>>>> > >>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > lzljs3620...@aliyun.com > >>>>>>> .invalid> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> Thanks dawid, +1 for this approach. > >>>>>>>> > >>>>>>>> One concern is the removal of registerTableSink & > >> registerTableSource > >>>>>>>> in TableEnvironment. It has two alternatives: > >>>>>>>> 1.the properties approach (DDL, descriptor). > >>>>>>>> 2.from/toDataStream. > >>>>>>>> > >>>>>>>> #1 can only be properties, not java states, and some Connectors > >>>>>>>> are difficult to convert all states to properties. > >>>>>>>> #2 can contain java state. But can't use TableSource-related > >> features, > >>>>>>>> like project & filter push down, partition support, etc.. > >>>>>>>> > >>>>>>>> Any idea about this? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jingsong Lee > >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> > >>>>>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>>>>> To:dev <dev@flink.apache.org> > >>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>>>> module > >>>>>>>> Hi all, > >>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables > storing > >>>>>>> table > >>>>>>>> meta objects permanently. At the same time the majority of current > >>>> APIs > >>>>>>>> create temporary objects that cannot be serialized. We should > >> clarify > >>>>>> the > >>>>>>>> creation of meta objects (tables, views, functions) in a unified > >> way. > >>>>>>>> Another current problem in the API is that all the temporary > objects > >>>>>> are > >>>>>>>> stored in a special built-in catalog, which is not very intuitive > >> for > >>>>>>> many > >>>>>>>> users, as they must be aware of that catalog to reference > temporary > >>>>>>> objects. > >>>>>>>> Lastly, different APIs have different ways of providing object > >> paths: > >>>>>>>> String path…, > >>>>>>>> String path, String pathContinued… > >>>>>>>> String name > >>>>>>>> We should choose one approach and unify it across all APIs. > >>>>>>>> I suggest a FLIP to address the above issues. > >>>>>>>> Looking forward to your opinions. > >>>>>>>> FLIP link: > >>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >> > >