Thanks for the comments. @Timo I think your suggestion makes sense. I changed the signature to
void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass); This will mean that after we make QueryOperations string serializable and we expose the type inference in functions we will have only a single object that cannot be persisted in a catalog and thus have only a "Temporary" method: void createTemporaryView(String path, DataStream stream); I updated the FLIP page accordingly. I also listed explicitly methods that we need to deprecate and added a note that the implementation of the methods concerning UserDefinedFunctions has to be postponed until we have type inference in place. As the voting period for FLIP-57 ended, I will start the VOTE thread for FLIP-64 tomorrow morning, unless there are some objections until then. Best, Dawid On 10/10/2019 16:16, Kurt Young wrote: > Thanks for the clarification Timo, that's sounds good to me. Let's continue > to discuss other things. > > Best, > Kurt > > > On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <twal...@apache.org> wrote: > >> The purpose of ConnectTableDescriptor was to have a programmatic way of >> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree >> that it needs some rework in the future, once we have finalized the DDL >> to ensure that both concepts are in sync. >> >> Regards, >> Timo >> >> >> On 10.10.19 16:08, Kurt Young wrote: >>> Regarding to ConnectTableDescriptor, if in the end it becomes a builder >>> of CatalogTable, I would be ok with that. But it doesn't look like a >> builder >>> now, it's more like another form of TableSource/TableSink. >>> >>> Best, >>> Kurt >>> >>> >>> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <twal...@apache.org> wrote: >>> >>>> Hi everyone, >>>> >>>> thanks for the great discussion with a good outcome. >>>> >>>> I have one last comment about: >>>> >>>> void createTemporaryFunction(String path, UserDefinedFunction function); >>>> >>>> We should encourage users to register a class instead of an instance. We >>>> should enforce a default constructor in the future. If we need metadata >>>> or type inference, the planner can simply instantiate the class and call >>>> the corresponding getters. If people would like to parameterize a >>>> function, they can use global job parameters instead - which are >>>> available in the open() method. >>>> >>>> I suggest: >>>> >>>> void createTemporaryFunction(String path, Class<? extends >>>> UserDefinedFunction> functionClass); >>>> >>>> This is also closer to the SQL DDL that is about to be implemented in: >>>> https://issues.apache.org/jira/browse/FLINK-7151 >>>> >>>> Thanks, >>>> Timo >>>> >>>> >>>> On 09.10.19 17:07, Bowen Li wrote: >>>>> Hi Dawid, >>>>> >>>>> +1 for proposed changes >>>>> >>>>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < >> dwysakow...@apache.org >>>>> wrote: >>>>> >>>>>> Sorry for a very delayed response. >>>>>> >>>>>> @Kurt Yes, this is the goal to have a function created like new >>>>>> Function(...) also be wrapped into CatalogFunction. This would have to >>>>>> be though a temporary function as we cannot represent that as a set of >>>>>> properties. Similar to the createTemporaryView(DataStream stream). >>>>>> >>>>>> As for the ConnectTableDescriptor I agree this is very similar to >>>>>> CatalogTable. I am not sure though if we should get rid of it. In the >>>>>> end I see it as a builder for a CatalogTable, which is a slightly more >>>>>> internal API, but we might revisit that some time in the future if we >>>>>> find that it makes more sense. >>>>>> >>>>>> @All I updated the FLIP page with some more details from the outcome >> of >>>>>> the discussions around FLIP-57. Please take a look. I would like to >>>>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. >>>>>> >>>>>> Best, >>>>>> >>>>>> Dawid >>>>>> >>>>>> >>>>>> On 19/09/2019 09:24, Kurt Young wrote: >>>>>>> IIUC it's good to see that both serializable (tables description from >>>>>> DDL) >>>>>>> and unserializable (tables with DataStream underneath) tables are >>>> treated >>>>>>> unify with CatalogTable. >>>>>>> >>>>>>> Can I also assume functions that either come from a function class >>>> (from >>>>>>> DDL) >>>>>>> or function objects (newed by user) will also treated unify with >>>>>>> CatalogFunction? >>>>>>> >>>>>>> This will greatly simplify and unify current API level concepts and >>>>>> design. >>>>>>> And it seems only one thing left, how do we deal with >>>>>>> ConnectTableDescriptor? >>>>>>> It's actually very similar with serializable CatalogTable, both carry >>>>>> some >>>>>>> text >>>>>>> properties which even are the same. Is there any chance we can >> further >>>>>> unify >>>>>>> this to CatalogTable? >>>>>>> >>>>>>> object >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> wrote: >>>>>>> >>>>>>>> Thanks Dawid for the design doc. >>>>>>>> >>>>>>>> In general, I’m +1 to the FLIP. >>>>>>>> >>>>>>>> >>>>>>>> +1 to the single-string and parse way to express object path. >>>>>>>> >>>>>>>> +1 to deprecate registerTableSink & registerTableSource. >>>>>>>> But I would suggest to provide an easy way to register a custom >>>>>>>> source/sink before we drop them (this is another story). >>>>>>>> Currently, it’s not easy to implement a custom connector descriptor. >>>>>>>> >>>>>>>> Best, >>>>>>>> Jark >>>>>>>> >>>>>>>> >>>>>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <wysakowicz.da...@gmail.com> >> 写道: >>>>>>>>> Hi JingsongLee, >>>>>>>>> From my understanding they can. Underneath they will be >>>> CatalogTables. >>>>>>>> The >>>>>>>>> difference is the lifetime of the tables. Plus some of the user >>>> facing >>>>>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must >>>> have >>>>>> a >>>>>>>>> separate methods for that. In the end the temporary tables are held >>>> in >>>>>>>>> memory as CatalogTables. >>>>>>>>> Best, >>>>>>>>> Dawid >>>>>>>>> >>>>>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <lzljs3620...@aliyun.com >>>>>>>> .invalid> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi dawid: >>>>>>>>>> Can temporary tables achieve the same capabilities as catalog >> table? >>>>>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >>>>>>>>>> PartitionStatistics >>>>>>>>>> like partition support: we have added some catalog equivalent >>>>>> interfaces >>>>>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames >>>>>>>>>> Maybe it's not a good idea to add these interfaces to >>>>>>>>>> TableSource/TableSink. What do you think? >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jingsong Lee >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------ >>>>>>>>>> From:Kurt Young <ykt...@gmail.com> >>>>>>>>>> Send Time:2019年9月18日(星期三) 17:54 >>>>>>>>>> To:dev <dev@flink.apache.org> >>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>> Table >>>>>>>>>> module >>>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> Sorry to join this party late. Big +1 to this flip, especially for >>>> the >>>>>>>>>> dropping >>>>>>>>>> "registerTableSink & registerTableSource" part. These are indeed >>>>>> legacy >>>>>>>>>> and we should try to unify them through CatalogTable after we >>>>>> introduce >>>>>>>>>> the concept of Catalog. >>>>>>>>>> >>>>>>>>>> From my understanding, what we can registered should all be >>>> metadata, >>>>>>>>>> TableSource/TableSink should only be the one who is responsible to >>>> do >>>>>>>>>> the real work, i.e. reading and writing data according to the >> schema >>>>>> and >>>>>>>>>> other information like computed column, partition, .e.g. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Kurt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < >>>> lzljs3620...@aliyun.com >>>>>>>>>> .invalid> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> After some development and thinking, I have a general >>>> understanding. >>>>>>>>>>> +1 to registering a source/sink does not fit into the SQL world. >>>>>>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to >>>>>> compatible >>>>>>>>>>> with old ways. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jingsong Lee >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >> ------------------------------------------------------------------ >>>>>>>>>>> From:Timo Walther <twal...@apache.org> >>>>>>>>>>> Send Time:2019年9月17日(星期二) 08:00 >>>>>>>>>>> To:dev <dev@flink.apache.org> >>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>> Table >>>>>>>>>>> module >>>>>>>>>>> >>>>>>>>>>> Hi Dawid, >>>>>>>>>>> >>>>>>>>>>> thanks for the design document. It fixes big concept gaps due to >>>>>>>>>>> historical reasons with proper support for serializability and >>>>>> catalog >>>>>>>>>>> support in mind. >>>>>>>>>>> >>>>>>>>>>> I would not mind a registerTemporarySource/Sink, but the problem >>>>>> that I >>>>>>>>>>> see is that many people think that this is the recommended way of >>>>>>>>>>> registering a table source/sink which is not true. We should >> guide >>>>>>>> users >>>>>>>>>>> to either use connect() or DDL API which can be validated and >>>> stored >>>>>> in >>>>>>>>>>> catalog. >>>>>>>>>>> >>>>>>>>>>> Also from a concept perspective, registering a source/sink does >> not >>>>>> fit >>>>>>>>>>> into the SQL world. SQL does not know about source/sinks but only >>>>>> about >>>>>>>>>>> tables. If the responsibility of a TableSource/TableSink is just >> a >>>>>> pure >>>>>>>>>>> physical data consumer/producer that is not connected to the >> actual >>>>>>>>>>> logical table schema, we would need a possibility of defining >> time >>>>>>>>>>> attributes and interpreting/converting a changelog. This should >> be >>>>>> done >>>>>>>>>>> by the framework with information from the DDL/connect() and not >> be >>>>>>>>>>> defined in every table source. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 09.09.19 14:16, JingsongLee wrote: >>>>>>>>>>>> Hi dawid: >>>>>>>>>>>> >>>>>>>>>>>> It is difficult to describe specific examples. >>>>>>>>>>>> Sometimes users will generate some java converters through some >>>>>>>>>>>> Java code, or generate some Java classes through third-party >>>>>>>>>>>> libraries. Of course, these can be best done through >> properties. >>>>>>>>>>>> But this requires additional work from users.My suggestion is to >>>>>>>>>>>> keep this Java instance class way that is user-friendly. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >> ------------------------------------------------------------------ >>>>>>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> >>>>>>>>>>>> Send Time:2019年9月6日(星期五) 16:21 >>>>>>>>>>>> To:dev <dev@flink.apache.org> >>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>>>> Table >>>>>>>>>>> module >>>>>>>>>>>> Hi all, >>>>>>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by >>>>>>>>>>>> "some Connectors are difficult to convert all states to >>>> properties" >>>>>>>>>>>> All the Flink provided connectors will definitely be expressible >>>>>> with >>>>>>>>>>> properties (In the end you should be able to use them from DDL). >> I >>>>>>>> think >>>>>>>>>> if >>>>>>>>>>> a TableSource is complex enough that it handles filter push down, >>>>>>>>>> partition >>>>>>>>>>> support etc. should rather be made available both from DDL & >>>>>> java/scala >>>>>>>>>>> code. I'm happy to reconsider adding >> registerTemporaryTable(String >>>>>>>> path, >>>>>>>>>>> TableSource source) if you have some concrete examples in mind. >>>>>>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually >>>>>>>>>> introducing >>>>>>>>>>> a new identifier representation to differentiate between resolved >>>> and >>>>>>>>>>> unresolved identifiers) with the same concerns. We decided to >>>> suggest >>>>>>>> the >>>>>>>>>>> string & parsing logic because of usability. >>>>>>>>>>>> tEnv.from("cat.db.table") >>>>>>>>>>>> is shorter and easier to write than >>>>>>>>>>>> tEnv.from(Identifier.for("cat", "db", "name") >>>>>>>>>>>> And also implicitly solves the problem what happens if a user >>>> (e.g. >>>>>>>>>> used >>>>>>>>>>> to other systems) uses that API in a following manner: >>>>>>>>>>>> tEnv.from(Identifier.for("db.name") >>>>>>>>>>>> I'm happy to revisit it if the general consensus is that it's >>>> better >>>>>>>> to >>>>>>>>>>> use the OO aproach. >>>>>>>>>>>> Best, >>>>>>>>>>>> Dawid >>>>>>>>>>>> >>>>>>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: >>>>>>>>>>>> >>>>>>>>>>>> Thanks to Dawid for starting the discussion and writeup. It >> looks >>>>>>>>>> pretty >>>>>>>>>>>> good to me except that I'm a little concerned about the object >>>>>>>>>> reference >>>>>>>>>>>> and string parsing in the code, which seems to an anti-pattern >> to >>>>>> OOP. >>>>>>>>>>> Have >>>>>>>>>>>> we considered using ObjectIdenitifier with optional catalog and >> db >>>>>>>>>> parts, >>>>>>>>>>>> esp. if we are worried about arguments of variable length or >>>> method >>>>>>>>>>>> overloading? It's quite likely that the result of string parsing >>>> is >>>>>> an >>>>>>>>>>>> ObjectIdentifier instance any way. >>>>>>>>>>>> >>>>>>>>>>>> Having string parsing logic in the code is a little dangerous as >>>> it >>>>>>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get >>>> out >>>>>> of >>>>>>>>>>> sync. >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Xuefu >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < >>>> lzljs3620...@aliyun.com >>>>>>>>>>> .invalid> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks dawid, +1 for this approach. >>>>>>>>>>>> >>>>>>>>>>>> One concern is the removal of registerTableSink & >>>>>> registerTableSource >>>>>>>>>>>> in TableEnvironment. It has two alternatives: >>>>>>>>>>>> 1.the properties approach (DDL, descriptor). >>>>>>>>>>>> 2.from/toDataStream. >>>>>>>>>>>> >>>>>>>>>>>> #1 can only be properties, not java states, and some Connectors >>>>>>>>>>>> are difficult to convert all states to properties. >>>>>>>>>>>> #2 can contain java state. But can't use TableSource-related >>>>>> features, >>>>>>>>>>>> like project & filter push down, partition support, etc.. >>>>>>>>>>>> >>>>>>>>>>>> Any idea about this? >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >> ------------------------------------------------------------------ >>>>>>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org> >>>>>>>>>>>> Send Time:2019年9月4日(星期三) 22:20 >>>>>>>>>>>> To:dev <dev@flink.apache.org> >>>>>>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in >> Table >>>>>>>>>> module >>>>>>>>>>>> Hi all, >>>>>>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables >>>> storing >>>>>>>>>>> table >>>>>>>>>>>> meta objects permanently. At the same time the majority of >> current >>>>>>>> APIs >>>>>>>>>>>> create temporary objects that cannot be serialized. We should >>>>>> clarify >>>>>>>>>> the >>>>>>>>>>>> creation of meta objects (tables, views, functions) in a unified >>>>>> way. >>>>>>>>>>>> Another current problem in the API is that all the temporary >>>> objects >>>>>>>>>> are >>>>>>>>>>>> stored in a special built-in catalog, which is not very >> intuitive >>>>>> for >>>>>>>>>>> many >>>>>>>>>>>> users, as they must be aware of that catalog to reference >>>> temporary >>>>>>>>>>> objects. >>>>>>>>>>>> Lastly, different APIs have different ways of providing object >>>>>> paths: >>>>>>>>>>>> String path…, >>>>>>>>>>>> String path, String pathContinued… >>>>>>>>>>>> String name >>>>>>>>>>>> We should choose one approach and unify it across all APIs. >>>>>>>>>>>> I suggest a FLIP to address the above issues. >>>>>>>>>>>> Looking forward to your opinions. >>>>>>>>>>>> FLIP link: >>>>>>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >>
signature.asc
Description: OpenPGP digital signature