Regarding to ConnectTableDescriptor, if in the end it becomes a builder
of CatalogTable, I would be ok with that. But it doesn't look like a builder
now, it's more like another form of TableSource/TableSink.

Best,
Kurt


On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <twal...@apache.org> wrote:

> Hi everyone,
>
> thanks for the great discussion with a good outcome.
>
> I have one last comment about:
>
> void createTemporaryFunction(String path, UserDefinedFunction function);
>
> We should encourage users to register a class instead of an instance. We
> should enforce a default constructor in the future. If we need metadata
> or type inference, the planner can simply instantiate the class and call
> the corresponding getters. If people would like to parameterize a
> function, they can use global job parameters instead - which are
> available in the open() method.
>
> I suggest:
>
> void createTemporaryFunction(String path, Class<? extends
> UserDefinedFunction> functionClass);
>
> This is also closer to the SQL DDL that is about to be implemented in:
> https://issues.apache.org/jira/browse/FLINK-7151
>
> Thanks,
> Timo
>
>
> On 09.10.19 17:07, Bowen Li wrote:
> > Hi Dawid,
> >
> > +1 for proposed changes
> >
> > On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <dwysakow...@apache.org
> >
> > wrote:
> >
> >> Sorry for a very delayed response.
> >>
> >> @Kurt Yes, this is the goal to have a function created like new
> >> Function(...) also be wrapped into CatalogFunction. This would have to
> >> be though a temporary function as we cannot represent that as a set of
> >> properties. Similar to the createTemporaryView(DataStream stream).
> >>
> >> As for the ConnectTableDescriptor I agree this is very similar to
> >> CatalogTable. I am not sure though if we should get rid of it. In the
> >> end I see it as a builder for a CatalogTable, which is a slightly more
> >> internal API, but we might revisit that some time in the future if we
> >> find that it makes more sense.
> >>
> >> @All I updated the FLIP page with some more details from the outcome of
> >> the discussions around FLIP-57. Please take a look. I would like to
> >> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >>
> >> On 19/09/2019 09:24, Kurt Young wrote:
> >>> IIUC it's good to see that both serializable (tables description from
> >> DDL)
> >>> and unserializable (tables with DataStream underneath) tables are
> treated
> >>> unify with CatalogTable.
> >>>
> >>> Can I also assume functions that either come from a function class
> (from
> >>> DDL)
> >>> or function objects (newed by user) will also treated unify with
> >>> CatalogFunction?
> >>>
> >>> This will greatly simplify and unify current API level concepts and
> >> design.
> >>> And it seems only one thing left, how do we deal with
> >>> ConnectTableDescriptor?
> >>> It's actually very similar with serializable CatalogTable, both carry
> >> some
> >>> text
> >>> properties which even are the same. Is there any chance we can further
> >> unify
> >>> this to CatalogTable?
> >>>
> >>> object
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <imj...@gmail.com> wrote:
> >>>
> >>>> Thanks Dawid for the design doc.
> >>>>
> >>>> In general, I’m +1 to the FLIP.
> >>>>
> >>>>
> >>>> +1 to the single-string and parse way to express object path.
> >>>>
> >>>> +1 to deprecate registerTableSink & registerTableSource.
> >>>> But I would suggest to provide an easy way to register a custom
> >>>> source/sink before we drop them (this is another story).
> >>>> Currently, it’s not easy to implement a custom connector descriptor.
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>>
> >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <wysakowicz.da...@gmail.com> 写道:
> >>>>>
> >>>>> Hi JingsongLee,
> >>>>>  From my understanding they can. Underneath they will be
> CatalogTables.
> >>>> The
> >>>>> difference is the lifetime of the tables. Plus some of the user
> facing
> >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must
> have
> >> a
> >>>>> separate methods for that. In the end the temporary tables are held
> in
> >>>>> memory as CatalogTables.
> >>>>> Best,
> >>>>> Dawid
> >>>>>
> >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <lzljs3620...@aliyun.com
> >>>> .invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi dawid:
> >>>>>> Can temporary tables achieve the same capabilities as catalog table?
> >>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> >>>>>> PartitionStatistics
> >>>>>> like partition support: we have added some catalog equivalent
> >> interfaces
> >>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> >>>>>> Maybe it's not a good idea to add these interfaces to
> >>>>>> TableSource/TableSink. What do you think?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong Lee
> >>>>>>
> >>>>>>
> >>>>>> ------------------------------------------------------------------
> >>>>>> From:Kurt Young <ykt...@gmail.com>
> >>>>>> Send Time:2019年9月18日(星期三) 17:54
> >>>>>> To:dev <dev@flink.apache.org>
> >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> Table
> >>>>>> module
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> Sorry to join this party late. Big +1 to this flip, especially for
> the
> >>>>>> dropping
> >>>>>> "registerTableSink & registerTableSource" part. These are indeed
> >> legacy
> >>>>>> and we should try to unify them through CatalogTable after we
> >> introduce
> >>>>>> the concept of Catalog.
> >>>>>>
> >>>>>>  From my understanding, what we can registered should all be
> metadata,
> >>>>>> TableSource/TableSink should only be the one who is responsible to
> do
> >>>>>> the real work, i.e. reading and writing data according to the schema
> >> and
> >>>>>> other information like computed column, partition, .e.g.
> >>>>>>
> >>>>>> Best,
> >>>>>> Kurt
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <
> lzljs3620...@aliyun.com
> >>>>>> .invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> After some development and thinking, I have a general
> understanding.
> >>>>>>> +1 to registering a source/sink does not fit into the SQL world.
> >>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to
> >> compatible
> >>>>>>> with old ways.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jingsong Lee
> >>>>>>>
> >>>>>>>
> >>>>>>> ------------------------------------------------------------------
> >>>>>>> From:Timo Walther <twal...@apache.org>
> >>>>>>> Send Time:2019年9月17日(星期二) 08:00
> >>>>>>> To:dev <dev@flink.apache.org>
> >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> Table
> >>>>>>> module
> >>>>>>>
> >>>>>>> Hi Dawid,
> >>>>>>>
> >>>>>>> thanks for the design document. It fixes big concept gaps due to
> >>>>>>> historical reasons with proper support for serializability and
> >> catalog
> >>>>>>> support in mind.
> >>>>>>>
> >>>>>>> I would not mind a registerTemporarySource/Sink, but the problem
> >> that I
> >>>>>>> see is that many people think that this is the recommended way of
> >>>>>>> registering a table source/sink which is not true. We should guide
> >>>> users
> >>>>>>> to either use connect() or DDL API which can be validated and
> stored
> >> in
> >>>>>>> catalog.
> >>>>>>>
> >>>>>>> Also from a concept perspective, registering a source/sink does not
> >> fit
> >>>>>>> into the SQL world. SQL does not know about source/sinks but only
> >> about
> >>>>>>> tables. If the responsibility of a TableSource/TableSink is just a
> >> pure
> >>>>>>> physical data consumer/producer that is not connected to the actual
> >>>>>>> logical table schema, we would need a possibility of defining time
> >>>>>>> attributes and interpreting/converting a changelog. This should be
> >> done
> >>>>>>> by the framework with information from the DDL/connect() and not be
> >>>>>>> defined in every table source.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>> On 09.09.19 14:16, JingsongLee wrote:
> >>>>>>>> Hi dawid:
> >>>>>>>>
> >>>>>>>> It is difficult to describe specific examples.
> >>>>>>>> Sometimes users will generate some java converters through some
> >>>>>>>>   Java code, or generate some Java classes through third-party
> >>>>>>>>   libraries. Of course, these can be best done through properties.
> >>>>>>>> But this requires additional work from users.My suggestion is to
> >>>>>>>>   keep this Java instance class way that is user-friendly.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong Lee
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------------------------------------------------------------------
> >>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org>
> >>>>>>>> Send Time:2019年9月6日(星期五) 16:21
> >>>>>>>> To:dev <dev@flink.apache.org>
> >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> >> Table
> >>>>>>> module
> >>>>>>>> Hi all,
> >>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by
> >>>>>>>> "some Connectors are difficult to convert all states to
> properties"
> >>>>>>>> All the Flink provided connectors will definitely be expressible
> >> with
> >>>>>>> properties (In the end you should be able to use them from DDL). I
> >>>> think
> >>>>>> if
> >>>>>>> a TableSource is complex enough that it handles filter push down,
> >>>>>> partition
> >>>>>>> support etc. should rather be made available both from DDL &
> >> java/scala
> >>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String
> >>>> path,
> >>>>>>> TableSource source) if you have some concrete examples in mind.
> >>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
> >>>>>> introducing
> >>>>>>> a new identifier representation to differentiate between resolved
> and
> >>>>>>> unresolved identifiers) with the same concerns. We decided to
> suggest
> >>>> the
> >>>>>>> string & parsing logic because of usability.
> >>>>>>>>      tEnv.from("cat.db.table")
> >>>>>>>> is shorter and easier to write than
> >>>>>>>>      tEnv.from(Identifier.for("cat", "db", "name")
> >>>>>>>> And also implicitly solves the problem what happens if a user
> (e.g.
> >>>>>> used
> >>>>>>> to other systems) uses that API in a following manner:
> >>>>>>>>      tEnv.from(Identifier.for("db.name")
> >>>>>>>> I'm happy to revisit it if the general consensus is that it's
> better
> >>>> to
> >>>>>>> use the OO aproach.
> >>>>>>>> Best,
> >>>>>>>> Dawid
> >>>>>>>>
> >>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote:
> >>>>>>>>
> >>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks
> >>>>>> pretty
> >>>>>>>> good to me except that I'm a little concerned about the object
> >>>>>> reference
> >>>>>>>> and string parsing in the code, which seems to an anti-pattern to
> >> OOP.
> >>>>>>> Have
> >>>>>>>> we considered using ObjectIdenitifier with optional catalog and db
> >>>>>> parts,
> >>>>>>>> esp. if we are worried about arguments of variable length or
> method
> >>>>>>>> overloading? It's quite likely that the result of string parsing
> is
> >> an
> >>>>>>>> ObjectIdentifier instance any way.
> >>>>>>>>
> >>>>>>>> Having string parsing logic in the code is a little dangerous as
> it
> >>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get
> out
> >> of
> >>>>>>> sync.
> >>>>>>>> Thanks,
> >>>>>>>> Xuefu
> >>>>>>>>
> >>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <
> lzljs3620...@aliyun.com
> >>>>>>> .invalid>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks dawid, +1 for this approach.
> >>>>>>>>
> >>>>>>>> One concern is the removal of registerTableSink &
> >> registerTableSource
> >>>>>>>>   in TableEnvironment. It has two alternatives:
> >>>>>>>> 1.the properties approach (DDL, descriptor).
> >>>>>>>> 2.from/toDataStream.
> >>>>>>>>
> >>>>>>>> #1 can only be properties, not java states, and some Connectors
> >>>>>>>>   are difficult to convert all states to properties.
> >>>>>>>> #2 can contain java state. But can't use TableSource-related
> >> features,
> >>>>>>>> like project & filter push down, partition support, etc..
> >>>>>>>>
> >>>>>>>> Any idea about this?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong Lee
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------------------------------------------------------------------
> >>>>>>>> From:Dawid Wysakowicz <dwysakow...@apache.org>
> >>>>>>>> Send Time:2019年9月4日(星期三) 22:20
> >>>>>>>> To:dev <dev@flink.apache.org>
> >>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >>>>>> module
> >>>>>>>> Hi all,
> >>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables
> storing
> >>>>>>> table
> >>>>>>>> meta objects permanently. At the same time the majority of current
> >>>> APIs
> >>>>>>>> create temporary objects that cannot be serialized. We should
> >> clarify
> >>>>>> the
> >>>>>>>> creation of meta objects (tables, views, functions) in a unified
> >> way.
> >>>>>>>> Another current problem in the API is that all the temporary
> objects
> >>>>>> are
> >>>>>>>> stored in a special built-in catalog, which is not very intuitive
> >> for
> >>>>>>> many
> >>>>>>>> users, as they must be aware of that catalog to reference
> temporary
> >>>>>>> objects.
> >>>>>>>> Lastly, different APIs have different ways of providing object
> >> paths:
> >>>>>>>> String path…,
> >>>>>>>> String path, String pathContinued…
> >>>>>>>> String name
> >>>>>>>> We should choose one approach and unify it across all APIs.
> >>>>>>>> I suggest a FLIP to address the above issues.
> >>>>>>>> Looking forward to your opinions.
> >>>>>>>> FLIP link:
> >>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>
>
>

Reply via email to