Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Xuefu Z Wed, 09 Oct 2019 20:21:50 -0700

Jark has a good point. However, I think validation logic can be put in place
to restrict modules to one instance per type. Maybe the doc needs to be
specific on this.
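
As an illustration only, here is a minimal sketch of what such a check could look like. It assumes a hypothetical ModuleRegistry class that tracks loaded modules by name and rejects a second module of the same kind; none of these class or method names come from Flink or from the design doc.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry for illustration; not part of Flink or the FLIP doc.
class ModuleRegistry {
    // Maps module name -> module kind, preserving load order.
    private final Map<String, String> kindByName = new LinkedHashMap<>();

    // Loads a module under the given name, allowing one instance per kind.
    void load(String name, String kind) {
        if (kindByName.containsKey(name)) {
            throw new IllegalStateException(
                "A module named '" + name + "' is already loaded");
        }
        if (kindByName.containsValue(kind)) {
            throw new IllegalStateException(
                "A module of kind '" + kind + "' is already loaded");
        }
        kindByName.put(name, kind);
    }

    // Unloads the module with the given name; fails if it is not loaded.
    void unload(String name) {
        if (kindByName.remove(name) == null) {
            throw new IllegalStateException(
                "No module named '" + name + "' is loaded");
        }
    }

    // Returns the names of loaded modules in load order.
    List<String> list() {
        return new ArrayList<>(kindByName.keySet());
    }
}
```

With a check like this, loading "hive121" and then "hive234" with the same kind would fail on the second call, which sidesteps the class conflict Jark describes while still keeping the name/kind split.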


Thanks,
Xuefu

On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <imj...@gmail.com> wrote:

> Thanks Bowen for the update.
>
> I have some different opinions on the change.
> IIUC, in the previous design, the "name" is also the "id" or "type" to
> identify which module to load. Which means we can only load one instance of
> a module.
> In the new design, the "name" is just an alias to the module instance, the
> "kind" is used to identify modules. Which means we can load different
> instances of a module.
> However, what's the "name" or alias used for? Do we need to support loading
> different instances of a module? From my point of view, it brings more
> complexity and confusion.
> For example, if we load a "hive121" which uses HiveModule with version
> 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4, then
> how do we solve the class conflict problem?
>
> IMO, a module can only be loaded once in a session, so "name" may be useless.
> So my proposal is similar to the previous one, but only changes "name" to
> "kind".
>
>    SQL:
>          LOAD MODULE "kind" [WITH (properties)];
>          UNLOAD MODULE "kind";
>     Table:
>          tEnv.loadModule("kind" [, properties]);
>          tEnv.unloadModule("kind");
>
> What do you think?
>
>
> Best,
> Jark
>
>
>
>
>
> On Wed, 9 Oct 2019 at 20:38, Bowen Li <bowenl...@gmail.com> wrote:
>
> > Thanks everyone for your review.
> >
> > After discussing with Timo and Dawid offline, as well as incorporating
> > feedback from Xuefu and Jark on mailing list, I decided to make a few
> > critical changes to the proposal.
> >
> > - renamed the keyword "type" to "kind". The community plans to have the
> > "type" keyword in yaml/descriptor refer to data types exclusively in the
> > near future. We should cater to that change in our design.
> > - allowed specifying names for modules to simplify and unify module
> > loading/unloading syntax between programming and SQL. Here are the
> > proposed changes:
> >     SQL:
> >          LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
> >          UNLOAD MODULE "name";
> >     Table:
> >          tEnv.loadModule("name", new Xxx(properties));
> >          tEnv.unloadModule("name");
> >
> > I have completely updated the google doc [1]. Please take another look, and
> > let me know if you have any other questions. Thanks!
> >
> > [1]
> > https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
> >
> >
> > On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <imj...@gmail.com> wrote:
> >
> > > Hi Bowen,
> > >
> > > Thanks for the proposal. I have two thoughts:
> > >
> > > 1) Regarding "loadModule", how about
> > > tableEnv.loadModule("xxx" [, propertiesMap]);
> > > tableEnv.unloadModule("xxx");
> > >
> > > This makes the API similar to SQL. IMO, an instance of Module is not
> > > needed and is verbose as a parameter.
> > > And this makes it easier to load a simple module without any additional
> > > properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> > >
> > > 2) In the current design, the module interface only defines function
> > > metadata, but no implementations.
> > > I'm wondering how to call/map the implementations at runtime? Am I
> > > missing something?
> > >
> > > Besides, I left some minor comments in the doc.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Sat, 5 Oct 2019 at 08:42, Xuefu Z <usxu...@gmail.com> wrote:
> > >
> > > > I agree with Timo that the new table APIs need to be consistent. I'd go
> > > > further: a name (or id) is needed for the module definition in the YAML
> > > > file. In the current design, name is skipped and type has two meanings.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <twal...@apache.org> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > first, I was also questioning my proposal. But Bowen's proposal of
> > > > > `tEnv.offloadToYaml(<file_path>)` would not work with the current
> > > > > design because we don't know how to serialize a catalog or module into
> > > > > properties. Currently, there is no converter from instance to
> > > > > properties. It is a one-way conversion. We can add a `toProperties`
> > > > > method to both the Catalog and Module classes in the future to solve
> > > > > this. Solving the table environment serializability can be future work.
> > > > >
> > > > > However, I find the current proposal for the TableEnvironment methods
> > > > > contradictory:
> > > > >
> > > > > tableEnv.loadModule(new Yyy());
> > > > > tableEnv.unloadModule("Xxx");
> > > > >
> > > > > The loading is specified programmatically, whereas the unloading
> > > > > requires a string that is not specified in the module itself but is
> > > > > defined in the factory according to the current design.
> > > > >
> > > > > SQL does it more consistently. There, the name `xxx` is used when
> > > > > loading and unloading the module:
> > > > >
> > > > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > > > UNLOAD MODULE 'xxx'
> > > > >
> > > > > How about:
> > > > >
> > > > > tableEnv.loadModule("xxx", new Yyy());
> > > > > tableEnv.unloadModule("xxx");
> > > > >
> > > > > This would be similar to the catalog interfaces. The name is not part
> > > > > of the instance itself.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Timo
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 01.10.19 21:17, Bowen Li wrote:
> > > > > > If something like the yaml file is the way to go and achieve such
> > > > > > motivation, we would cover that with current design.
> > > > > >
> > > > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <bowenl...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Timo, Dawid,
> > > > > >>
> > > > > >> I've added the suggested SQL and related changes to TableEnvironment
> > > > > >> API and other classes to the google doc. Also removed "USE MODULE" and
> > > > > >> its APIs. Will update FLIP wiki once we have a consensus.
> > > > > >>
> > > > > >> W.r.t. the descriptor approach, my gut feeling is similar to Dawid's.
> > > > > >> Besides, I feel a yaml file would be a better solution to persist the
> > > > > >> serializable state of an environment, as the file itself is already in
> > > > > >> a serializable format. Though the yaml file only serves the SQL CLI at
> > > > > >> this moment, we may be able to extend its reach to the Table API and
> > > > > >> allow users to load/offload a TableEnvironment from/to yaml files, with
> > > > > >> something like "TableEnvironment
> > > > > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> > > > > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and try
> > > > > >> to make the yaml file more expressive.
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <dwysakow...@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi Timo, Bowen,
> > > > > >>>
> > > > > >>> Unfortunately I did not have enough time to go through all the
> > > > > >>> suggestions in detail, so I cannot comment on the whole FLIP.
> > > > > >>>
> > > > > >>> I just wanted to give my opinion on the "descriptor approach in
> > > > > >>> loadModule" part. I am not sure if we need it here. We might be
> > > > > >>> overthinking this a bit. It definitely makes sense for objects like
> > > > > >>> TableSource/TableSink etc. as they are logical definitions that nearly
> > > > > >>> always have to be persisted in a Catalog. I'm not sure if we really
> > > > > >>> need the same for a whole session. If we need a resume session feature,
> > > > > >>> the way to go would probably be to keep the session in memory on the
> > > > > >>> server side. I fear we will never be able to serialize the whole
> > > > > >>> session entirely (temporary objects, objects derived from DataStream
> > > > > >>> etc.)
> > > > > >>>
> > > > > >>> I think it is ok to use instances for objects like Catalogs or Modules
> > > > > >>> and have an overlay on top of that that can create instances from
> > > > > >>> properties.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>>
> > > > > >>> Dawid
> > > > > >>>
> > > > > >>> On 01/10/2019 11:28, Timo Walther wrote:
> > > > > >>>> Hi Bowen,
> > > > > >>>>
> > > > > >>>> thanks for your response.
> > > > > >>>>
> > > > > >>>> Re 2) I also don't have a better approach for this issue. It is
> > > > > >>>> similar to changing the general TableConfig between two statements.
> > > > > >>>> It would be good to add your explanation to the design document.
> > > > > >>>>
> > > > > >>>> Re 3) It would be interesting to know which "core" functions we
> > > > > >>>> are actually talking about. Also for the overriding of built-in
> > > > > >>>> functions that we discussed in the other FLIP. But I'm fine with
> > > > > >>>> leaving it to the user for now. How about we just introduce
> > > > > >>>> loadModule(), unloadModule() methods instead of useModules()? This
> > > > > >>>> would ensure that users don't forget to add the core module when
> > > > > >>>> adding an additional module and they need to explicitly call
> > > > > >>>> "unloadModule('core')".
> > > > > >>>>
> > > > > >>>> Re 4) Every table environment feature should also be designed with
> > > > > >>>> SQL statements in mind to verify the concept. SQL is also more popular
> > > > > >>>> than the Java/Scala API or YAML file. I would like to add it to 1.10
> > > > > >>>> for marking the feature as complete.
> > > > > >>>>
> > > > > >>>> SHOW MODULES --> sounds good to me, we should add a listModules():
> > > > > >>>> List<String> method to table environment
> > > > > >>>>
> > > > > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> > > > > >>>> loadModule() method to table environment
> > > > > >>>>
> > > > > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> > > > > >>>> table environment
> > > > > >>>>
> > > > > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'`, for the sake of
> > > > > >>>> simplicity and a concise API. Users need to load the module anyway
> > > > > >>>> with properties. They can also load them "in order" immediately.
> > > > > >>>> CREATE TABLE also cannot create multiple tables but only one at a time
> > > > > >>>> in that order.
> > > > > >>>>
> > > > > >>>> One thing that came to my mind, shall we use a descriptor approach
> > > > > >>>> for loadModule()? The past has shown that passing instances causes
> > > > > >>>> problems when persisting objects. That's why we also want to get rid
> > > > > >>>> of registerTableSource. I could imagine that users might want to
> > > > > >>>> persist a table environment's state for later use in the future. Even
> > > > > >>>> though this is future work, we should already keep such use cases in
> > > > > >>>> mind when adding new API methods. What do you think?
> > > > > >>>>
> > > > > >>>> Regards,
> > > > > >>>> Timo
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On 30.09.19 23:17, Bowen Li wrote:
> > > > > >>>>> Hi Timo,
> > > > > >>>>>
> > > > > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table System with
> > > > > >>>>> Pluggable Modules" and all internal references
> > > > > >>>>>
> > > > > >>>>> Re 2) First, I'll rename the API to useModules(). The design doesn't
> > > > > >>>>> forbid users from calling useModules() multiple times. Objects in
> > > > > >>>>> modules are loaded on demand instead of eagerly, so there won't be
> > > > > >>>>> inconsistency. Users have to be fully aware of the consequences of
> > > > > >>>>> resetting modules, as that might mean some objects can no longer be
> > > > > >>>>> referenced or the resolution order of some objects changes.
> > > > > >>>>>
> > > > > >>>>> Re 3) Yes, we'd leave that to users.
> > > > > >>>>>
> > > > > >>>>> Another approach can be to have a non-optional "Core" module for all
> > > > > >>>>> objects that cannot be overridden, like the "CAST" and "AS" functions,
> > > > > >>>>> and have an optional "ExtendedCore" module for other replaceable
> > > > > >>>>> built-in objects. "Core" should be positioned 1st in the module list
> > > > > >>>>> all the time.
> > > > > >>>>>
> > > > > >>>>> I'm fine with either solution.
> > > > > >>>>>
> > > > > >>>>> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but
> > > > > >>>>> we can surely fully discuss it for the sake of feature completeness.
> > > > > >>>>>
> > > > > >>>>> Unlike other configs, the order of modules would matter in Flink, and
> > > > > >>>>> it implies the LOAD/UNLOAD commands would not be equal in operation
> > > > > >>>>> positions.
> > > > > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end
> > > > > >>>>> of the module list, and UNLOAD MODULE 'x' would be interpreted as
> > > > > >>>>> removing x from any position in the list?
> > > > > >>>>>
> > > > > >>>>> I'm thinking of the following list of commands:
> > > > > >>>>>
> > > > > >>>>> SHOW MODULES - list modules in order
> > > > > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append the
> > > > > >>>>> module to the end of the module list
> > > > > >>>>> UNLOAD MODULE 'hive' - remove the module from the module list; other
> > > > > >>>>> modules keep the same relative positions
> > > > > >>>>> USE MODULES 'x' 'y' 'z' (wondering whether the parser can take
> > > > > >>>>> "'x' 'y' 'z'"?), or USE MODULES 'x,y,z' - to reorder the module list
> > > > > >>>>> completely
> > > > > >>>>>
> > > > > >>>
> > > > >
> > > > >
> > > >
> > > > --
> > > > Xuefu Zhang
> > > >
> > > > "In Honey We Trust!"
> > > >
> > >
> >
>


-- 
Xuefu Zhang

"In Honey We Trust!"
