Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Jark Wu Mon, 07 Oct 2019 21:26:10 -0700

Hi Bowen,

Thanks for the proposal. I have two thoughts:


1) Regarding to "loadModule", how about
tableEnv.loadModule("xxx" [, propertiesMap]);
tableEnv.unloadModule(“xxx”);

This makes the API similar to SQL. IMO, instance of Module is not needed
and verbose as parameter.
And this makes it easier to load a simple module without any additional
properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")

2) In current design, the module interface only defines function metadata,
but no implementations.
I'm wondering how to call/map the implementations in runtime? Am I missing
something?

Besides, I left some minor comments in the doc.

Best,
Jark


On Sat, 5 Oct 2019 at 08:42, Xuefu Z <usxu...@gmail.com> wrote:

> I agree with Timo that the new table APIs need to be consistent. I'd go
> further that an name (or id) is needed for module definition in YAML file.
> In the current design, name is skipped and type has binary meanings.
>
> Thanks,
> Xuefu
>
> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <twal...@apache.org> wrote:
>
> > Hi everyone,
> >
> > first, I was also questioning my proposal. But Bowen's proposal of
> > `tEnv.offloadToYaml(<file_path>)` would not work with the current design
> > because we don't know how to serialize a catalog or module into
> > properties. Currently, there is no converter from instance to
> > properties. It is a one way conversion. We can add a `toProperties`
> > method to both Catalog and Module class in the future to solve this.
> > Solving the table environment serializability can be future work.
> >
> > However, I find the current proposal for the TableEnvironment methods is
> > contradicting:
> >
> > tableEnv.loadModule(new Yyy());
> > tableEnv.unloadModule(“Xxx”);
> >
> > The loading is specified programmatically whereas the unloading requires
> > a string that is not specified in the module itself. But is defined in
> > the factory according to the current design.
> >
> > SQL does it more consistently. There, the name `xxx` is used when
> > loading and unloading the module:
> >
> > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > UNLOAD MODULE 'xxx’
> >
> > How about:
> >
> > tableEnv.loadModule("xxx", new Yyy());
> > tableEnv.unloadModule(“xxx”);
> >
> > This would be similar to the catalog interfaces. The name is not part of
> > the instance itself.
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> >
> >
> >
> > On 01.10.19 21:17, Bowen Li wrote:
> > > If something like the yaml file is the way to go and achieve such
> > > motivation, we would cover that with current design.
> > >
> > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <bowenl...@gmail.com> wrote:
> > >
> > >> Hi Timo, Dawid,
> > >>
> > >> I've added the suggested SQL and related changes to TableEnvironment
> API
> > >> and other classes to the google doc. Also removed "USE MODULE" and its
> > >> APIs. Will update FLIP wiki once we have a consensus.
> > >>
> > >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> > Besides,
> > >> I feel yaml file would be a better solution to persist serializable
> > state
> > >> of an environment as the file itself is in serializable format
> already.
> > >> Though yaml file only serves SQL CLI at this moment, we may be able to
> > >> extend its reach to Table API and allow users to load/offload a
> > >> TableEnvironment from/to yaml files, as something like
> "TableEnvironment
> > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and
> try
> > to
> > >> make yaml file more expressive.
> > >>
> > >>
> > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> dwysakow...@apache.org
> > >
> > >> wrote:
> > >>
> > >>> Hi Timo, Bowen,
> > >>>
> > >>> Unfortunately I did not have enough time to go through all the
> > >>> suggestions in details so I can not comment on the whole FLIP.
> > >>>
> > >>> I just wanted to give my opinion on the "descriptor approach in
> > >>> loadModule" part. I am not sure if we need it here. We might be
> > >>> overthinking this a bit. It definitely makes sense for objects like
> > >>> TableSource/TableSink etc. as they are logical definitions that
> nearly
> > >>> always have to be persisted in a Catalog. I'm not sure if we really
> > need
> > >>> the same for a whole session. If we need a resume session feature,
> the
> > >>> way to go would probably be to keep the session in memory on the
> server
> > >>> side. I fear we will never be able to serialize the whole session
> > >>> entirely (temporary objects, objects derived from DataStream etc.)
> > >>>
> > >>> I think it is ok to use instances for objects like Catalogs or
> Modules
> > >>> and have an overlay on top of that that can create instances from
> > >>> properties.
> > >>>
> > >>> Best,
> > >>>
> > >>> Dawid
> > >>>
> > >>> On 01/10/2019 11:28, Timo Walther wrote:
> > >>>> Hi Bowen,
> > >>>>
> > >>>> thanks for your response.
> > >>>>
> > >>>> Re 2) I also don't have a better approach for this issue. It is
> > >>>> similar to changing the general TableConfig between two statements.
> It
> > >>>> would be good to add your explanation to the design document.
> > >>>>
> > >>>> Re 3) It would be interesting to know about which "core" functions
> we
> > >>>> are actually talking about. Also for the overriding built-in
> functions
> > >>>> that we discussed in the other FLIP. But I'm fine with leaving it to
> > >>>> the user for now. How about we just introduce loadModule(),
> > >>>> unloadModule() methods instead of useModules()? This would ensure
> that
> > >>>> users don't forget to add the core module when adding an additional
> > >>>> module and they need to explicitly call "unloadModule('core')".
> > >>>>
> > >>>> Re 4) Every table environment feature should also be designed with
> SQL
> > >>>> statements in mind to verify the concept. SQL is also more popular
> > >>>> that Java/Scala API or YAML file. I would like to add it to 1.10 for
> > >>>> marking the feature as complete.
> > >>>>
> > >>>> SHOW MODULES -> sounds good to me, we should add a listModules():
> > >>>> List<String> method to table environment
> > >>>>
> > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> > >>>> loadModule() method to table environment
> > >>>>
> > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> > >>>> table environment
> > >>>>
> > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> > >>>> concise API. Users need to load the module anyway with properties.
> > >>>> They can also load them "in order" immediately. CREATE TABLE can
> also
> > >>>> not create multiple tables but only one at a time in that order.
> > >>>>
> > >>>> One thing that came to my mind, shall we use a descriptor approach
> for
> > >>>> loadModule()? The past has shown that passing instances causes
> > >>>> problems when persisting objects. That's why we also want to get rid
> > >>>> of registerTableSource. I could image that users might want to
> persist
> > >>>> a table environment's state for later use in the future. Even though
> > >>>> this is future work, we should already keep such use cases in mind
> > >>>> when adding new API methods. What do you think?
> > >>>>
> > >>>> Regards,
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>> On 30.09.19 23:17, Bowen Li wrote:
> > >>>>> Hi Timo,
> > >>>>>
> > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table System
> with
> > >>>>> Pluggable Modules" and all internal references
> > >>>>>
> > >>>>> Re 2) First, I'll rename the API to useModules(). The design
> doesn't
> > >>>>> forbid
> > >>>>> users to call useModules() multi times. Objects in modules are
> loaded
> > >>> on
> > >>>>> demand instead of eagerly, so there won't be inconsistency. Users
> > >>>>> have to
> > >>>>> be fully aware of the consequences of resetting modules as that
> might
> > >>>>> cause
> > >>>>> that some objects can not be referenced anymore or resolution order
> > >>>>> of some
> > >>>>> objects changes.
> > >>>>>
> > >>>>> Re 3) Yes, we'd leave that to users.
> > >>>>>
> > >>>>> Another approach can be to have a non-optional "Core" module for
> all
> > >>>>> objects that cannot be overrode like "CAST" and "AS" functions, and
> > >>>>> have an
> > >>>>> optional "ExtendedCore" module for other replaceable built-in
> > objects.
> > >>>>> "Core" should be positioned at the 1st in module list all the time.
> > >>>>>
> > >>>>> I'm fine with either solution.
> > >>>>>
> > >>>>> Re 4) It may sound like a nice-to-have advanced feature for 1.10,
> but
> > >>> we
> > >>>>> can surely fully discuss it for the sake of feature completeness.
> > >>>>>
> > >>>>> Unlike other configs, the order of modules would matter in Flink,
> and
> > >>> it
> > >>>>> implies the LOAD/UNLOAD commands would not be equal in operation
> > >>>>> positions.
> > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the
> end
> > >>> of
> > >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> removing x
> > >>>>> from
> > >>>>> any position in the list?
> > >>>>>
> > >>>>> I'm thinking of the following list of commands:
> > >>>>>
> > >>>>> SHOW MODULES - list modules in order
> > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append
> > the
> > >>>>> module to end of the module list
> > >>>>> UNLOAD MODULE 'hive' - remove the module from module list, and
> other
> > >>>>> modules remain the same relative positions
> > >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
> > >>>>> or USE
> > >>>>> MODULES 'x,y,z' - to reorder module list completely
> > >>>>>
> > >>>
> >
> >
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Reply via email to