Jark has a good point. However, I think validation logic can put in place to restrict one instance per type. Maybe the doc needs to be specific on this.
Thanks, Xuefu On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <imj...@gmail.com> wrote: > Thanks Bowen for the updating. > > I have some different opinions on the change. > IIUC, in the previous design, the "name" is also the "id" or "type" to > identify which module to load. Which means we can only load one instance of > a module. > In the new design, the "name" is just an alias to the module instance, the > "kind" is used to identify modules. Which means we can load different > instances of a module. > However, what's the "name" or alias used for? Do we need to support loading > different instances of a module? From my point of view, it brings more > complexity and confusion. > For example, if we load a "hive121" which uses HiveModule with version > 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4, then > how to solve the class conflict problem? > > IMO, a module can only be load once in a session, so "name" maybe useless. > So my proposal is similar to the previous one, but only change "name" to > "kind". > > SQL: > LOAD MODULE "kind" [WITH (properties)]; > UNLOAD MODULE "kind"; > Table: > tEnv.loadModule("kind" [, properties]); > tEnv.unloadModule("kind"); > > What do you think? > > > Best, > Jark > > > > > > On Wed, 9 Oct 2019 at 20:38, Bowen Li <bowenl...@gmail.com> wrote: > > > Thanks everyone for your review. > > > > After discussing with Timo and Dawid offline, as well as incorporating > > feedback from Xuefu and Jark on mailing list, I decided to make a few > > critical changes to the proposal. > > > > - renamed the keyword "type" to "kind". The community has plan to have > > "type" keyword in yaml/descriptor refer to data types exclusively in the > > near future. We should cater to that change in our design > > - allowed specifying names for modules to simplify and unify module > > loading/unloading syntax between programming and SQL. Here're the > proposed > > changes: > > SQL: > > LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)]) > > UNLOAD MODULE "name"; > > Table: > > tEnv.loadModule("name", new Xxx(properties)); > > tEnv.unloadModule("name"); > > > > I have completely updated the google doc [1]. Please take another look, > and > > let me know if you have any other questions. Thanks! > > > > [1] > > > > > https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit# > > > > > > On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <imj...@gmail.com> wrote: > > > > > Hi Bowen, > > > > > > Thanks for the proposal. I have two thoughts: > > > > > > 1) Regarding to "loadModule", how about > > > tableEnv.loadModule("xxx" [, propertiesMap]); > > > tableEnv.unloadModule(“xxx”); > > > > > > This makes the API similar to SQL. IMO, instance of Module is not > needed > > > and verbose as parameter. > > > And this makes it easier to load a simple module without any additional > > > properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO") > > > > > > 2) In current design, the module interface only defines function > > metadata, > > > but no implementations. > > > I'm wondering how to call/map the implementations in runtime? Am I > > missing > > > something? > > > > > > Besides, I left some minor comments in the doc. > > > > > > Best, > > > Jark > > > > > > > > > On Sat, 5 Oct 2019 at 08:42, Xuefu Z <usxu...@gmail.com> wrote: > > > > > > > I agree with Timo that the new table APIs need to be consistent. I'd > go > > > > further that an name (or id) is needed for module definition in YAML > > > file. > > > > In the current design, name is skipped and type has binary meanings. > > > > > > > > Thanks, > > > > Xuefu > > > > > > > > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <twal...@apache.org> > > wrote: > > > > > > > > > Hi everyone, > > > > > > > > > > first, I was also questioning my proposal. But Bowen's proposal of > > > > > `tEnv.offloadToYaml(<file_path>)` would not work with the current > > > design > > > > > because we don't know how to serialize a catalog or module into > > > > > properties. Currently, there is no converter from instance to > > > > > properties. It is a one way conversion. We can add a `toProperties` > > > > > method to both Catalog and Module class in the future to solve > this. > > > > > Solving the table environment serializability can be future work. > > > > > > > > > > However, I find the current proposal for the TableEnvironment > methods > > > is > > > > > contradicting: > > > > > > > > > > tableEnv.loadModule(new Yyy()); > > > > > tableEnv.unloadModule(“Xxx”); > > > > > > > > > > The loading is specified programmatically whereas the unloading > > > requires > > > > > a string that is not specified in the module itself. But is defined > > in > > > > > the factory according to the current design. > > > > > > > > > > SQL does it more consistently. There, the name `xxx` is used when > > > > > loading and unloading the module: > > > > > > > > > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)] > > > > > UNLOAD MODULE 'xxx’ > > > > > > > > > > How about: > > > > > > > > > > tableEnv.loadModule("xxx", new Yyy()); > > > > > tableEnv.unloadModule(“xxx”); > > > > > > > > > > This would be similar to the catalog interfaces. The name is not > part > > > of > > > > > the instance itself. > > > > > > > > > > What do you think? > > > > > > > > > > Thanks, > > > > > Timo > > > > > > > > > > > > > > > > > > > > > > > > > On 01.10.19 21:17, Bowen Li wrote: > > > > > > If something like the yaml file is the way to go and achieve such > > > > > > motivation, we would cover that with current design. > > > > > > > > > > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <bowenl...@gmail.com> > wrote: > > > > > > > > > > > >> Hi Timo, Dawid, > > > > > >> > > > > > >> I've added the suggested SQL and related changes to > > TableEnvironment > > > > API > > > > > >> and other classes to the google doc. Also removed "USE MODULE" > and > > > its > > > > > >> APIs. Will update FLIP wiki once we have a consensus. > > > > > >> > > > > > >> W.r.t. descriptor approach, my gut feeling is similar to > Dawid's. > > > > > Besides, > > > > > >> I feel yaml file would be a better solution to persist > > serializable > > > > > state > > > > > >> of an environment as the file itself is in serializable format > > > > already. > > > > > >> Though yaml file only serves SQL CLI at this moment, we may be > > able > > > to > > > > > >> extend its reach to Table API and allow users to load/offload a > > > > > >> TableEnvironment from/to yaml files, as something like > > > > "TableEnvironment > > > > > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and > > > > > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, > > and > > > > try > > > > > to > > > > > >> make yaml file more expressive. > > > > > >> > > > > > >> > > > > > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz < > > > > dwysakow...@apache.org > > > > > > > > > > > >> wrote: > > > > > >> > > > > > >>> Hi Timo, Bowen, > > > > > >>> > > > > > >>> Unfortunately I did not have enough time to go through all the > > > > > >>> suggestions in details so I can not comment on the whole FLIP. > > > > > >>> > > > > > >>> I just wanted to give my opinion on the "descriptor approach in > > > > > >>> loadModule" part. I am not sure if we need it here. We might be > > > > > >>> overthinking this a bit. It definitely makes sense for objects > > like > > > > > >>> TableSource/TableSink etc. as they are logical definitions that > > > > nearly > > > > > >>> always have to be persisted in a Catalog. I'm not sure if we > > really > > > > > need > > > > > >>> the same for a whole session. If we need a resume session > > feature, > > > > the > > > > > >>> way to go would probably be to keep the session in memory on > the > > > > server > > > > > >>> side. I fear we will never be able to serialize the whole > session > > > > > >>> entirely (temporary objects, objects derived from DataStream > > etc.) > > > > > >>> > > > > > >>> I think it is ok to use instances for objects like Catalogs or > > > > Modules > > > > > >>> and have an overlay on top of that that can create instances > from > > > > > >>> properties. > > > > > >>> > > > > > >>> Best, > > > > > >>> > > > > > >>> Dawid > > > > > >>> > > > > > >>> On 01/10/2019 11:28, Timo Walther wrote: > > > > > >>>> Hi Bowen, > > > > > >>>> > > > > > >>>> thanks for your response. > > > > > >>>> > > > > > >>>> Re 2) I also don't have a better approach for this issue. It > is > > > > > >>>> similar to changing the general TableConfig between two > > > statements. > > > > It > > > > > >>>> would be good to add your explanation to the design document. > > > > > >>>> > > > > > >>>> Re 3) It would be interesting to know about which "core" > > functions > > > > we > > > > > >>>> are actually talking about. Also for the overriding built-in > > > > functions > > > > > >>>> that we discussed in the other FLIP. But I'm fine with leaving > > it > > > to > > > > > >>>> the user for now. How about we just introduce loadModule(), > > > > > >>>> unloadModule() methods instead of useModules()? This would > > ensure > > > > that > > > > > >>>> users don't forget to add the core module when adding an > > > additional > > > > > >>>> module and they need to explicitly call > "unloadModule('core')". > > > > > >>>> > > > > > >>>> Re 4) Every table environment feature should also be designed > > with > > > > SQL > > > > > >>>> statements in mind to verify the concept. SQL is also more > > popular > > > > > >>>> that Java/Scala API or YAML file. I would like to add it to > 1.10 > > > for > > > > > >>>> marking the feature as complete. > > > > > >>>> > > > > > >>>> SHOW MODULES -> sounds good to me, we should add a > > listModules(): > > > > > >>>> List<String> method to table environment > > > > > >>>> > > > > > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should > > > add a > > > > > >>>> loadModule() method to table environment > > > > > >>>> > > > > > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method > > to > > > > > >>>> table environment > > > > > >>>> > > > > > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity > > and > > > > > >>>> concise API. Users need to load the module anyway with > > properties. > > > > > >>>> They can also load them "in order" immediately. CREATE TABLE > can > > > > also > > > > > >>>> not create multiple tables but only one at a time in that > order. > > > > > >>>> > > > > > >>>> One thing that came to my mind, shall we use a descriptor > > approach > > > > for > > > > > >>>> loadModule()? The past has shown that passing instances causes > > > > > >>>> problems when persisting objects. That's why we also want to > get > > > rid > > > > > >>>> of registerTableSource. I could image that users might want to > > > > persist > > > > > >>>> a table environment's state for later use in the future. Even > > > though > > > > > >>>> this is future work, we should already keep such use cases in > > mind > > > > > >>>> when adding new API methods. What do you think? > > > > > >>>> > > > > > >>>> Regards, > > > > > >>>> Timo > > > > > >>>> > > > > > >>>> > > > > > >>>> On 30.09.19 23:17, Bowen Li wrote: > > > > > >>>>> Hi Timo, > > > > > >>>>> > > > > > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table > System > > > > with > > > > > >>>>> Pluggable Modules" and all internal references > > > > > >>>>> > > > > > >>>>> Re 2) First, I'll rename the API to useModules(). The design > > > > doesn't > > > > > >>>>> forbid > > > > > >>>>> users to call useModules() multi times. Objects in modules > are > > > > loaded > > > > > >>> on > > > > > >>>>> demand instead of eagerly, so there won't be inconsistency. > > Users > > > > > >>>>> have to > > > > > >>>>> be fully aware of the consequences of resetting modules as > that > > > > might > > > > > >>>>> cause > > > > > >>>>> that some objects can not be referenced anymore or resolution > > > order > > > > > >>>>> of some > > > > > >>>>> objects changes. > > > > > >>>>> > > > > > >>>>> Re 3) Yes, we'd leave that to users. > > > > > >>>>> > > > > > >>>>> Another approach can be to have a non-optional "Core" module > > for > > > > all > > > > > >>>>> objects that cannot be overrode like "CAST" and "AS" > functions, > > > and > > > > > >>>>> have an > > > > > >>>>> optional "ExtendedCore" module for other replaceable built-in > > > > > objects. > > > > > >>>>> "Core" should be positioned at the 1st in module list all the > > > time. > > > > > >>>>> > > > > > >>>>> I'm fine with either solution. > > > > > >>>>> > > > > > >>>>> Re 4) It may sound like a nice-to-have advanced feature for > > 1.10, > > > > but > > > > > >>> we > > > > > >>>>> can surely fully discuss it for the sake of feature > > completeness. > > > > > >>>>> > > > > > >>>>> Unlike other configs, the order of modules would matter in > > Flink, > > > > and > > > > > >>> it > > > > > >>>>> implies the LOAD/UNLOAD commands would not be equal in > > operation > > > > > >>>>> positions. > > > > > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to > > the > > > > end > > > > > >>> of > > > > > >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as > > > > removing x > > > > > >>>>> from > > > > > >>>>> any position in the list? > > > > > >>>>> > > > > > >>>>> I'm thinking of the following list of commands: > > > > > >>>>> > > > > > >>>>> SHOW MODULES - list modules in order > > > > > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and > > > append > > > > > the > > > > > >>>>> module to end of the module list > > > > > >>>>> UNLOAD MODULE 'hive' - remove the module from module list, > and > > > > other > > > > > >>>>> modules remain the same relative positions > > > > > >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' > > > 'z'"?), > > > > > >>>>> or USE > > > > > >>>>> MODULES 'x,y,z' - to reorder module list completely > > > > > >>>>> > > > > > >>> > > > > > > > > > > > > > > > > > > -- > > > > Xuefu Zhang > > > > > > > > "In Honey We Trust!" > > > > > > > > > > -- Xuefu Zhang "In Honey We Trust!"