Thanks for the nice summary Jane. The summary looks great. Some minor
feedback:
- Remove the `used` column for SHOW MODULES. It will always show true.
- `List<Pair<String, Boolean>> listFullModules()` is a very long
signature. And `Pair` should be avoided in code because it is not very
descriptive. How about creating a POJO (static inner class of
ModuleManager) called `ModuleEntry` or similar.
Otherwise +1 for the proposal.
Regards,
Timo
On 03.02.21 11:24, Jane Chan wrote:
Hi everyone,
I did a summary on the Jira issue page [1] since the discussion has
achieved a consensus. If there is anything missed or not corrected, please
let me know.
[1] https://issues.apache.org/jira/browse/FLINK-21045#
Best,
Jane
On Wed, Feb 3, 2021 at 1:33 PM Jark Wu <imj...@gmail.com> wrote:
Hi Jane,
Yes. I think we should fail fast.
Best,
Jark
On Wed, 3 Feb 2021 at 12:06, Jane Chan <qingyue....@gmail.com> wrote:
Hi everyone,
Thanks for the discussion to make this improvement plan clearer.
Hi, @Jark, @Rui, and @Timo, I'm collecting the final discussion summaries
now and want to confirm one thing that for the statement `USE MODULES x
[,
y, z, ...]`, if the module name list contains an unexsited module, shall
we
#1 fail the execution for all of them or #2 enabled the rest modules and
return a warning to users? My personal preference goes to #1 for
simplicity. What do you think?
Best,
Jane
On Tue, Feb 2, 2021 at 3:53 PM Timo Walther <twal...@apache.org> wrote:
+1
@Jane Can you summarize our discussion in the JIRA issue?
Thanks,
Timo
On 02.02.21 03:50, Jark Wu wrote:
Hi Timo,
Another question is whether a LOAD operation also adds the module to
the
enabled list by default?
I would like to add the module to the enabled list by default, the
main
reasons are:
1) Reordering is an advanced requirement, adding modules needs
additional
USE statements with "core" module
sounds too burdensome. Most users should be satisfied with only
LOAD
statements.
2) We should keep compatible for TableEnvironment#loadModule().
3) We are using the LOAD statement instead of CREATE, so I think it's
fine
that it does some implicit things.
Best,
Jark
On Tue, 2 Feb 2021 at 00:48, Timo Walther <twal...@apache.org>
wrote:
Not the module itself but the ModuleManager should handle this case,
yes.
Regards,
Timo
On 01.02.21 17:35, Jane Chan wrote:
+1 to Jark's proposal
To make it clearer, will `module#getFunctionDefinition()`
return
empty
suppose the module is loaded but not enabled?
Best,
Jane
On Mon, Feb 1, 2021 at 10:02 PM Timo Walther <twal...@apache.org>
wrote:
+1 to Jark's proposal
I like the difference between just loading and actually enabling
these
modules.
@Rui: I would use the same behavior as catalogs here. You cannot
`USE` a
catalog without creating it before.
Another question is whether a LOAD operation also adds the module
to
the
enabled list by default?
Regards,
Timo
On 01.02.21 13:52, Rui Li wrote:
If `USE MODULES` implies unloading modules that are not listed,
does
it
also imply loading modules that are not previously loaded,
especially
since
we're mapping modules by name now?
On Mon, Feb 1, 2021 at 8:20 PM Jark Wu <imj...@gmail.com> wrote:
I agree with Timo that the USE implies the specified modules are
in
use
in
the specified order and others are not used.
This would be easier to know what's the result list and order
after
the
USE
statement.
That means: if current modules in order are x, y, z. And `USE
MODULES
z, y`
means current modules in order are z, y.
But I would like to not unload the unmentioned modules in the
USE
statement. Because it seems strange that USE
will implicitly remove modules. In the above example, the user
may
type
the
wrong modules list using USE by mistake
and would like to declare the list again, the user has to
create
the
module again with some properties he may don't know. Therefore,
I
propose
the USE statement just specifies the current module lists and
doesn't
unload modules.
Besides that, we may need a new syntax to list all the modules
including
not used but loaded.
We can introduce SHOW FULL MODULES for this purpose with an
additional
`used` column.
For example:
Flink SQL> list modules:
-----------
| modules |
-----------
| x |
| y |
| z |
-----------
Flink SQL> USE MODULES z, y;
Flink SQL> show modules:
-----------
| modules |
-----------
| z |
| y |
-----------
Flink SQL> show FULL modules;
-------------------
| modules | used |
-------------------
| z | true |
| y | true |
| x | false |
-------------------
Flink SQL> USE MODULES z, y, x;
Flink SQL> show modules;
-----------
| modules |
-----------
| z |
| y |
| x |
-----------
What do you think?
Best,
Jark
On Mon, 1 Feb 2021 at 19:02, Jane Chan <qingyue....@gmail.com>
wrote:
Hi Timo, thanks for the discussion.
It seems to reach an agreement regarding #3 that <1> Module
name
should
better be a simple identifier rather than a string literal. <2>
Property
`type` is redundant and should be removed, and mapping will
rely
on
the
module name because loading a module multiple times just using
a
different
module name doesn't make much sense. <3> We should migrate to
the
newer
API
rather than the deprecated `TableFactory` class.
Regarding #1, I think the point lies in whether changing the
resolution
order implies an `unload` operation explicitly (i.e., users
could
sense
it). What do others think?
Best,
Jane
On Mon, Feb 1, 2021 at 6:41 PM Timo Walther <
twal...@apache.org>
wrote:
IMHO I would rather unload the not mentioned modules. The
statement
expresses `USE` that implicilty implies that the other modules
are
"not
used". What do others think?
Regards,
Timo
On 01.02.21 11:28, Jane Chan wrote:
Hi Jark and Rui,
Thanks for the discussions.
Regarding #1, I'm fine with `USE MODULES` syntax, and
It can be interpreted as "setting the current order of
modules",
which
is
similar to "setting the current catalog" for `USE CATALOG`.
I would like to confirm that the unmentioned modules remain
in
the
same
relative order? E.g., if there are three loaded modules `X`,
`Y`,
`Z`,
then
`USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.
Regarding #3, I'm fine with mapping modules purely by name,
and I
think
Jark raised a good point on making the module name a simple
identifier
instead of a string literal. For backward compatibility,
since
we
haven't
supported this syntax yet, the affected users are those who
defined
modules
in the YAML configuration file. Maybe we can eliminate the
'type'
from
the
'requiredContext' to make it optional. Thus the proposed
mapping
mechanism
could use the module name to lookup the suitable factory,
and
in
the
meanwhile updating documentation to encourage users to
simplify
their
YAML
configuration. And in the long run, we can deprecate the
'type'.
Best,
Jane
On Mon, Feb 1, 2021 at 4:19 PM Rui Li <lirui.fu...@gmail.com
wrote:
Thanks Jane for starting the discussion.
Regarding #1, I also prefer `USE MODULES` syntax. It can be
interpreted
as
"setting the current order of modules", which is similar to
"setting
the
current catalog" for `USE CATALOG`.
Regarding #3, I'm fine to map modules purely by name
because I
think
it
satisfies all the use cases we have at hand. But I guess we
need
to
make
sure we're backward compatible, i.e. users don't need to
change
their
yaml
files to configure the modules.
On Mon, Feb 1, 2021 at 3:10 PM Jark Wu <imj...@gmail.com>
wrote:
Thanks Jane for the summary and starting the discussion in
the
mailing
list.
Here are my thoughts:
1) syntax to reorder modules
I agree with Rui Li it would be quite useful if we can have
some
syntax
to
reorder modules.
I slightly prefer `USE MODULES x, y, z` than `RELOAD
MODULES
x,
y,
z`,
because USE has a more sense of effective and specifying
ordering,
than
RELOAD.
From my feeling, RELOAD just means we unregister and
register
x,y,z
modules
again,
it sounds like other registered modules are still in use
and
in
the
order.
3) mapping modules purely by name
This can definitely improve the usability of loading
modules,
because
the 'type=' property
looks really redundant. We can think of this as a syntax
sugar
that
the
default type value is the module name.
And we can support to specify 'type=' property in the
future
to
allow
multiple modules for one module type.
Besides, I would like to mention one more change, that the
module
name
proposed in FLIP-68 is a string literal.
But I think we are all on the same page to change it into a
simple
(non-compound) identifier.
LOAD/UNLOAD MODULE 'core'
==>
LOAD/UNLOAD MODULE core
Best,
Jark
On Sat, 30 Jan 2021 at 04:00, Jane Chan <
qingyue....@gmail.com
wrote:
Hi everyone,
I would like to start a discussion on FLINK-21045 [1]
about
supporting
`LOAD MODULE` and `UNLOAD MODULE` SQL syntax. It's first
proposed
by
FLIP-68 [2] as following.
-- load a module with the given name and append it to the
end
of
the
module
list
LOAD MODULE 'name' [WITH ('type'='xxx', 'prop'='myProp',
...)]
--unload a module by name from the module list and other
modules
remain
in
the same relative positions
UNLOAD MODULE 'name'
After a round of discussion on the Jira ticket, it seems
some
unanswered
questions need more opinions and suggestions.
1. The way to redefine resolution order easily
Rui Li suggested introducing `USE MODULES` and
adding
similar
functionality to the API because
1) It's very tedious to unload old modules just to
reorder
them.
2) Users may not even know how to "re-load" an old
module
if it
was
not
initially loaded by the user, e.g. don't know which type
to
use.
Jane Chan wondered that module is not like the
catalog
which
has
a
concept of namespace could specify, and `USE` sounds like
a
mutual-exclusive concept.
Maybe `RELOAD MODULES` can express upgrading the
priority of
the
loaded
module(s).
2. `LOAD/UNLOAD MODULE` v.s. `CREATE/DROP MODULE` syntax
Jark Wu and Nicholas Jiang proposed to use
`CREATE/DROP
MODULE`
instead
of `LOAD/UNLOAD MODULE` because
1) From a pure SQL user's perspective, maybe `CREATE
MODULE +
USE
MODULE`
is easier to use rather than `LOAD/UNLOAD`.
2) This will be very similar to what the catalog
used
now.
Timo Walther would rather stick to the agreed design
because
loading/unloading modules is a concept known from kernels
etc.
3. Simplify the module design by mapping modules purely by
name
LOAD MODULE geo_utils
LOAD MODULE hive WITH ('version'='2.1') -- no dedicated
'type='/'module='
but allow only 1 module to be loaded parameterized
UNLOAD hive
USE MODULES hive, core
Please find more details in the reference link. Looking
forward
to
your
feedback.
[1] https://issues.apache.org/jira/browse/FLINK-21045#
<
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
Best,
Jane
--
Best regards!
Rui Li