Currently I think we can load from the jar and check the services file to
get the connector type, but whether that is necessary is something we can
keep discussing.
Hi, Sergey, WDYT?
Another idea is to use FactoryUtil#discoverFactories and check whether
each factory implements DynamicTableSourceFactory or
DynamicTableSinkFactory.
With versions it could be trickier...
Moreover, it seems the version can sometimes be part of the name [1].
I think name and type could be enough, but please correct me if I'm wrong.
or can we open a single ticket under this FLIP?
I have a relatively old Jira issue [2] for showing connectors, with a PoC PR.
Could I propose moving this Jira issue to a subtask under the FLIP one and
reviving it?
[1] https://github.com/apache/flink/blob/161014149e803bfd1d3653badb230b2ed36ce3cb/flink-table/flink-table-common/src/main/java/org/apache/flink/table/factories/Factory.java#L65-L69
[2] https://issues.apache.org/jira/browse/FLINK-25788
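A minimal sketch of the "check which interface the factory implements" idea, for illustration only. The three interfaces below are hypothetical stand-ins named after the Flink ones so that the snippet compiles without Flink on the classpath; the list passed to showConnectors is assumed to be whatever a discovery call such as FactoryUtil#discoverFactories would yield. The row format and the example factories are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the Flink interfaces of the same names.
interface Factory {
    String factoryIdentifier();
}

interface DynamicTableSourceFactory extends Factory {}

interface DynamicTableSinkFactory extends Factory {}

final class ConnectorListing {

    /** Returns "source", "sink", "source/sink", or "other" for one factory. */
    static String connectorType(Factory factory) {
        boolean source = factory instanceof DynamicTableSourceFactory;
        boolean sink = factory instanceof DynamicTableSinkFactory;
        if (source && sink) {
            return "source/sink";
        }
        if (source) {
            return "source";
        }
        return sink ? "sink" : "other";
    }

    /** Builds one "name | type" row per factory that is a table source or sink. */
    static List<String> showConnectors(List<? extends Factory> discovered) {
        List<String> rows = new ArrayList<>();
        for (Factory f : discovered) {
            String type = connectorType(f);
            if (!type.equals("other")) {
                rows.add(f.factoryIdentifier() + " | " + type);
            }
        }
        return rows;
    }

    // Hypothetical example factories, not real connectors.
    static final class KafkaLikeFactory
            implements DynamicTableSourceFactory, DynamicTableSinkFactory {
        public String factoryIdentifier() {
            return "kafka";
        }
    }

    static final class PrintLikeFactory implements DynamicTableSinkFactory {
        public String factoryIdentifier() {
            return "print";
        }
    }

    public static void main(String[] args) {
        List<Factory> discovered =
                Arrays.asList(new KafkaLikeFactory(), new PrintLikeFactory());
        showConnectors(discovered).forEach(System.out::println);
    }
}
```

Only name and type are derived here; a version column is left out because, as noted above, the version may be folded into the identifier.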
On Tue, Feb 28, 2023 at 11:56 AM Ran Tao <chucheng...@gmail.com> wrote:
Hi, Jark. Thanks.
About ILIKE:
I have updated the FLIP for ILIKE support (including how the existing
showTables & showColumns will change).
About SHOW CONNECTORS, @Sergey:
Currently I think we can load from the jar and check the services file to
get the connector type, but whether that is necessary is something we can
keep discussing.
Hi, Sergey, WDYT? Or can we open a single ticket under this FLIP?
Best Regards,
Ran Tao
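A rough sketch of the "check the services file" step, assuming the file is a standard java.util.ServiceLoader provider-configuration file (for table factories that would be META-INF/services/org.apache.flink.table.factories.Factory inside the connector jar). The class names in the sample content are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ServiceFileParser {

    /** Extracts provider class names from the text of a provider-configuration file. */
    static List<String> providerClassNames(String fileContent) {
        List<String> names = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            int hash = line.indexOf('#'); // '#' starts a comment that runs to end of line
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty() && !names.contains(name)) {
                names.add(name); // skip blank lines and duplicate entries
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical file content; real jars list their own factory classes.
        String content = "# license header\n"
                + "com.example.FooTableFactory\n"
                + "\n"
                + "com.example.BarTableFactory # trailing comment\n";
        System.out.println(providerClassNames(content));
    }
}
```

Note that the file only yields class names. Deciding whether an entry is a source or a sink would still require loading the class and inspecting the interfaces it implements.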
On Tue, Feb 28, 2023 at 5:45 PM Jark Wu <imj...@gmail.com> wrote:
Besides, if we introduce the ILIKE, we should also add this feature for
the previous SHOW with LIKE statements. They should be included in this
FLIP.
Best,
Jark
On Tue, Feb 28, 2023 at 5:40 PM Jark Wu <imj...@gmail.com> wrote:
Hi Ran,
Could you add a description of the behavior of LIKE and ILIKE and the
differences between them?
Besides, I don’t see the SHOW CONNECTOR syntax, its description, or how
it works in the FLIP. Is it intended to be included in this FLIP?
Best,
Jark
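To make the LIKE vs. ILIKE question concrete, here is a small sketch of the usual semantics: both use '%' (any sequence) and '_' (any single character) as wildcards, and ILIKE differs only in ignoring case. This is an assumption about the intended behavior, not the FLIP's implementation; escape characters are not handled, and the class and method names are made up.

```java
import java.util.regex.Pattern;

final class LikeMatcher {

    /** Translates a SQL LIKE pattern ('%' and '_' wildcards, no escape char) into a regex. */
    static String likeToRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return regex.toString();
    }

    /** Case-sensitive match, as LIKE. */
    static boolean like(String value, String pattern) {
        return Pattern.compile(likeToRegex(pattern), Pattern.DOTALL)
                .matcher(value)
                .matches();
    }

    /** Case-insensitive match, as ILIKE. */
    static boolean ilike(String value, String pattern) {
        int flags = Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(likeToRegex(pattern), flags).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("MyTable", "my%"));  // case differs, so LIKE does not match
        System.out.println(ilike("MyTable", "my%")); // ILIKE ignores case
    }
}
```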
On Tue, Feb 28, 2023 at 10:58 AM Ran Tao <chucheng...@gmail.com> wrote:
Hi, guys. Thanks for the advice.
Allow me to make a small summary:
1. Support ILIKE
2. Use the catalog API to support show operations
3. A dedicated FLIP is needed to support INFORMATION_SCHEMA
4. Support SHOW CONNECTORS
If there are no other questions, I will try to start a VOTE for this FLIP.
WDYT?
Best Regards,
Ran Tao
On Mon, Feb 27, 2023 at 9:12 PM Sergey Nuyanzin <snuyan...@gmail.com> wrote:
Hi Jark,
thanks for your comment.
Considering they are orthogonal and information schema requires more
complex design and discussion, it deserves a separate FLIP
I'm ok with a separate FLIP for INFORMATION_SCHEMA.
Sergey, are you willing to contribute this FLIP?
Seems I need to have more research done for that.
I would try to help/contribute here
On Mon, Feb 27, 2023 at 3:46 AM Ran Tao <chucheng...@gmail.com> wrote:
Hi, Jing. Thanks.
@about ILIKE: from my survey of some popular engines, I found that only
Snowflake has this syntax in SHOW with filtering.
Do we need to support it? If yes, then some existing show operations
need to be addressed as well.
@about ShowOperation with LIKE: it's a good idea. Yes, two parameters for
the constructor can work. Thanks for your advice.
Best Regards,
Ran Tao
On Mon, Feb 27, 2023 at 6:29 AM Jing Ge <j...@ververica.com.invalid> wrote:
Hi,
@Aitozi
This is exactly why LoD has been introduced: to avoid exposing internal
structure (2nd and lower level API).
@Jark
IMHO, there is no conflict between LoD and "High power-to-weight ratio"
with the given example: List.subList() returns the List interface itself,
so no internal or further interface has been exposed. After offering
tEnv.getCatalog(), "all" methods in the Catalog interface have been
provided by TableEnvironment (via getCatalog()). From the user's
perspective and the maintenance perspective there is little or no
difference between providing them directly via TableEnvironment or via
getCatalog(). They are all exposed. Using getCatalog() will reduce the
number of boring wrapper methods, but on the other hand not every method
in Catalog needs to be exposed, so the number of wrapper methods would be
limited if we didn't expose Catalog. Nevertheless, since we already
offer getCatalog(), it makes sense to continue using it. The downside is
the learning effort for users: they have to know that listDatabases() is
hidden in Catalog, go to another haystack and then find the needle in
there.
+1 for Information schema with a different FLIP. From a design
perspective, information schema should be the first choice for most
cases and easy to use. Catalog, on the other hand, will be more powerful
and offer more advanced features.
Best regards,
Jing
On Sat, Feb 25, 2023 at 3:57 PM Jark Wu <imj...@gmail.com> wrote:
Hi Sergey,
I think INFORMATION_SCHEMA is a very interesting idea, and I hope we can
support it. However, it doesn't conflict with the idea of auxiliary
statements. I can see different benefits of them. The information schema
provides powerful and flexible capabilities but requires users to learn
the complex entity relationship [1]. The auxiliary SQL statements are
super handy and can resolve most problems, but they offer limited
features.
I can see almost all the mature systems support both of them. I think it
also makes sense to support both of them in Flink. Considering they are
orthogonal and information schema requires more complex design and
discussion, it deserves a separate FLIP. Sergey, are you willing to
contribute this FLIP?
Best,
Jark
[1]: https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html
On Fri, 24 Feb 2023 at 22:43, Ran Tao <chucheng...@gmail.com> wrote:
Thanks John.
It seems that most people prefer the information_schema implementation.
information_schema does have more benefits (however, the show operation
is also an option and a supplement).
Otherwise, the SQL syntax and keywords may be changed frequently.
Of course, it will be more complicated than extending the show
operation.
It is necessary to design various tables in information_schema, which
may take a period of effort.
I will try to design the information_schema and integrate it with Flink.
This may be a relatively big feature for me. I hope you guys can give
comments and opinions later.
Thank you all.
Best Regards,
Ran Tao
On Fri, Feb 24, 2023 at 9:53 PM John Roesler <vvcep...@apache.org> wrote:
Hello Ran,
Thanks for the FLIP!
Do you mind if we revisit the topic of doing this by adding an
Information schema? The SHOW approach requires modifying the
parser/language for every gap we identify. On the flip side, an
Information schema is much easier to discover and remember how to use,
and the ability to run queries on it is quite valuable for admins. It’s
also better for Flink maintainers, because the information tables’
schemas can be evolved over time just like regular tables, whereas every
change to a SHOW statement would be a breaking change.
I understand that it may seem like a big effort, but we’re proposing
quite a big extension to the space of SHOW statement, so it seems
appropriate to take the opportunity and migrate to a better framework
rather than incrementally building on (and tying us even more firmly to)
the existing approach.
Thanks for your consideration,
John
On Fri, Feb 24, 2023, at 05:58, Sergey Nuyanzin wrote:
Thanks for the explanation.
But it's not clear to me what exactly you want to display? Is it the
name of the plugin?
I was thinking about name, type (source/sink) and maybe version (not
sure if it's possible right now).
On Fri, Feb 24, 2023 at 12:46 PM Ran Tao <chucheng...@gmail.com> wrote:
Hi, Sergey. Thanks. As a first step, we can support filtering for show
operations in this FLIP to align with other engines.
If we want to support describing other objects, of course we need to
design how to get this metadata and how to print it (printAsStyle), so
it may need another FLIP for more details.
Does it make sense to add support for connectors e.g. show
{sink|source|all} connectors?
I think we can support it; Flink currently does support some operations
that apply only to Flink itself, such as showJobs. But it's not clear to
me what exactly you want to display? Is it the name of the plugin?
Just like:
Kafka
Hudi
Files
Best Regards,
Ran Tao
On Fri, Feb 24, 2023 at 7:11 PM Sergey Nuyanzin <snuyan...@gmail.com> wrote:
Thanks for driving the FLIP
I have a couple of questions
Am I right that INFORMATION_SCHEMA mentioned by Timo [1] is out of scope
of this FLIP?
I noticed there is some support for it in Spark [2], Hive [3],
Snowflake [4] and others.
Does it make sense to add support for connectors e.g. show
{sink|source|all} connectors?
[1] https://lists.apache.org/thread/2g108qlfwbhb56wnoc5qj0yq29dvq1vv
[2] https://issues.apache.org/jira/browse/SPARK-16452
[3] https://issues.apache.org/jira/browse/HIVE-1010
[4] https://docs.snowflake.com/en/sql-reference/info-schema
On Fri, Feb 24, 2023 at 4:19 AM Jark Wu <imj...@gmail.com> wrote:
Hi Jing,
we'd better reduce the dependency chain and follow the Law of
Demeter(LoD, clean code).
Adding a new method in TableEnvironment is therefore better than calling
an API chain
I think I don't fully agree that LoD is a good practice. Actually, I
would prefer to keep the API clean and concise.
This is also the Java Collection Framework design principle [1]: "High
power-to-weight ratio". Otherwise, it will explode the API interfaces
with different combinations of methods. Currently, TableEnvironment
already provides 60+ methods.
IMO, with the increasing number of methods for accessing and
manipulating metadata, they can be extracted to a separate interface,
where we can add richer methods. This work can be aligned with the
CatalogManager interface (FLIP-295) [2].
Best,
Jark
[1]: https://stackoverflow.com/questions/7568819/why-no-tail-or-head-method-in-list-to-get-last-or-first-element
[2]: https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70
On Fri, 24 Feb 2023 at 10:38, Aitozi <gjying1...@gmail.com> wrote:
Hi,
Thanks for the nice proposal, Ran.
Regarding this API usage, I had some discussion with @twalthr before,
here:
https://github.com/apache/flink/pull/15137#issuecomment-1356124138
Personally, I think leaking the Catalog to the user side is not
suitable: since there are some read/write operations in the Catalog,
TableEnvironment#getCatalog will expose all of them to the user side. So
I lean toward adding a new API in TableEnvironment to reduce reliance on
the current TableEnvironment#getCatalog.
Thanks,
Aitozi
On Thu, Feb 23, 2023 at 11:44 PM Ran Tao <chucheng...@gmail.com> wrote:
Hi, JingSong, Jing.
Thanks for sharing your opinions.
What you say makes sense; both approaches have pros and cons.
If it is a modification of `TableEnvironment`, such as
listDatabases(catalog), it is actually consistent with the other
overloaded methods, and defining this method means that
TableEnvironment does provide this capability (rather than relying on
the functionality of another class).
The disadvantage is that API changes may be required, and may continue
to be required in the future.
But TableEnvironment itself really doesn't care how the underlying
layer is implemented.
(Although it is actually taken from the catalogManager at present, that
is another question.)
Judging from the current dependencies, flink-table-api-java strongly
relies on flink-table-common to use various common classes and
interfaces, especially the catalog interface.
So we can extract various metadata information from the catalog through
`tEnv.getCatalog`.
The advantage is that it will not cause API modification, but this way
of using it breaks LoD.
In fact, the current flink-table-api-java design is completely bound to
the catalog interface.
If mandatory modification of PublicApi is constrained (it may be
modified again later), I tend to use `tEnv.getCatalog` directly;
otherwise, it would actually be more standard to add overloaded methods
to `TableEnvironment`.
Another question: can the later capabilities of TableEnvironment be
implemented through SupportXXX, in order to solve the problem that
methods need to be added in the future? This kind of usage occurs
frequently in Flink.
Looking forward to your other considerations; I will also wait to see
if other relevant API designers or committers provide comments.
Best Regards,
Ran Tao
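A tiny sketch of what the SupportXXX idea could look like: an optional capability interface that callers probe with instanceof, so the base interface does not grow a method for every new feature. Env, SupportsListingDatabases and both implementations are hypothetical names invented for this illustration; none of them is an existing Flink type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical base interface and optional capability.
interface Env {}

interface SupportsListingDatabases {
    List<String> listDatabases(String catalogName);
}

final class AbilityDemo {

    /** Uses the capability when the environment offers it, else returns an empty list. */
    static List<String> listDatabasesIfSupported(Env env, String catalogName) {
        if (env instanceof SupportsListingDatabases) {
            return ((SupportsListingDatabases) env).listDatabases(catalogName);
        }
        return Collections.emptyList();
    }

    /** An environment without the optional capability. */
    static final class BasicEnv implements Env {}

    /** An environment that opts in to the capability. */
    static final class RichEnv implements Env, SupportsListingDatabases {
        public List<String> listDatabases(String catalogName) {
            return Arrays.asList(catalogName + ".default");
        }
    }

    public static void main(String[] args) {
        System.out.println(listDatabasesIfSupported(new BasicEnv(), "cat"));
        System.out.println(listDatabasesIfSupported(new RichEnv(), "cat"));
    }
}
```

The trade-off is that callers must know which capability to probe for, which is the same discoverability concern raised about getCatalog().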
On Thu, Feb 23, 2023 at 6:58 PM Jing Ge <j...@ververica.com.invalid> wrote:
Hi Jingsong,
Thanks for sharing your thoughts. Please see my reply below.
On Thu, Feb 23, 2023 at 10:16 AM Jingsong Li <jingsongl...@gmail.com> wrote:
Hi Jing Ge,
First, flink-table-common contains all common classes of Flink Table,
I think it is hard to bypass its dependence.
If, any time we use flink-table-api-java, we have to cross through
flink-table-api-java and use flink-table-common, we should reconsider
the design of these two modules and how interfaces/classes are
classified into those modules.
Secondly, almost all methods in Catalog look useful to me, so if we
are following LoD, we should add all methods again to
TableEnvironment. I think it is redundant.
That is the enlarged issue I mentioned previously. A simple solution is
to move Catalog to the top level API. The fact is that a catalog package
already exists in flink-table-api-java but the Catalog interface is in
flink-table-common. I don't know the historical context of this design.
Maybe you could share some insight with us? Thanks in advance. Beyond
that, there should be other AOP options, but they need more time to
figure out.
And, this API chain does not look deep.
- "tEnv.getCatalog(tEnv.getCurrentCatalog()).get().listDatabases()"
looks a little complicated. The complex part is ahead.
- If we have a method to get Catalog directly, it can be simplified to
"tEnv.catalog().listDatabase()", this is simple.
Commonly, it will need more effort to always follow LoD, but for a top
level facade API like TableEnvironment, the API developer, the API
consumer and the project itself will all benefit from sticking to LoD
in the long term. Since we already have the getCatalog(String catalog)
method in TableEnvironment, it also makes sense to follow your
suggestion, if we only want to limit/avoid public API changes. But
please be aware that we will have all the known long-term drawbacks of
an LoD violation, especially the cross-module violation. I just checked
all usages of getCatalog(String catalog) in the master branch.
Currently there are very few calls. It is better to pay attention to it
before it gets worse. Just my 2 cents. :)
Best,
Jingsong
On Thu, Feb 23, 2023 at 4:47 PM Jing Ge <j...@ververica.com.invalid> wrote:
Hi Jingsong,
Thanks for the knowledge sharing. IMHO, it looks more like a design
guideline question than just avoiding public API change. Please correct
me if I'm wrong.
Catalog is in the flink-table-common module and TableEnvironment is in
flink-table-api-java. Depending on how and where those features
proposed in this FLIP will be used, we'd better reduce the dependency
chain and follow the Law of Demeter (LoD, clean code) [1]. Adding a new
method in TableEnvironment is therefore better than calling an API
chain. It is also more user friendly for the caller, because there is no
need to understand the internal structure of the called API. The
downside of doing this is that we might have another issue with the
current TableEnvironment design: the TableEnvironment interface gets
enlarged with more wrapper methods. This is a different issue that could
be solved with improved abstraction design in the future. After
considering pros and cons, if we want to add those features now, I would
prefer following LoD over API chain calls.
WDYT?
Best regards,
Jing
[1] https://hackernoon.com/object-oriented-tricks-2-law-of-demeter-4ecc9becad85
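For readers following the LoD argument, a compact sketch of the two call styles being compared. MiniCatalog and MiniTableEnv are hypothetical stand-ins for Catalog and TableEnvironment, reduced to the one method under discussion; they are not the Flink classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the Catalog interface.
interface MiniCatalog {
    List<String> listDatabases();
}

// Hypothetical stand-in for TableEnvironment.
final class MiniTableEnv {
    private final Map<String, MiniCatalog> catalogs = new HashMap<>();
    private String currentCatalog;

    /** Registers a catalog; the first one registered becomes the current catalog. */
    void registerCatalog(String name, MiniCatalog catalog) {
        catalogs.put(name, catalog);
        if (currentCatalog == null) {
            currentCatalog = name;
        }
    }

    String getCurrentCatalog() {
        return currentCatalog;
    }

    /** Style 1: hand out the catalog and let the caller chain calls on it. */
    Optional<MiniCatalog> getCatalog(String name) {
        return Optional.ofNullable(catalogs.get(name));
    }

    /** Style 2: a wrapper on the facade, so callers never touch the catalog. */
    List<String> listDatabases(String catalogName) {
        return getCatalog(catalogName)
                .orElseThrow(() ->
                        new IllegalArgumentException("Unknown catalog: " + catalogName))
                .listDatabases();
    }

    public static void main(String[] args) {
        MiniTableEnv tEnv = new MiniTableEnv();
        tEnv.registerCatalog("default_catalog", () -> Arrays.asList("default", "sales"));

        // API chain: the caller must know about Optional and the catalog type.
        System.out.println(
                tEnv.getCatalog(tEnv.getCurrentCatalog()).get().listDatabases());
        // Wrapper: one call on the facade, at the cost of one more facade method.
        System.out.println(tEnv.listDatabases("default_catalog"));
    }
}
```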
On Thu, Feb 23, 2023 at 6:26 AM Ran Tao <chucheng...@gmail.com> wrote:
Hi Jingsong. Thanks, I got it.
In this way, there is no need to introduce new API changes.
Best Regards,
Ran Tao
On Thu, Feb 23, 2023 at 12:26 PM Jingsong Li <jingsongl...@gmail.com> wrote:
Hi Ran,
I mean we can just use
TableEnvironment.getCatalog(getCurrentCatalog).get().listDatabases().
We don't need to provide new APIs just for utils.
Best,
Jingsong
On Thu, Feb 23, 2023 at 12:11 PM Ran Tao <chucheng...@gmail.com> wrote:
Hi Jingsong, thanks.
The implementation of these statements in TableEnvironmentImpl is called
through the catalog API,
but it does support some new override methods on the catalog API side,
and I will update it later. Thank you.
e.g. TableEnvironmentImpl:
    @Override
    public String[] listDatabases() {
        return catalogManager
                .getCatalog(catalogManager.getCurrentCatalog())
                .get()
                .listDatabases()
                .toArray(new String[0]);
    }
Best Regards,
Ran Tao
On Thu, Feb 23, 2023 at 11:47 AM Jingsong Li <jingsongl...@gmail.com> wrote:
Thanks for the proposal.
+1 for the proposal.
I am confused about "Proposed TableEnvironment SQL API Changes": can we
just use the catalog API for this requirement?
Best,
Jingsong
On Thu, Feb 23, 2023 at 10:48 AM Jacky Lau <liuyong...@gmail.com> wrote:
Hi Ran:
Thanks for driving the FLIP. The Google doc looks really good. It is
important to improve the user's interactive experience. +1 to support
this feature.
On Thu, Feb 23, 2023 at 12:51 AM Jing Ge <j...@ververica.com.invalid> wrote:
Hi Ran,
Thanks for driving the FLIP. It looks good overall. Would you like to
add a description of useLike and notLike? I guess useLike true is for
"LIKE" and notLike true is for "NOT LIKE", but I am not sure if I
understood it correctly. Furthermore, does it make sense to support
"ILIKE" too?
Best regards,
Jing
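A short sketch of how the two flags could combine when filtering a SHOW result, following the reading guessed at above (useLike = a LIKE clause is present, notLike = it is negated). That reading is not confirmed in this message, and ShowFilter is a hypothetical class written for illustration, not the FLIP's ShowOperation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class ShowFilter {
    private final boolean useLike;    // assumed: true when a LIKE clause is present
    private final boolean notLike;    // assumed: true when the clause is NOT LIKE
    private final String likePattern; // ignored when useLike is false

    ShowFilter(boolean useLike, boolean notLike, String likePattern) {
        this.useLike = useLike;
        this.notLike = notLike;
        this.likePattern = likePattern;
    }

    /** Returns the names that survive the (optional, optionally negated) LIKE filter. */
    List<String> apply(List<String> names) {
        if (!useLike) {
            return new ArrayList<>(names);
        }
        List<String> result = new ArrayList<>();
        for (String name : names) {
            boolean matched = matchesLike(name, likePattern);
            if (matched != notLike) { // keep matches for LIKE, non-matches for NOT LIKE
                result.add(name);
            }
        }
        return result;
    }

    /** Case-sensitive LIKE with '%' and '_' wildcards; escape characters not handled. */
    static boolean matchesLike(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            regex.append(c == '%' ? ".*" : c == '_' ? "." : Pattern.quote(String.valueOf(c)));
        }
        return value.matches(regex.toString());
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("orders", "order_items", "users");
        System.out.println(new ShowFilter(true, false, "order%").apply(tables));
        System.out.println(new ShowFilter(true, true, "order%").apply(tables));
    }
}
```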
On Wed, Feb 22, 2023 at 1:17 PM Ran Tao <chucheng...@gmail.com> wrote:
Currently, Flink SQL auxiliary statements support some good features,
such as catalog/database/table support.
But these features are not very complete compared with other popular
engines such as Spark, Presto, Hive and commercial engines such as
Snowflake.
For example, many engines support show operations with filtering, but
Flink does not, and they support describing other objects (Flink only
supports describing tables).
I wonder whether we can add these useful features to Flink?
You can find details in this doc [1] or FLIP [2].
Also, please let me know if there is a mistake. Looking forward to your
reply.
[1] https://docs.google.com/document/d/1hAiOfPx14VTBTOlpyxG7FA2mB1k5M31VnKYad2XpJ1I/
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements
Best Regards,
Ran Tao
--
Best regards,
Sergey