Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements

Timo Walther Tue, 28 Feb 2023 06:23:27 -0800

Hi Ran Tao,

Thanks for working on this FLIP. The FLIP is in pretty good shape already and I don't have much to add.

Will we also support ILIKE in queries? Or is this purely for DDL expressions? For consistency, we should support it in SELECT and the Table API as well. I hope this is not too much effort. I hope everybody is aware that ILIKE is not standard compliant but seems to be used by a variety of vendors.

> Because it may be modified under discuss, we put it on the googledocs. please see FLIP-297: Improve Auxiliary Sql Statements Docs

This comment confused me. It would be nice to have the Wiki page as the single source of truth and abandon the Google doc. In the past we used Google Docs more, but nowadays I support using only the Wiki to avoid any confusion.


Regards,
Timo


On 28.02.23 12:51, Ran Tao wrote:

Thanks Sergey, sounds good.
You can add it under the FLIP ticket [1].

[1] https://issues.apache.org/jira/browse/FLINK-31256

Best Regards,
Ran Tao
https://github.com/chucheng92


Sergey Nuyanzin <[email protected]> wrote on Tue, Feb 28, 2023 at 19:44:

> Currently I think we can load from the jar and check the services file to
> get the connector type. but is it necessary we may continue to discuss.
> Hi, Sergey, WDYT?


Another idea is to use FactoryUtil#discoverFactories and check whether each factory implements DynamicTableSourceFactory or DynamicTableSinkFactory. With versions it could be trickier...
Moreover, it seems the version could sometimes be part of the name [1].
I think name and type could be enough, but please correct me if I'm wrong.
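
A rough sketch of that second idea: once the factories have been discovered, a connector's type can be derived from the interfaces each factory implements. The three interfaces below are simplified local stand-ins named after Flink's Factory, DynamicTableSourceFactory and DynamicTableSinkFactory so that the example is self-contained; ConnectorInfoSketch, describe and the sample factories are hypothetical, and the discovery step itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the Flink interfaces of the same name (assumption:
// only the identifier and the source/sink marker role matter for this sketch).
interface Factory { String factoryIdentifier(); }
interface DynamicTableSourceFactory extends Factory {}
interface DynamicTableSinkFactory extends Factory {}

final class ConnectorInfoSketch {

    /** Returns one "name (type)" row per connector factory; type is source, sink or source/sink. */
    static List<String> describe(List<? extends Factory> factories) {
        List<String> rows = new ArrayList<>();
        for (Factory factory : factories) {
            boolean source = factory instanceof DynamicTableSourceFactory;
            boolean sink = factory instanceof DynamicTableSinkFactory;
            if (!source && !sink) {
                continue; // not a table connector factory (for example a format factory)
            }
            String type = source && sink ? "source/sink" : source ? "source" : "sink";
            rows.add(factory.factoryIdentifier() + " (" + type + ")");
        }
        return rows;
    }

    // Hypothetical sample factories, used only to exercise describe().
    static final class KafkaLike implements DynamicTableSourceFactory, DynamicTableSinkFactory {
        public String factoryIdentifier() { return "kafka"; }
    }

    static final class PrintLike implements DynamicTableSinkFactory {
        public String factoryIdentifier() { return "print"; }
    }

    static final class JsonLike implements Factory {
        public String factoryIdentifier() { return "json"; }
    }
}
```

A version column is deliberately not covered here, since the version may only be available as part of the name.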

> or can we open a single ticket under this FLIP?

I have a relatively old Jira issue [2] for showing connectors, with a PoC PR.
Could I propose moving this Jira issue under the FLIP ticket as a subtask and
reviving it?

[1] https://github.com/apache/flink/blob/161014149e803bfd1d3653badb230b2ed36ce3cb/flink-table/flink-table-common/src/main/java/org/apache/flink/table/factories/Factory.java#L65-L69
[2] https://issues.apache.org/jira/browse/FLINK-25788

On Tue, Feb 28, 2023 at 11:56 AM Ran Tao <[email protected]> wrote:

Hi Jark, thanks.

About ILIKE:
I have updated the FLIP for ILIKE support (including how the existing showTables and showColumns need to change).

About SHOW CONNECTORS, @Sergey:
Currently I think we can load from the jar and check the services file to get the connector type, but whether that is necessary is something we can continue to discuss.
Sergey, WDYT? Or can we open a single ticket under this FLIP?


Best Regards,
Ran Tao


Jark Wu <[email protected]> wrote on Tue, Feb 28, 2023 at 17:45:

Besides, if we introduce ILIKE, we should also add it to the existing SHOW
statements that already support LIKE. They should be included in this
FLIP.

Best,
Jark

On Feb 28, 2023, at 17:40, Jark Wu <[email protected]> wrote:

Hi Ran,

Could you add a description of the behavior of LIKE and ILIKE and the differences between them?
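
To illustrate the question, here is a minimal sketch of the usual semantics in engines that offer both operators (for example PostgreSQL and Snowflake): LIKE matches case-sensitively and ILIKE matches case-insensitively, with `%` standing for any sequence of characters and `_` for exactly one character. The class LikeSketch and its methods are hypothetical and not part of Flink or the FLIP; escape characters are ignored for brevity.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not Flink code: evaluates SQL LIKE / ILIKE patterns via regex. */
final class LikeSketch {

    /** Converts '%' to ".*" and '_' to "."; every other character is matched literally. */
    static String toRegex(String sqlPattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : sqlPattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    /** Case-sensitive match of the whole value against the pattern. */
    static boolean like(String value, String sqlPattern) {
        return Pattern.compile(toRegex(sqlPattern), Pattern.DOTALL).matcher(value).matches();
    }

    /** Same as like(), but ignoring case. */
    static boolean ilike(String value, String sqlPattern) {
        int flags = Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(toRegex(sqlPattern), flags).matcher(value).matches();
    }
}
```

With these definitions, like("Orders", "ord%") is false while ilike("Orders", "ord%") is true.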


Besides, I don't see the SHOW CONNECTORS syntax, its description, or how it works in the FLIP. Is it intended to be included in this FLIP?


Best,
Jark

On Feb 28, 2023, at 10:58, Ran Tao <[email protected]> wrote:

Hi guys, thanks for the advice.

Allow me to make a small summary:

1. Support ILIKE
2. Use the catalog API to support SHOW operations
3. A dedicated FLIP is needed to support INFORMATION_SCHEMA
4. Support SHOW CONNECTORS

If there are no other questions, I will try to start a VOTE for this FLIP.

WDYT?

Best Regards,
Ran Tao


Sergey Nuyanzin <[email protected]> wrote on Mon, Feb 27, 2023 at 21:12:

Hi Jark,

Thanks for your comment.

> Considering they are orthogonal and information schema requires more
> complex design and discussion, it deserves a separate FLIP

I'm OK with a separate FLIP for INFORMATION_SCHEMA.

> Sergey, are you willing to contribute this FLIP?

It seems I need to do more research for that.
I would try to help/contribute here.


On Mon, Feb 27, 2023 at 3:46 AM Ran Tao <[email protected]> wrote:

Hi Jing, thanks.

@About ILIKE: from my survey of some popular engines, only Snowflake has this syntax in SHOW with filtering. Do we need to support it? If yes, then some of the existing SHOW operations need to be addressed as well.
@About ShowOperation with LIKE: it's a good idea. Yes, two parameters for the constructor can work. Thanks for your advice.


Best Regards,
Ran Tao


Jing Ge <[email protected]> wrote on Mon, Feb 27, 2023 at 06:29:

Hi,

@Aitozi

This is exactly why LoD has been introduced: to avoid exposing internal structure (2nd and lower level API).

@Jark

IMHO, there is no conflict between LoD and "High power-to-weight ratio" with the given example: List.subList() returns the List interface itself, so no internal or further interface has been exposed. After offering tEnv.getCatalog(), "all" methods in the Catalog interface have been provided by TableEnvironment (via getCatalog()). From the user's perspective and the maintenance perspective there is no/less difference between providing them directly via TableEnvironment or via getCatalog(). They are all exposed. Using getCatalog() will reduce the number of boring wrapper methods, but on the other hand not every method in Catalog needs to be exposed, so the number of wrapper methods would be limited/less if we didn't expose Catalog. Nevertheless, since we already offered getCatalog(), it makes sense to continue using it. The downside is the learning effort for users: they have to know that listDatabases() is hidden in Catalog, go to another haystack and then find the needle in there.

+1 for Information schema with a different FLIP. From a design perspective, information schema should be the first choice for most cases and easy to use. Catalog, on the other hand, will be more powerful and offer more advanced features.

Best regards,
Jing


On Sat, Feb 25, 2023 at 3:57 PM Jark Wu <[email protected]> wrote:

Hi Sergey,

I think INFORMATION_SCHEMA is a very interesting idea, and I hope we can support it. However, it doesn't conflict with the idea of auxiliary statements. I can see different benefits of them. The information schema provides powerful and flexible capabilities, but users need to learn the complex entity relationship [1]. The auxiliary SQL statements are super handy and can resolve most problems, but they offer limited features.

I can see almost all the mature systems support both of them. I think it also makes sense to support both of them in Flink. Considering they are orthogonal and information schema requires more complex design and discussion, it deserves a separate FLIP. Sergey, are you willing to contribute this FLIP?

Best,
Jark

[1]: https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html



On Fri, 24 Feb 2023 at 22:43, Ran Tao <[email protected]> wrote:

Thanks John.

It seems that most people prefer the information_schema implementation. information_schema does have more benefits (however, the SHOW operation is also an option and a supplement). Otherwise, the SQL syntax and keywords may be changed frequently. Of course, it will be more complicated than extending the SHOW operation. It is necessary to design the various tables in information_schema, which may take a period of effort.

I will try to design the information_schema and integrate it with Flink. This may be a relatively big feature for me. I hope you guys can give comments and opinions later.
Thank you all.

Best Regards,
Ran Tao


John Roesler <[email protected]> wrote on Fri, Feb 24, 2023 at 21:53:

Hello Ran,

Thanks for the FLIP!

Do you mind if we revisit the topic of doing this by adding an Information schema? The SHOW approach requires modifying the parser/language for every gap we identify. On the flip side, an Information schema is much easier to discover and remember how to use, and the ability to run queries on it is quite valuable for admins. It's also better for Flink maintainers, because the information tables' schemas can be evolved over time just like regular tables, whereas every change to a SHOW statement would be a breaking change.

I understand that it may seem like a big effort, but we're proposing quite a big extension to the space of SHOW statements, so it seems appropriate to take the opportunity and migrate to a better framework rather than incrementally building on (and tying us even more firmly to) the existing approach.

Thanks for your consideration,
John

On Fri, Feb 24, 2023, at 05:58, Sergey Nuyanzin wrote:

Thanks for the explanation.

> But it's not clear to me what exactly
> you want to display? Is it the name of the plugin?

I was thinking about name, type (source/sink) and maybe version (not sure if it's possible right now).

On Fri, Feb 24, 2023 at 12:46 PM Ran Tao <[email protected]> wrote:

Hi Sergey, thanks. As a first step, we can support filtering for SHOW operations in this FLIP, to align with other engines.
If we want to support describing other objects, of course we need to design how to get that metadata and how to print it (printAsStyle). (So it may need another FLIP for more details.)

> Does it make sense to add support for connectors e.g. show
> {sink|source|all} connectors?

I think we can support it; currently Flink does support some operations that are specific to Flink itself, such as showJobs. But it's not clear to me what exactly you want to display. Is it the name of the plugin?
Just like:
Kafka
Hudi
Files

Best Regards,
Ran Tao


Sergey Nuyanzin <[email protected]> wrote on Fri, Feb 24, 2023 at 19:11:

Thanks for driving the FLIP.

I have a couple of questions.
Am I right that INFORMATION_SCHEMA mentioned by Timo [1] is out of scope of this FLIP?
I noticed there is some support for it in Spark [2]/Hive [3]/Snowflake [4] and others.

Does it make sense to add support for connectors, e.g. show
{sink|source|all} connectors?

[1] https://lists.apache.org/thread/2g108qlfwbhb56wnoc5qj0yq29dvq1vv
[2] https://issues.apache.org/jira/browse/SPARK-16452
[3] https://issues.apache.org/jira/browse/HIVE-1010
[4] https://docs.snowflake.com/en/sql-reference/info-schema



On Fri, Feb 24, 2023 at 4:19 AM Jark Wu <[email protected]> wrote:

Hi Jing,

> we'd better reduce the dependency chain and follow the Law of
> Demeter (LoD, clean code).
> Adding a new method in TableEnvironment is therefore better than
> calling an API chain

I don't fully agree that LoD is a good practice. Actually, I would prefer to keep the API clean and concise. This is also the Java Collections Framework design principle [1]: "High power-to-weight ratio". Otherwise, it will explode the API interfaces with different combinations of methods. Currently, TableEnvironment already provides 60+ methods.

IMO, with the increasing number of methods for accessing and manipulating metadata, they can be extracted to a separate interface, where we can add richer methods. This work can be aligned with the CatalogManager interface (FLIP-295) [2].

Best,
Jark

[1]: https://stackoverflow.com/questions/7568819/why-no-tail-or-head-method-in-list-to-get-last-or-first-element
[2]: https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70



On Fri, 24 Feb 2023 at 10:38, Aitozi <[email protected]> wrote:

Hi,
   Thanks for the nice proposal, Ran.
   Regarding this API usage, I had some discussion with @twalthr before, as here <https://github.com/apache/flink/pull/15137#issuecomment-1356124138>.

Personally, I think leaking the Catalog to the user side is not suitable: since there are some read/write operations in the Catalog, TableEnvironment#getCatalog will expose all of them to the user side. So I learned to add a new API in TableEnvironment to reduce reliance on the current TableEnvironment#getCatalog.

Thanks,
Aitozi


Ran Tao <[email protected]> wrote on Thu, Feb 23, 2023 at 23:44:

Hi Jingsong, Jing.

Thanks for sharing your opinions.

What you say makes sense; both approaches have pros and cons.

If it is a modification of `TableEnvironment`, such as listDatabases(catalog), it is actually consistent with the other overloaded methods before, and defining this method means that TableEnvironment does provide this capability (rather than relying on the functionality of another class). The disadvantage is that API changes may be required, and the API may continue to be modified in the future. But TableEnvironment itself really doesn't pay attention to how the underlying layer is implemented. (Although it is actually taken from the catalogManager at present, this is another question.)

Judging from the current dependencies, flink-table-api-java strongly relies on flink-table-common to use various common classes and interfaces, especially the Catalog interface. So we can extract various metadata information in the catalog through `tEnv.getCatalog`. The advantage is that it will not cause API modification, but this method of use breaks LoD. In fact, the current flink-table-api-java design is completely bound to the Catalog interface.

If modification of the PublicApi is constrained (it may be modified again later), I tend to use `tEnv.getCatalog` directly; otherwise, it would actually be more standard to add overloaded methods to `TableEnvironment`.

Another question: can the later capabilities of TableEnvironment be implemented through SupportXXX, in order to solve the problem that methods need to be added in the future? This kind of usage occurs frequently in Flink.

Looking forward to your other considerations. I will also wait to see if other relevant API designers or committers provide comments.


Best Regards,
Ran Tao

Jing Ge <[email protected]> wrote on Thu, Feb 23, 2023 at 18:58:

Hi Jingsong,

Thanks for sharing your thoughts. Please see my reply below.


On Thu, Feb 23, 2023 at 10:16 AM Jingsong Li <[email protected]> wrote:

> Hi Jing Ge,
>
> First, flink-table-common contains all common classes of Flink Table,
> I think it is hard to bypass its dependence.

If every time we use flink-table-api-java we have to cross through it and use flink-table-common, we should reconsider the design of these two modules and how interfaces/classes are classified into those modules.


> Secondly, almost all methods in Catalog look useful to me, so if we
> are following LoD, we should add all methods again to
> TableEnvironment. I think it is redundant.

That is the enlarged issue I mentioned previously. A simple solution is to move Catalog to the top level API. The fact is that a catalog package already exists in flink-table-api-java, but the Catalog interface is in flink-table-common. I don't know the historical context of this design. Maybe you could share some insight with us? Thanks in advance. Beyond that, there should be other AOP options, but it needs more time to figure them out.


> And, this API chain does not look deep.
> - "tEnv.getCatalog(tEnv.getCurrentCatalog()).get().listDatabases()"
>   looks a little complicated. The complex part is ahead.
> - If we have a method to get Catalog directly, it can be simplified to
>   "tEnv.catalog().listDatabase()", this is simple.

Commonly, it will need more effort to always follow LoD, but for a top level facade API like TableEnvironment, the API developer, the API consumer and the project itself will all benefit from sticking to LoD from a long-term perspective. Since we already have the getCatalog(String catalog) method in TableEnvironment, it also makes sense to follow your suggestion, if we only want to limit/avoid public API changes. But please be aware that we will have all the known long-term drawbacks of LoD violation, especially the cross-module violation. I just checked all usages of getCatalog(String catalog) in the master branch. Currently there are very limited calls. It is better to pay attention to it before it gets worse. Just my 2 cents. :)
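
For readers following the LoD discussion, the two call styles can be sketched with heavily reduced stand-ins for Catalog and TableEnvironment. MiniCatalog, MiniTableEnv and CallStyles are hypothetical miniatures written for this example, not Flink's interfaces; only the method names getCatalog, getCurrentCatalog and listDatabases are taken from the thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Reduced stand-in for a catalog: only the one read method used in the thread.
interface MiniCatalog {
    List<String> listDatabases();
}

// Reduced stand-in for the table environment facade.
final class MiniTableEnv {
    private final Map<String, MiniCatalog> catalogs;
    private final String currentCatalog;

    MiniTableEnv(Map<String, MiniCatalog> catalogs, String currentCatalog) {
        this.catalogs = catalogs;
        this.currentCatalog = currentCatalog;
    }

    String getCurrentCatalog() {
        return currentCatalog;
    }

    Optional<MiniCatalog> getCatalog(String name) {
        return Optional.ofNullable(catalogs.get(name));
    }

    // Wrapper style (LoD): the facade answers directly and hides the catalog.
    // An unknown catalog yields an empty array in this sketch.
    String[] listDatabases(String catalogName) {
        return getCatalog(catalogName)
                .map(MiniCatalog::listDatabases)
                .orElse(Collections.emptyList())
                .toArray(new String[0]);
    }
}

final class CallStyles {
    public static void main(String[] args) {
        MiniCatalog catalog = () -> Arrays.asList("default", "sales");
        MiniTableEnv tEnv = new MiniTableEnv(Collections.singletonMap("c1", catalog), "c1");

        // Chain style: the caller navigates through the catalog itself.
        List<String> viaChain = tEnv.getCatalog(tEnv.getCurrentCatalog()).get().listDatabases();

        // Wrapper style: one call on the facade.
        String[] viaWrapper = tEnv.listDatabases(tEnv.getCurrentCatalog());

        System.out.println(viaChain + " " + Arrays.toString(viaWrapper));
    }
}
```

The chain keeps the facade small but requires callers to know that listDatabases() lives on the catalog; the wrapper hides that at the cost of one more facade method per operation.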


Best,
Jingsong

On Thu, Feb 23, 2023 at 4:47 PM Jing Ge <[email protected]> wrote:


Hi Jingsong,

Thanks for the knowledge sharing. IMHO, it looks more like a design guideline question than just avoiding public API change. Please correct me if I'm wrong.

Catalog is in the flink-table-common module and TableEnvironment is in flink-table-api-java. Depending on how and where the features proposed in this FLIP will be used, we'd better reduce the dependency chain and follow the Law of Demeter (LoD, clean code) [1]. Adding a new method in TableEnvironment is therefore better than calling an API chain. It is also more user friendly for the caller, because there is no need to understand the internal structure of the called API. The downside of doing this is that we might have another issue with the current TableEnvironment design: the TableEnvironment interface gets enlarged with more wrapper methods. This is a different issue that could be solved with improved abstraction design in the future. After considering pros and cons, if we want to add those features now, I would prefer following LoD over API chain calls. WDYT?

Best regards,
Jing

[1] https://hackernoon.com/object-oriented-tricks-2-law-of-demeter-4ecc9becad85


On Thu, Feb 23, 2023 at 6:26 AM Ran Tao <[email protected]> wrote:

Hi Jingsong, thanks. I got it.
In this way, there is no need to introduce new API changes.


Best Regards,
Ran Tao


Jingsong Li <[email protected]> wrote on Thu, Feb 23, 2023 at 12:26:

Hi Ran,

I mean we can just use TableEnvironment.getCatalog(getCurrentCatalog).get().listDatabases().

We don't need to provide new APIs just for utils.


Best,
Jingsong

On Thu, Feb 23, 2023 at 12:11 PM Ran Tao <[email protected]> wrote:


Hi Jingsong, thanks.

The implementation of these statements in TableEnvironmentImpl is called through the catalog API, but some new overloaded methods do need to be supported on the catalog API side, and I will update it later. Thank you.

e.g.
TableEnvironmentImpl
   @Override
   public String[] listDatabases() {
       return catalogManager
               .getCatalog(catalogManager.getCurrentCatalog())
               .get()
               .listDatabases()
               .toArray(new String[0]);
   }

Best Regards,
Ran Tao


Jingsong Li <[email protected]> wrote on Thu, Feb 23, 2023 at 11:47:

Thanks for the proposal.

+1 for the proposal.

I am confused about "Proposed TableEnvironment SQL API Changes": can we just use the catalog API for this requirement?


Best,
Jingsong

On Thu, Feb 23, 2023 at 10:48 AM Jacky Lau <[email protected]> wrote:


Hi Ran:
Thanks for driving the FLIP. The Google doc looks really good. It is important to improve the user's interactive experience. +1 to support this feature.

Jing Ge <[email protected]> wrote on Thu, Feb 23, 2023 at 00:51:

Hi Ran,

Thanks for driving the FLIP. It looks good overall. Would you like to add a description of useLike and notLike? I guess useLike true is for "LIKE" and notLike true is for "NOT LIKE", but I am not sure if I understood it correctly. Furthermore, does it make sense to support "ILIKE" too?
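
One possible reading of the two flags, sketched under the interpretation guessed above (useLike switches filtering on, notLike negates the match). ShowFilterSketch is a hypothetical class for illustration only; the FLIP's actual operation classes may be shaped differently, and the pattern matcher is passed in as a plain predicate to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical filter for SHOW results; not the FLIP's actual class. */
final class ShowFilterSketch {
    private final boolean useLike;
    private final boolean notLike;
    private final Predicate<String> matcher;

    ShowFilterSketch(boolean useLike, boolean notLike, Predicate<String> matcher) {
        this.useLike = useLike;
        this.notLike = notLike;
        this.matcher = matcher;
    }

    /** Keeps every name when useLike is false; otherwise keeps matches, or non-matches if notLike is set. */
    List<String> apply(List<String> names) {
        if (!useLike) {
            return new ArrayList<>(names);
        }
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (matcher.test(name) != notLike) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

Under this reading, notLike has no effect unless useLike is also true, which is one reason a description of the two flags in the FLIP would help.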


Best regards,
Jing

On Wed, Feb 22, 2023 at 1:17 PM Ran Tao <[email protected]> wrote:

Currently the Flink SQL auxiliary statements support some good features, such as catalog/database/table support.

But these features are not very complete compared with other popular engines such as Spark, Presto and Hive, and commercial engines such as Snowflake.

For example, many engines support SHOW operations with filtering, whereas Flink does not, and they support describing other objects (Flink only supports describing tables).

I wonder, can we add these useful features to Flink?
You can find details in this doc [1] or in the FLIP [2].

Also, please let me know if there is a mistake. Looking forward to your reply.

[1] https://docs.google.com/document/d/1hAiOfPx14VTBTOlpyxG7FA2mB1k5M31VnKYad2XpJ1I/
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements


Best Regards,
Ran Tao



--
Best regards,
Sergey



