+1 for option 2
Regards,
Timo
On 02.03.21 03:50, Jark Wu wrote:
I prefer option#2 and I think this can make everyone happy.
Best,
Jark
On Mon, 1 Mar 2021 at 18:22, Shengkai Fang <fskm...@gmail.com> wrote:
Hi, everyone.
After the long discussion, I am fine with both choices. But I prefer the
second option that applies to both table modules and sql client. Just as
Timo said, the option `table.dml-sync` can improve the SQL script
portability. Users don't need to modify the script and execute the script
in different platforms e.g gateway.
What do you think? CC Timo, Jark, Leonard.
Best,
Shengkai.
Kurt Young <ykt...@gmail.com> 于2021年3月1日周一 下午5:11写道:
I'm +1 for either:
1. introduce a sql client specific option, or
2. Introduce a table config option and make it apply to both table
module &
sql client.
It would be the FLIP owner's call to decide.
Best,
Kurt
On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <twal...@apache.org> wrote:
We could also think about reading this config option in Table API. The
effect would be to call `await()` directly in an execute call. I could
also imagine this to be useful esp. when you fire a lot of insert into
queries. We had the case before that users where confused that the
execution happens asynchronously, such an option could prevent this to
happen again.
Regards,
Timo
On 01.03.21 05:14, Kurt Young wrote:
I also asked some users about their opinion that if we introduce some
config prefixed with "table" but doesn't
have affection with methods in Table API and SQL. All of them are
kind
of
shocked by such question, asking
why we would do anything like this.
This kind of reaction actually doesn't surprise me a lot, so I jumped
in
and challenged this config option even
after the FLIP had already been accepted.
If we only have to define the execution behavior for multiple
statements
in
SQL client, we should only introduce
a config option which would tell users it's affection scope by its
name.
Prefixing with "table" is definitely not a good
idea here.
Best,
Kurt
On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <xbjt...@gmail.com>
wrote:
Hi, all
Look like there’s only one divergence about option [ table |
sql-client
].dml-sync in this thread, correct me if I’m wrong.
1. Leaving the context of this thread, from a user's perspective,
the table.xx configurations should take effect in Table API & SQL,
the sql-client.xx configurations should only take effect in
sql-client.
In my(the user's) opinion, other explanations are
counterintuitive.
2. It should be pointed out that both all existed table.xx
configurations
like table.exec.state.ttl, table.optimizer.agg-phase-strategy,
table.local-time-zone,etc.. and the proposed sql-client.xx
configurations
like sql-client.verbose, sql-client.execution.max-table-result.rows
comply with this convention.
3. Considering the portability to support different CLI tools
(sql-client,
sql-gateway, etc.), I prefer table.dml-sync.
In addition, I think sql-client/sql-gateway/other CLI tools can be
placed
out of flink-table module even in an external project, this should
not
affect our conclusion.
Hope this can help you.
Best,
Leonard
在 2021年2月25日,18:51,Shengkai Fang <fskm...@gmail.com> 写道:
Hi, everyone.
I do some summaries about the discussion about the option. If the
summary
has errors, please correct me.
`table.dml-sync`:
- take effect for `executeMultiSql` and sql client
- benefit: SQL script portability. One script for all platforms.
- drawback: Don't work for `TableEnvironment#executeSql`.
`table.multi-dml-sync`:
- take effect for `executeMultiSql` and sql client
- benefit: SQL script portability
- drawback: It's confused when the sql script has one dml statement
but
need to set option `table.multi-dml-sync`
`client.dml-sync`:
- take effect for sql client only
- benefit: clear definition.
- drawback: Every platform needs to define its own option. Bad SQL
script
portability.
Just as Jark said, I think the `table.dml-sync` is a good choice if
we
can
extend its scope and make this option works for `executeSql`.
It's straightforward and users can use this option now in table
api.
The
drawback is the `TableResult#await` plays the same role as the
option.
I
don't think the drawback is really critical because many systems
have
commands play the same role with the different names.
Best,
Shengkai
Timo Walther <twal...@apache.org> 于2021年2月25日周四 下午4:23写道:
The `table.` prefix is meant to be a general option in the table
ecosystem. Not necessarily attached to Table API or SQL Client.
That's
why SQL Client is also located in the `flink-table` module.
My main concern is the SQL script portability. Declaring the
sync/async
behavior will happen in many SQL scripts. And users should be
easily
switch from SQL Client to some commercial product without the need
of
changing the script again.
Sure, we can change from `sql-client.dml-sync` to `table.dml-sync`
later
but that would mean introducing future confusion. An app name
(what
`sql-client` kind of is) should not be part of a config option key
if
other apps will need the same kind of option.
Regards,
Timo
On 24.02.21 08:59, Jark Wu wrote:
From my point of view, I also prefer "sql-client.dml-sync",
because the behavior of this configuration is very clear.
Even if we introduce a new config in the future, e.g.
`table.dml-sync`,
we can also deprecate the sql-client one.
Introducing a "table." configuration without any implementation
will confuse users a lot, as they expect it should take effect on
the Table API.
If we want to introduce an unified "table.dml-sync" option, I
prefer
it should be implemented on Table API and affect all the DMLs on
Table API (`tEnv.executeSql`, `Table.executeInsert`,
`StatementSet`),
as I have mentioned before [1].
It would be very straightforward that it affects all the DMLs on
SQL
CLI
and
TableEnvironment (including `executeSql`, `StatementSet`,
`Table#executeInsert`, etc.).
This can also make SQL CLI easy to support this configuration by
passing
through to the TableEnv.
Best,
Jark
[1]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
On Wed, 24 Feb 2021 at 10:39, Kurt Young <ykt...@gmail.com>
wrote:
If we all agree the option should only be handled by sql client,
then
why
don't we
just call it `sql-client.dml-sync`? As you said, calling it
`table.dml-sync` but has no
affection in `TableEnv.executeSql("INSERT INTO")` will also
cause
a
big
confusion for
users.
The only concern I saw is if we introduce
"TableEnvironment.executeMultiSql()" in the
future, how do we control the synchronization between
statements?
TBH
I
don't really
see a strong requirement for such interfaces. Right now, we
have a
pretty
clear semantic
of `TableEnv.executeSql`, and it's very convenient for users if
they
want
to execute multiple
sql statements. They can simulate either synced or async
execution
with
this building block.
This will introduce slight overhead for users, but compared to
the
confusion we might
cause if we introduce such a method of our own, I think it's
better
to
wait
for some more
feedback.
Best,
Kurt
On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <
twal...@apache.org>
wrote:
Hi Kurt,
we can also shorten it to `table.dml-sync` if that would help.
Then
it
would confuse users that do a regular `.executeSql("INSERT
INTO")`
in a
notebook session.
In any case users will need to learn the semantics of this
option.
`table.multi-dml-sync` should be described as "If a you are in
a
multi
statement environment, execute DMLs synchrounous.". I don't
have
a
strong opinion on shortening it to `table.dml-sync`.
Just to clarify the implementation: The option should be
handled
by
the
SQL Client only, but the name can be shared accross platforms.
Regards,
Timo
On 23.02.21 09:54, Kurt Young wrote:
Sorry for the late reply, but I'm confused by
`table.multi-dml-sync`.
IIUC this config will take effect with 2 use cases:
1. SQL client, either interactive mode or executing multiple
statements
via
-f. In most cases,
there will be only one INSERT INTO statement but we are
controlling
the
sync/async behavior
with "*multi-dml*-sync". I think this will confuse a lot of
users.
Besides,
2. TableEnvironment#executeMultiSql(), but this is future
work,
we
are
also
not sure if we will
really introduce this in the future.
I would prefer to introduce this option for only sql client.
For
platforms
Timo mentioned which
need to control such behavior, I think it's easy and flexible
to
introduce
one on their own.
Best,
Kurt
On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <
fskm...@gmail.com
wrote:
Hi everyone.
Sorry for the late response.
For `execution.runtime-mode`, I think it's much better than
`table.execution.mode`. Thanks for Timo's suggestions!
For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We
should
clarify
the
usage of the SHOW CREATE TABLE statements. It should be
allowed
to
specify
the table that is fully qualified and only works for the
table
that
is
created by the sql statements.
I have updated the FLIP with suggestions. It seems we have
reached
a
consensus, I'd like to start a formal vote for the FLIP.
Please vote +1 to approve the FLIP, or -1 with a comment.
Best,
Shengkai
Jark Wu <imj...@gmail.com> 于2021年2月15日周一 下午10:50写道:
Hi Ingo,
1) I think you are right, the table path should be
fully-qualified.
2) I think this is also a good point. The SHOW CREATE TABLE
only aims to print DDL for the tables registered using SQL
CREATE
TABLE
DDL.
If a table is registered using Table API, e.g.
`StreamTableEnvironment#createTemporaryView(String,
DataStream)`,
currently it's not possible to print DDL for such tables.
I think we should point it out in the FLIP.
Best,
Jark
On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <i...@ververica.com
wrote:
Hi all,
I have a couple questions about the SHOW CREATE TABLE
statement.
1) Contrary to the example in the FLIP I think the returned
DDL
should
always have the table identifier fully-qualified. Otherwise
the
DDL
depends
on the current context (catalog/database), which could be
surprising,
especially since "the same" table can behave differently if
created
in
different catalogs.
2) How should this handle tables which cannot be fully
characterized
by
properties only? I don't know if there's an example for
this
yet,
but
hypothetically this is not currently a requirement, right?
This
isn't
as
much of a problem if this syntax is SQL-client-specific,
but
if
it's
general Flink SQL syntax we should consider this (one way
or
another).
Regards
Ingo
On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <
twal...@apache.org
wrote:
Hi Shengkai,
thanks for updating the FLIP.
I have one last comment for the option
`table.execution.mode`.
Should
we
already use the global Flink option
`execution.runtime-mode`
instead?
We are using Flink's options where possible (e.g. `
pipeline.name`
and
`parallism.default`) why not also for batch/streaming
mode?
The description of the option matches to the Blink planner
behavior:
```
Among other things, this controls task scheduling, network
shuffle
behavior, and time semantics.
```
Regards,
Timo
On 10.02.21 06:30, Shengkai Fang wrote:
Hi, guys.
I have updated the FLIP. It seems we have reached
agreement.
Maybe
we
can
start the vote soon. If anyone has other questions,
please
leave
your
comments.
Best,
Shengkai
Rui Li <lirui.fu...@gmail.com>于2021年2月9日 周二下午7:52写道:
Hi guys,
The conclusion sounds good to me.
On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <
fskm...@gmail.com
wrote:
Hi, Timo, Jark.
I am fine with the new option name.
Best,
Shengkai
Timo Walther <twal...@apache.org>于2021年2月9日
周二下午5:35写道:
Yes, `TableEnvironment#executeMultiSql()` can be
future
work.
@Rui, Shengkai: Are you also fine with this
conclusion?
Thanks,
Timo
On 09.02.21 10:14, Jark Wu wrote:
I'm fine with `table.multi-dml-sync`.
My previous concern about "multi" is that DML in CLI
looks
like
single
statement.
But we can treat CLI as a multi-line accepting
statements
from
opening
to
closing.
Thus, I'm fine with `table.multi-dml-sync`.
So the conclusion is `table.multi-dml-sync` (false by
default),
and
we
will
support this config
in SQL CLI first, will support it in
TableEnvironment#executeMultiSql()
in
the future, right?
Best,
Jark
On Tue, 9 Feb 2021 at 16:37, Timo Walther <
twal...@apache.org
wrote:
Hi everyone,
I understand Rui's concerns. `table.dml-sync` should
not
apply
to
regular `executeSql`. Actually, this option makes
only
sense
when
executing multi statements. Once we have a
`TableEnvironment.executeMultiSql()` this config
could
be
considered.
Maybe we can find a better generic name? Other
platforms
will
also
need
to have this config option, which is why I would
like
to
avoid a
SQL
Client specific option. Otherwise every platform has
to
come
up
with
this important config option separately.
Maybe `table.multi-dml-sync`
`table.multi-stmt-sync`?
Or
other
opinions?
Regards,
Timo
On 09.02.21 08:50, Shengkai Fang wrote:
Hi, all.
I think it may cause user confused. The main
problem
is
we
have
no
means
to detect the conflict configuration, e.g. users
set
the
option
true
and
use `TableResult#await` together.
Best,
Shengkai.
--
Best regards!
Rui Li