Why should this be specific to the Table API? The DataStream API has similar issues with long operator names (like windowing).

On 16/11/2021 11:22, wenlong.lwl wrote:
Thanks Godfrey for the suggestion.
Regarding 1, how about table.optimizer.simplify-operator-name-enabled,
which means that we would simplify the operator name and keep the
details only in the description?
I don't think "table.optimizer.operator-name.description-enabled" describes
what the option does.
Regarding 2, I agree that it is better to use an enum instead of a boolean. For
the key, I think you mean "pipeline.vertex-description-pattern" instead
of "pipeline.vertex-name-pattern", and I would like to choose DEFAULT/TREE
for the values.
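
To make the enum-versus-boolean point concrete, here is a minimal Python sketch of how a string-valued option restricted to DEFAULT/TREE could be parsed. It is only an illustration: `VertexDescriptionMode`, `parse_mode` and the `config` dict are hypothetical names, the key is the one proposed in this mail rather than a released option, and nothing here is Flink's actual configuration API.

```python
from enum import Enum


class VertexDescriptionMode(Enum):
    """Hypothetical enum holding the two values proposed in this thread."""
    DEFAULT = "DEFAULT"  # the non-tree form (called CASCADE in Godfrey's mail)
    TREE = "TREE"        # the tree-shaped description


def parse_mode(raw: str) -> VertexDescriptionMode:
    """Parse a config string case-insensitively and reject unknown values."""
    try:
        return VertexDescriptionMode[raw.strip().upper()]
    except KeyError:
        allowed = ", ".join(m.name for m in VertexDescriptionMode)
        raise ValueError(f"unknown mode {raw!r}; expected one of: {allowed}") from None


# Hypothetical config map; the key name is the one suggested in this mail.
config = {"pipeline.vertex-description-pattern": "tree"}
mode = parse_mode(config["pipeline.vertex-description-pattern"])
print(mode.name)
```

Compared with a boolean, a further mode could be added later as one more enum member without renaming the key.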

Best,
Wenlong

On Tue, 16 Nov 2021 at 17:28, godfrey he <godfre...@gmail.com> wrote:

Thanks for creating this FLIP, Wenlong.

The FLIP already looks pretty solid. I think the config options can be
improved a little:
1) About table.optimizer.separate-name-and-description: I think
"operator-name" should be part of the option name.
How about table.optimizer.operator-name.description-enabled?
2) About pipeline.tree-mode-vertex-description: I think we can make
the mode accept a string value, which is more flexible.
How about pipeline.vertex-name-pattern, with "TREE" as the default
value and "CASCADE" (or "DEFAULT", which is simpler) as the other option?

What do you think?

Best,
Godfrey

wenlong.lwl <wenlong88....@gmail.com> wrote on Mon, 15 Nov 2021 at 18:36:

Hi all, FYI, the FLIP doc has been created:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-195%3A+Improve+the+name+and+structure+of+vertex+and+operator+name+for+sql+job
Best,
Wenlong

On Mon, 15 Nov 2021 at 11:41, wenlong.lwl <wenlong88....@gmail.com> wrote:
Hi all,
Thanks for the feedback. It seems that the proposal is accepted by all of
you. I will prepare a formal FLIP document and then move on to the
vote stage.
If anyone has any other comments or suggestions, please let me know.
Thanks.

Best,
Wenlong

On Fri, 12 Nov 2021 at 05:54, Neng Lu <nl...@apache.org> wrote:

+1 (non-binding)
This change will really make developers' lives easier.

On Thu, Nov 11, 2021 at 6:33 AM Guowei Ma <guowei....@gmail.com> wrote:
+1
This would be very helpful for debugging our online jobs.

Best,
Guowei


On Thu, Nov 11, 2021 at 8:03 PM Yuepeng Pan <flin...@126.com> wrote:

+1. It's useful to understand the job topology.
Looking forward to this feature.
Best,
Yuepeng Pan.






At 2021-11-11 19:44:44, "Yangze Guo" <karma...@gmail.com> wrote:
+1. That's gonna help a lot for debugging.

Best,
Yangze Guo

On Thu, Nov 11, 2021 at 7:37 PM Till Rohrmann <trohrm...@apache.org> wrote:
This improvement looks like it makes the life of our users a lot easier
when it comes to understanding logs and reading the UI. Hence +1.

Cheers,
Till

On Thu, Nov 11, 2021 at 11:59 AM JING ZHANG <beyond1...@gmail.com> wrote:
Big +1.

This is a problem we frequently encounter on our production platform.
Looking forward to this improvement.

Best,
Jing Zhang

Martijn Visser <mart...@ververica.com> wrote on Thu, 11 Nov 2021 at 18:26:
+1. Looks much better now

On Thu, 11 Nov 2021 at 11:07, godfrey he <godfre...@gmail.com> wrote:
Thanks for driving this. This improvement solves a problem that users
have long complained about, +1

Best,
Godfrey

Jark Wu <imj...@gmail.com> wrote on Thu, 11 Nov 2021 at 17:40:
+1 for this. It looks much clearer and more structured.

Best,
Jark

On Thu, 11 Nov 2021 at 17:23, Chesnay Schepler <ches...@apache.org> wrote:
I'm generally in favor of it, and there are already tickets that
proposed a dedicated operator/vertex description:

https://issues.apache.org/jira/browse/FLINK-20388
https://issues.apache.org/jira/browse/FLINK-21858

On 11/11/2021 10:02, wenlong.lwl wrote:
Hi all, I would like to start a discussion about improving the name and
structure of job vertex names, mainly to improve the experience of
debugging and analyzing SQL jobs at runtime.

The main proposed changes include:
1. Separate the description and the name of an operator, so that we can
have detailed info in the description and a shorter name, which is
friendlier to external systems like logging/metrics, without losing
useful information.
2. Introduce a tree-mode vertex description, which makes the description
more readable and easier to understand.
3. Clean up and improve the descriptions of SQL operators.
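
As an illustration of change 1, here is a minimal Python sketch of an operator that carries a short name and a separate detailed description. The two string formats are copied from the example in this mail; the `Operator` class and its fields are hypothetical and are not part of Flink or of the proposal document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical model of an operator with a name and a description."""
    op_id: int
    op_type: str
    detail: str  # argument list shown only in the description

    @property
    def name(self) -> str:
        # Short form, suitable for logs and metric identifiers.
        return f"{self.op_type}[{self.op_id}]"

    @property
    def description(self) -> str:
        # Long form that keeps the full details.
        return f"[{self.op_id}]:{self.op_type}({self.detail})"


op = Operator(54, "NotNullEnforcer", "fields=[rowkey]")
print(op.name)         # NotNullEnforcer[54]
print(op.description)  # [54]:NotNullEnforcer(fields=[rowkey])
```

The id appears in both strings, so a short name seen in a log or metric can be matched back to the full description.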

Here is an example with the changes for a SQL job:

vertex name:
GlobalGroupAggregate[52] -> (Calc[53] -> NotNullEnforcer[54] -> Sink: tb_ads_dwi_pub_hbd_spm_dtr_002_003[54], Calc[55] -> NotNullEnforcer[56] -> Sink: tb_ads_dwi_pub_hbd_spm_dtr_002_004[56])

vertex description:
[52]:GlobalGroupAggregate(groupBy=[stat_date, spm_url_ab, client], select=[stat_date, spm_url_ab, client, COUNT(count1$0) AS clk_cnt_app_mtr_001, COUNT(distinct$0 count$1) AS clk_uv_app_mtr_001, COUNT(count1$2) AS clk_cnt_app_mtr_002, COUNT(distinct$0 count$3) AS clk_uv_app_mtr_002, COUNT(count1$4) AS clk_cnt_app_mtr_003, COUNT(distinct$0 count$5) AS clk_uv_app_mtr_003])
:- [53]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT(client, ':client'), CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
:  +- [54]:NotNullEnforcer(fields=[rowkey])
:     +- [54]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_003], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
+- [55]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT('ddd:', stat_date), CONCAT(client, ':client')), (client = ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '92459')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '92459:app', CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
   +- [56]:NotNullEnforcer(fields=[rowkey])
      +- [56]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_004], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
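
For readers who want to see how a tree-mode description like the one above can be laid out, here is a minimal Python sketch of a recursive renderer that uses the same `:-` and `+-` markers. It is an assumption about the layout rule inferred from the example, not the implementation proposed in the document; `Node` and `render` are hypothetical names, and the operator argument lists are omitted to keep the output short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: one operator label plus its downstream operators."""
    label: str
    children: List["Node"] = field(default_factory=list)


def render(node: Node) -> List[str]:
    """Return the tree as lines; ':-' marks a non-last child, '+-' the last one."""
    lines = [node.label]
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        # The head prefixes the child's own line, cont prefixes its subtree lines.
        head, cont = ("+- ", "   ") if last else (":- ", ":  ")
        sub = render(child)
        lines.append(head + sub[0])
        lines.extend(cont + line for line in sub[1:])
    return lines


# Same shape as the example vertex, with argument lists left out.
root = Node("[52]:GlobalGroupAggregate", [
    Node("[53]:Calc", [Node("[54]:NotNullEnforcer", [Node("[54]:Sink")])]),
    Node("[55]:Calc", [Node("[56]:NotNullEnforcer", [Node("[56]:Sink")])]),
])
print("\n".join(render(root)))
```

Each operator ends up on its own line, and the indentation shows which sink belongs to which branch, which the single-line form cannot show.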

For more detail on the proposal:

https://docs.google.com/document/d/1VUVJeHY_We09GY53-K2lETP3HUNZG9wMKyecFWk_Wxk

Looking forward to your feedback, thanks.

Best,

Wenlong Lyu


