Re: [DISCUSS] Improve the name and structure of job vertex and operator name for job

Chesnay Schepler Tue, 16 Nov 2021 03:14:42 -0800

Why should this be specific to the Table API? The DataStream API has similar issues with long operator names (like windowing).


On 16/11/2021 11:22, wenlong.lwl wrote:

Thanks Godfrey for the suggestion.
Regarding 1, how about table.optimizer.simplify-operator-name-enabled,
which means that we would simplify the name of the operator and keep the
details in the description only. I don't think
"table.optimizer.operator-name.description-enabled" describes what it means.
Regarding 2, I agree that it is better to use an enum instead of a boolean. For
the key, I think you mean "pipeline.vertex-description-pattern" instead
of "pipeline.vertex-name-pattern", and I would like to choose DEFAULT/TREE
for the values.
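To illustrate why an enum-valued option is more flexible than a boolean flag, here is a minimal Java sketch. The key "pipeline.vertex-description-pattern" and the values DEFAULT/TREE come from this thread; the plain Map standing in for a configuration object, the class and method names, and the TREE fallback (following Godfrey's suggested default) are assumptions for illustration, not Flink's API.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: an enum-valued option instead of a boolean flag. */
public class VertexDescriptionOption {

    /** Values proposed in this thread. */
    public enum VertexDescriptionMode { DEFAULT, TREE }

    /** Key name proposed in this thread; not necessarily the final Flink option. */
    public static final String KEY = "pipeline.vertex-description-pattern";

    /** Default suggested by Godfrey; treated here as an assumption. */
    public static final VertexDescriptionMode FALLBACK = VertexDescriptionMode.TREE;

    /** Reads the mode from a plain key/value map (a stand-in for a real configuration object). */
    public static VertexDescriptionMode readMode(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null) {
            return FALLBACK;
        }
        // valueOf rejects unknown values with IllegalArgumentException, so typos fail fast.
        return VertexDescriptionMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(readMode(Map.of()));               // TREE
        System.out.println(readMode(Map.of(KEY, "default"))); // DEFAULT
    }
}
```

With an enum, adding a third layout later only means adding a constant, whereas a boolean would need a second option.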


Best,
Wenlong

On Tue, 16 Nov 2021 at 17:28, godfrey he <[email protected]> wrote:

Thanks for creating this FLIP, Wenlong.

The FLIP already looks pretty solid. I think the config options can be
improved a little:
1) About table.optimizer.separate-name-and-description, I think
"operator-name" should be part of the option name,
so how about table.optimizer.operator-name.description-enabled?
2) About pipeline.tree-mode-vertex-description, I think we can make
the mode accept a string value,
which is more flexible. How about pipeline.vertex-name-pattern, with
"TREE" as the default value
and "CASCADE" (or the simpler "DEFAULT") as the other option?

What do you think?

Best,
Godfrey

wenlong.lwl <[email protected]> wrote on Mon, 15 Nov 2021 at 6:36 PM:

Hi all, FYI the FLIP doc has been created:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-195%3A+Improve+the+name+and+structure+of+vertex+and+operator+name+for+sql+job

Best,
Wenlong

On Mon, 15 Nov 2021 at 11:41, wenlong.lwl <[email protected]> wrote:

Hi all,
Thanks for the feedback. It seems that the proposal has been accepted by all of
you guys. I will prepare a formal FLIP document and then go ahead to the
vote stage.
If anyone has any other comments or suggestions, please let me know,
thanks.

Best,
Wenlong

On Fri, 12 Nov 2021 at 05:54, Neng Lu <[email protected]> wrote:

+1 (non-binding)
This change will really help to ease developer life.

On Thu, Nov 11, 2021 at 6:33 AM Guowei Ma <[email protected]> wrote:

+1
This would be very helpful for debugging our online jobs.

Best,
Guowei


On Thu, Nov 11, 2021 at 8:03 PM Yuepeng Pan <[email protected]> wrote:

+1. It's useful for understanding the job topology.
Looking forward to this feature.
Best,
Yuepeng Pan.






At 2021-11-11 19:44:44, "Yangze Guo" <[email protected]> wrote:

+1. That's gonna help a lot for debugging.

Best,
Yangze Guo

On Thu, Nov 11, 2021 at 7:37 PM Till Rohrmann <[email protected]> wrote:

This improvement looks like it makes the life of our users a lot easier
when it comes to understanding logs and reading the UI. Hence +1.

Cheers,
Till

On Thu, Nov 11, 2021 at 11:59 AM JING ZHANG <[email protected]> wrote:

Big +1.

This is a problem frequently encountered on our production platform;
looking forward to this improvement.

Best,
Jing Zhang

Martijn Visser <[email protected]> wrote on Thu, 11 Nov 2021 at 6:26 PM:

+1. Looks much better now

On Thu, 11 Nov 2021 at 11:07, godfrey he <[email protected]> wrote:

Thanks for driving this; this improvement solves a long-standing problem
that users have complained about. +1

Best,
Godfrey

Jark Wu <[email protected]> wrote on Thu, 11 Nov 2021 at 5:40 PM:

+1 for this. It looks much more clear and structured.

Best,
Jark

On Thu, 11 Nov 2021 at 17:23, Chesnay Schepler <[email protected]> wrote:

I'm generally in favor of it, and there are already tickets that
proposed a dedicated operator/vertex description:

https://issues.apache.org/jira/browse/FLINK-20388
https://issues.apache.org/jira/browse/FLINK-21858

On 11/11/2021 10:02, wenlong.lwl wrote:

Hi all, I would like to start a discussion about an improvement to the
name and structure of job vertex names, mainly to improve the experience
of debugging and analyzing SQL jobs at runtime.

The main proposed changes include:
1. Separate the description and the name of an operator, so that we can
keep detailed info in the description and a shorter name, which is
friendlier for external systems like logging/metrics, without losing
useful information.
2. Introduce a tree-mode vertex description, which makes the description
more readable and easier to understand.
3. Clean up and improve the descriptions of SQL operators.

Here is an example of a SQL job with these changes:

vertex name:
GlobalGroupAggregate[52] -> (Calc[53] -> NotNullEnforcer[54] -> Sink: tb_ads_dwi_pub_hbd_spm_dtr_002_003[54], Calc[55] -> NotNullEnforcer[56] -> Sink: tb_ads_dwi_pub_hbd_spm_dtr_002_004[56])
vertex description:
[52]:GlobalGroupAggregate(groupBy=[stat_date, spm_url_ab, client], select=[stat_date, spm_url_ab, client, COUNT(count1$0) AS clk_cnt_app_mtr_001, COUNT(distinct$0 count$1) AS clk_uv_app_mtr_001, COUNT(count1$2) AS clk_cnt_app_mtr_002, COUNT(distinct$0 count$3) AS clk_uv_app_mtr_002, COUNT(count1$4) AS clk_cnt_app_mtr_003, COUNT(distinct$0 count$5) AS clk_uv_app_mtr_003])
:- [53]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT(client, ':client'), CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
:  +- [54]:NotNullEnforcer(fields=[rowkey])
:     +- [54]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_003], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
+- [55]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT('ddd:', stat_date), CONCAT(client, ':client')), (client = ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '92459')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '92459:app', CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
   +- [56]:NotNullEnforcer(fields=[rowkey])
      +- [56]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_004], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
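The tree layout of the vertex description in this example can be reproduced with a small recursive renderer. This is a minimal Java sketch, assuming a hypothetical Node type; it is not Flink's implementation. Non-last children get a ":- " prefix, the last child gets "+- ", and the descendants of a non-last child keep a ":  " column so later siblings stay visually connected. Descriptions are abbreviated to "(...)".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of rendering a tree-mode vertex description. */
public class TreeDescription {

    /** Hypothetical node: one operator description plus its downstream operators. */
    public static final class Node {
        final String description;
        final List<Node> children = new ArrayList<>();

        public Node(String description) {
            this.description = description;
        }

        /** Appends a child and returns this node, so trees can be built fluently. */
        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Renders the root on the first line and every descendant on its own prefixed line. */
    public static String render(Node root) {
        StringBuilder out = new StringBuilder(root.description);
        appendChildren(root, "", out);
        return out.toString();
    }

    private static void appendChildren(Node node, String prefix, StringBuilder out) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            boolean last = i == node.children.size() - 1;
            out.append('\n').append(prefix).append(last ? "+- " : ":- ").append(child.description);
            // Under a non-last child, keep a ':' column so the next sibling stays connected.
            appendChildren(child, prefix + (last ? "   " : ":  "), out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("[52]:GlobalGroupAggregate(...)")
                .add(new Node("[53]:Calc(...)")
                        .add(new Node("[54]:NotNullEnforcer(fields=[rowkey])")
                                .add(new Node("[54]:Sink(...)"))))
                .add(new Node("[55]:Calc(...)")
                        .add(new Node("[56]:NotNullEnforcer(fields=[rowkey])")
                                .add(new Node("[56]:Sink(...)"))));
        System.out.println(render(root));
    }
}
```

Running main prints the same prefix skeleton as the vertex description above, with the operator details shortened.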

For more detail on the proposal:

https://docs.google.com/document/d/1VUVJeHY_We09GY53-K2lETP3HUNZG9wMKyecFWk_Wxk

Looking forward to your feedback, thanks.

Best,

Wenlong Lyu
