I'm generally in favor of it, and there are already tickets that proposed a dedicated operator/vertex description:

https://issues.apache.org/jira/browse/FLINK-20388
https://issues.apache.org/jira/browse/FLINK-21858

On 11/11/2021 10:02, wenlong.lwl wrote:
Hi all, I would like to start a discussion about improving the naming and
structure of job vertex names, mainly to improve the experience of debugging
and analyzing SQL jobs at runtime.

The main proposed changes are:
1. Separate the description and the name of an operator, so that the
description can carry detailed info while the name stays short. A shorter
name is friendlier to external systems such as logging/metrics, and no
useful information is lost.
2. Introduce a tree-mode vertex description, which makes the description
more readable and easier to understand.
3. Clean up and improve the descriptions of SQL operators.
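
To illustrate point 1, here is a minimal sketch of keeping a short name and a detailed description side by side. It is not taken from the proposal document or from Flink's code; the class `OperatorInfo` and its fields are hypothetical and only mirror the `Type[id]` and `[id]:Type(args)` shapes used in this mail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorInfo:
    """Hypothetical holder for one operator; not a Flink class."""

    op_id: int     # numeric id of the operator, e.g. 54
    op_type: str   # operator type, e.g. "NotNullEnforcer"
    detail: str    # full argument list, which may be very long

    @property
    def name(self) -> str:
        # Short form, intended for logging and metric identifiers.
        return f"{self.op_type}[{self.op_id}]"

    @property
    def description(self) -> str:
        # Detailed form, intended for places where length is not a problem.
        return f"[{self.op_id}]:{self.op_type}({self.detail})"


enforcer = OperatorInfo(54, "NotNullEnforcer", "fields=[rowkey]")
print(enforcer.name)         # NotNullEnforcer[54]
print(enforcer.description)  # [54]:NotNullEnforcer(fields=[rowkey])
```

The point of the split is that the name stays bounded in length regardless of how large the operator's argument list grows.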

Here is an example with the changes for a SQL job:

vertex name:
GlobalGroupAggregate[52] -> (Calc[53] -> NotNullEnforcer[54] -> Sink:
tb_ads_dwi_pub_hbd_spm_dtr_002_003[54], Calc[55] -> NotNullEnforcer[56] ->
Sink: tb_ads_dwi_pub_hbd_spm_dtr_002_004[56])
vertex description:
[52]:GlobalGroupAggregate(groupBy=[stat_date, spm_url_ab, client], select=[stat_date, spm_url_ab, client, COUNT(count1$0) AS clk_cnt_app_mtr_001, COUNT(distinct$0 count$1) AS clk_uv_app_mtr_001, COUNT(count1$2) AS clk_cnt_app_mtr_002, COUNT(distinct$0 count$3) AS clk_uv_app_mtr_002, COUNT(count1$4) AS clk_cnt_app_mtr_003, COUNT(distinct$0 count$5) AS clk_uv_app_mtr_003])
:- [53]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT(client, ':client'), CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
:  +- [54]:NotNullEnforcer(fields=[rowkey])
:     +- [54]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_003], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
+- [55]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT('ddd:', stat_date), CONCAT(client, ':client')), (client = ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '92459')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '92459:app', CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
   +- [56]:NotNullEnforcer(fields=[rowkey])
      +- [56]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_004], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
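
The `:-`, `+-` and `:` connectors in the description above suggest a tree layout in which the last child of a node is introduced by `+-` and earlier children by `:-`. The sketch below shows one way such a description, and the chained vertex name, could be rendered from an operator tree. It is only an illustration under that assumption, not the proposed implementation: `Node`, `render_tree` and `chain_name` are hypothetical names, and the labels are shortened to `Type[id]` for readability.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical operator node: a label plus its downstream operators."""

    label: str
    children: List["Node"] = field(default_factory=list)


def render_tree(node: Node) -> List[str]:
    """Render the tree-mode description, one operator per line."""
    lines = [node.label]
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        connector = "+- " if last else ":- "  # prefix of the child's own line
        pad = "   " if last else ":  "        # prefix of the child's subtree
        sub = render_tree(child)
        lines.append(connector + sub[0])
        lines.extend(pad + s for s in sub[1:])
    return lines


def chain_name(node: Node) -> str:
    """Render the short chained name, grouping branches in parentheses."""
    if not node.children:
        return node.label
    parts = [chain_name(c) for c in node.children]
    if len(parts) == 1:
        return f"{node.label} -> {parts[0]}"
    return f"{node.label} -> ({', '.join(parts)})"


plan = Node("GlobalGroupAggregate[52]", [
    Node("Calc[53]", [Node("NotNullEnforcer[54]", [Node("Sink[54]")])]),
    Node("Calc[55]", [Node("NotNullEnforcer[56]", [Node("Sink[56]")])]),
])

print(chain_name(plan))
print("\n".join(render_tree(plan)))
```

With real operator descriptions as labels, each rendered line would carry the full `[id]:Type(args)` text, while the chained name stays short.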

For more detail on the proposal:
https://docs.google.com/document/d/1VUVJeHY_We09GY53-K2lETP3HUNZG9wMKyecFWk_Wxk

Looking forward to your feedback, thanks.

Bests

Wenlong Lyu

