Hi all, I would like to start a discussion about improving how job vertices
are named and how their names are structured, mainly to make SQL jobs easier
to debug and analyze at runtime.
The main proposed changes include:
1. Separate the description from the name of an operator, so that the
description can carry the detailed information while the name stays short.
A shorter name is friendlier to external systems such as logging and
metrics, and no useful information is lost.
2. Introduce a tree-mode vertex description, which makes the description
more readable and easier to understand.
3. Clean up and improve the descriptions of SQL operators.
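To make change 1 more concrete, here is a minimal Python sketch. It is not Flink code: the `Operator` class and the `chain_name` helper are hypothetical names used only for illustration, and the `Type[id]` short-name format is assumed from the vertex name shown in the example in this mail.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """Hypothetical stand-in for an operator; not a Flink class."""
    op_id: int
    op_type: str      # e.g. "Calc"
    description: str  # full detail, e.g. "Calc(select=[a, b])"

    @property
    def name(self) -> str:
        # Short form, assumed to be what logging and metrics would see.
        return f"{self.op_type}[{self.op_id}]"


def chain_name(operators):
    """Join the short names of a linear operator chain with ' -> '."""
    return " -> ".join(op.name for op in operators)


ops = [
    Operator(53, "Calc", "Calc(select=[a, b])"),
    Operator(54, "NotNullEnforcer", "NotNullEnforcer(fields=[rowkey])"),
]
print(chain_name(ops))     # short name for external systems
print(ops[0].description)  # detail stays available in the description
```

The point of the split is that the name no longer has to grow with the size of the SQL expression, while the description can be as long as needed.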
Here is an example with the changes applied to a SQL job:
vertex name:
GlobalGroupAggregate[52] -> (Calc[53] -> NotNullEnforcer[54] -> Sink: tb_ads_dwi_pub_hbd_spm_dtr_002_003[54], Calc[55] -> NotNullEnforcer[56] -> Sink: tb_ads_dwi_pub_hbd_spm_dtr_002_004[56])
vertex description:
[52]:GlobalGroupAggregate(groupBy=[stat_date, spm_url_ab, client], select=[stat_date, spm_url_ab, client, COUNT(count1$0) AS clk_cnt_app_mtr_001, COUNT(distinct$0 count$1) AS clk_uv_app_mtr_001, COUNT(count1$2) AS clk_cnt_app_mtr_002, COUNT(distinct$0 count$3) AS clk_uv_app_mtr_002, COUNT(count1$4) AS clk_cnt_app_mtr_003, COUNT(distinct$0 count$5) AS clk_uv_app_mtr_003])
:- [53]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT(client, ':client'), CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
:  +- [54]:NotNullEnforcer(fields=[rowkey])
:     +- [54]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_003], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
+- [55]:Calc(select=[CASE((client <> ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '12345')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '12345:app', CONCAT('ddd:', stat_date), CONCAT(client, ':client')), (client = ''), CONCAT_WS('\u0004', CONCAT(SUBSTRING(MD5(CONCAT(spm_url_ab, '92459')), 1, 4), ':md5'), CONCAT(spm_url_ab, ':spmab'), '92459:app', CONCAT('ddd:', stat_date)), null:VARCHAR(2147483647)) AS rowkey, clk_cnt_app_mtr_001 AS clk_cnt_app_dtr_001, clk_uv_app_mtr_001 AS clk_uv_app_dtr_001, clk_cnt_app_mtr_002 AS clk_cnt_app_dtr_002, clk_uv_app_mtr_002 AS clk_uv_app_dtr_002, clk_cnt_app_mtr_003 AS clk_cnt_app_dtr_003, clk_uv_app_mtr_003 AS clk_uv_app_dtr_003])
   +- [56]:NotNullEnforcer(fields=[rowkey])
      +- [56]:Sink(table=[default_catalog.default_database.tb_ads_dwi_pub_hbd_spm_dtr_002_004], fields=[rowkey, clk_cnt_app_dtr_001, clk_uv_app_dtr_001, clk_cnt_app_dtr_002, clk_uv_app_dtr_002, clk_cnt_app_dtr_003, clk_uv_app_dtr_003])
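
As a rough illustration of the tree-mode description, the following Python sketch renders an operator tree using the same prefixes that appear in the description above. It assumes that ":-" marks a branch with further siblings after it and "+-" marks the last branch; that reading is inferred from the example, not taken from the design doc. `Node` and `render_tree` are hypothetical names, and the operator parameters are left out of the labels to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical operator node; children are its downstream operators."""
    label: str
    children: list = field(default_factory=list)


def render_tree(node):
    """Return the tree-mode description of `node` as a list of lines."""
    lines = [node.label]
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        # The last child uses "+- "; earlier children use ":- " and keep a
        # ":" rail running down past their own subtree.
        head, cont = ("+- ", "   ") if last else (":- ", ":  ")
        sub = render_tree(child)
        lines.append(head + sub[0])
        lines.extend(cont + line for line in sub[1:])
    return lines


tree = Node("[52]:GlobalGroupAggregate", [
    Node("[53]:Calc", [Node("[54]:NotNullEnforcer", [Node("[54]:Sink")])]),
    Node("[55]:Calc", [Node("[56]:NotNullEnforcer", [Node("[56]:Sink")])]),
])
print("\n".join(render_tree(tree)))
```

With one operator per line and indentation showing the branches, the two sink chains under the aggregate are easy to tell apart even when each operator carries a long parameter list.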
For more details on the proposal, see:
https://docs.google.com/document/d/1VUVJeHY_We09GY53-K2lETP3HUNZG9wMKyecFWk_Wxk
Looking forward to your feedback, thanks.
Best,
Wenlong Lyu