Hi,

I'm using FsStateBackend as the state backend, with the state stored on HDFS.

On Mon, May 11, 2020 at 11:19 AM Benchao Li <[email protected]> wrote:
> Hi,
>
> Which state backend are you using? From your description, the problem is
> quite likely related to it. For example, if you are using RocksDB on
> ordinary disks, it is easy to hit an IO bottleneck.
>
> 宇张 <[email protected]> wrote on Mon, May 11, 2020 at 11:14 AM:
>
> > Hi,
> > I'm using Blink SQL on Flink 1.9 for data transformation, but I've run
> > into the following problems:
> > 1. The primary key is lost when using the row_number function.
> > 2. When row_number is combined with temporal table joins, throughput
> > drops severely. The SQL is as follows:
> > // In theory distinct is not needed here, but Blink cannot extract the
> > // primary key from the SQL, so validation fails; that is why I added it.
> > SELECT distinct t1.id as order_id,...,
> >   DATE_FORMAT(t1.proctime,'yyyy-MM-dd HH:mm:ss') as etl_time
> > FROM (select id,...,proctime from (select
> >   data.index0.id,...,proctime,ROW_NUMBER() OVER (PARTITION BY data.index0.id
> >   ORDER BY es desc) AS rowNum from installmentdb_t_line_item)tmp where
> >   rowNum<=1) t1
> > left join SNAP_T_OPEN_PAY_ORDER FOR SYSTEM_TIME AS OF t1.proctime t2
> >   on t2.LI_ID= t1.id
> > left join SNAP_T_SALES_ORDER FOR SYSTEM_TIME AS OF t1.proctime t4
> >   ON t1.so_id =t4.ID
> >
> > The SQL above has very low throughput: it processes only a few records
> > per second. The two cases below, run separately, both reach acceptable
> > throughput. The temporal table joins alone reach several thousand
> > records per second, and row_number alone reaches tens of thousands, but
> > I don't know why combining the two drops it to just a few.
> >
> > SELECT distinct t1.id as order_id,...,
> >   DATE_FORMAT(t1.proctime,'yyyy-MM-dd HH:mm:ss') as etl_time
> > FROM (select id,...,proctime from (select
> >   data.index0.id,...,proctime from installmentdb_t_line_item)tmp ) t1
> > left join SNAP_T_OPEN_PAY_ORDER FOR SYSTEM_TIME AS OF t1.proctime t2
> >   on t2.LI_ID= t1.id
> > left join SNAP_T_SALES_ORDER FOR SYSTEM_TIME AS OF t1.proctime t4
> >   ON t1.so_id =t4.ID
> >
> > SELECT distinct t1.id as order_id,...,
> >   DATE_FORMAT(t1.proctime,'yyyy-MM-dd HH:mm:ss') as etl_time
> > FROM (select id,...,proctime from (select
> >   data.index0.id,...,proctime,ROW_NUMBER() OVER (PARTITION BY data.index0.id
> >   ORDER BY es desc) AS rowNum from installmentdb_t_line_item)tmp where
> >   rowNum<=1) t1
> >
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: [email protected]; [email protected]
>
